Solow-Polasky measure — solow

Calculates the Solow-Polasky measure [1] for a set of points given as columns of a numeric matrix.

solow_polasky(x, d = NULL, theta = 1, ...)

Arguments

x	[`matrix`] Numeric input matrix (e.g., a EA-population) where each column holds one point / individual.
d	[`matrix` \| `NULL`] Optional distance matrix. If `NULL`, the default, the Euclidean distance is calculated automatically via `dist`.
theta	[`numeric(1)`] Single scalar value (see details section). Default is 1.
...	[any] Further argument passed down to `dist` in case `d` is `NULL`.

Value

Solow-Polasky diversity measure (scalar numeric value).

Details

This measure was introduced by Solow and Polasky back in 1994 to measure the amount of diversity between species in biology [1]. Later, Ulrich and Thiele [2] adopted this measures for Evolutionary Diversity Optimization where the goal is to come up with a population $P = \{P_1, \ldots, P_{\mu}\}$ of $\mu$ individuals such that the following holds:

All individuals in $P$ adhere to a minimum quality threhold, i.e., $f(x) \leq v_{\min}$ for some threhold value $v_{\min}$ (we assume the fitness function $f$ w.l.o.g. to be minimized).
The population should be “diverse” with respect to some diversity measure $D$ that maps the population to a single scalar numeric value.

Given $P = \{P_1, \ldots, P_{\mu}\}$ and pairwise distances $d(P_i, P_j)$ $1 \leq, i,j \leq \mu$ let $M$ be a $(\mu \times \mu)$ matrix with $$ M_{ij} = \exp(-\theta \cdot d(P_i, P_j)). $$ Then the Solow-Polasky diversity is defined as $$ D_{SP}(P) = \sum_{1 \leq i,j \leq \mu} M_{ij}^{-1} \in [1, \mu] $$ where matrix $M^{-1}$ is the Moore-Penrose generalized inverse of a matrix $M$. $D_{SP}$ can be interpreted as “as the number of different species in the population” [2]. Note however that the measure calculates a real valued diversty in $[1, \mu]$ and no integer value. Hence, it can be seen as a more fine-grained measure of diversity that captures a distance other than the “binary” distance $d'$ were $d'(P_i, P_j) = 1$ if $P_i \neq P_j$ and $d'(P_i, P_j) = 0$ otherwise.

The runtime is dominated by the inverse matrix calculation and hence is upper bounded by $O(\mu^3)$. In [2], the authors propose an alternative method to update the calculation once the population changes. However, the measure implemented here is meant to be used a-posteriori to assess the “diversity” of a final population.

References

[1] Andrew R. Solow and Stephen Polasky. Measuring biological diversity. English (US). In: Environmental and Ecological Statistics 1.2 (June 1994), pp. 95–103. issn: 1352-8505. doi: 10.1007/BF02426650.

[2] Tamara Ulrich and Lothar Thiele. Maximizing population diversity in single-objective optimization. In: 13th Annual Genetic and Evolutionary Computation Conference (GECCO 2011). ACM, 2011, pp. 641–648. doi: 10.1145/2001576.2001665.

Examples

# Generate a random point cloud in [0,1] x [0,1]
x = matrix(runif(100), nrow = 2L)
solow_polasky(x)
#> [1] 1.912474
solow_polasky(x, theta = 2)
#> [1] 3.049745

# All points equal
x = matrix(rep(1, 100), nrow = 2L)
solow_polasky(x)
#> [1] 1