@@ -280,12 +280,10 @@ and also inefficient, because the allele frequencies for one population
280
280
gets used in computing many of those values.
281
281
So, statistics that take a ` sample_sets ` argument also take an ` indexes ` argument,
282
282
which for a statistic that operates on ` k ` sample sets will be a list of ` k ` -tuples.
283
- If ` indexes ` is a length ` n ` list of ` k ` -tuples,
284
- then the output will have ` n ` columns,
285
- and if ` indexes[j] ` is a tuple ` (i0, ..., ik) ` ,
286
- then the ` j ` -th column will contain values of the statistic computed on
287
- ` (sample_sets[i0], sample_sets[i1], ..., sample_sets[ik]) ` .
288
-
283
+ If ` indexes ` is a length ` n ` list of ` k ` -tuples, then the output will have ` n `
284
+ columns, so if ` indexes[j] ` is the tuple ` (i0, ..., ik) `
285
+ the ` j ` -th column will contain values of the statistic computed on
286
+ ` (sample_sets[i0], sample_sets[i1], ..., sample_sets[ik]) ` .
289
287
290
288
How multiple statistics are handled differs slightly between statistics
291
289
that operate on single sample sets and multiple sample sets.
@@ -338,14 +336,28 @@ The `indexes` parameter is interpreted in the following way:
338
336
statistic selecting the specified sample sets and we remove the last dimension
339
337
in the result array as described in the {ref}` sec_stats_output_dimensions ` section.
340
338
341
- - If if is ` None ` and ` sample_sets ` contains exactly ` k ` sample sets,
339
+ - If it is ` None ` and ` sample_sets ` contains exactly ` k ` sample sets,
342
340
this is equivalent to ` indexes=range(k) ` . ** Note
343
341
that we also drop the outer dimension of the result array in this case** .
344
342
345
- - If is is a list of ` k ` -tuples (each consisting of integers
343
+ - If it is a list of ` k ` -tuples (each consisting of integers
346
344
of integers between ` 0 ` and ` len(sample_sets) - 1 ` ) of length ` n ` we
347
345
compute ` n ` statistics based on these selections of sample sets.
348
346
347
+ For instance, a common requirement in the case of a pairwise (i.e. ` k=2 ` ) symmetrical
348
+ statistic such as ` divergence ` is to obtain the statistic between all pairs of
349
+ sample sets. This can be done by indexing all possible pairs as follows:
350
+
351
+ ``` {code-cell} ipython3
352
+ # Compute all pairwise divergences between diploid individuals. Or e.g. for pairwise
353
+ # divergences between each sampled genome, use `sample_sets=[[s] for s in ts.samples()]`
354
+ sample_sets=[i.nodes for i in ts.individuals()]
355
+ index_all_pairs = np.triu_indices(len(sample_sets)) # ((0,1), (0,2), ...
356
+ d = ts.divergence(sample_sets, indexes=tuple(zip(*index_all_pairs)))
357
+ pairwise_matrix = np.zeros((len(sample_sets), len(sample_sets)))
358
+ pairwise_matrix[index_all_pairs] = d
359
+ print(pairwise_matrix) # NB: only upper triangular values filled
360
+ ```
349
361
350
362
(sec_stats_windows)=
351
363
0 commit comments