Merged
python/tskit/trees.py: 33 changes (19 additions, 14 deletions)
@@ -9299,23 +9299,28 @@ def pca(
eigenvectors of the genetic relatedness matrix, which are obtained by a
randomized singular value decomposition (rSVD) algorithm.

-Concretely, if :math:`M` is the matrix of genetic relatedness values, with
-:math:`M_{ij}` the output of
-:meth:`genetic_relatedness <.TreeSequence.genetic_relatedness>`
-between sample :math:`i` and sample :math:`j`, then by default this returns
-the top ``num_components`` eigenvectors of :math:`M`, so that
+Concretely, take :math:`M` as the matrix of non-span-normalised
+branch-based genetic relatedness values, for instance obtained by
Review comment (Contributor), suggested change:
  from: branch-based genetic relatedness values, for instance obtained by
  to:   genetic relatedness values, for instance obtained by

+setting :math:`M_{ij}` to be the :meth:`~.TreeSequence.genetic_relatedness`
+between sample :math:`i` and sample :math:`j` with ``mode="branch"``,
Review comment (Contributor), suggested change:
  from: between sample :math:`i` and sample :math:`j` with ``mode="branch"``,
  to:   between sample :math:`i` and sample :math:`j` with the specified ``mode``,

+``proportion=False`` and ``span_normalise=False``. Then by default this
+returns the top ``num_components`` eigenvectors of :math:`M`, so that
``output.factors[i,k]`` is the position of sample `i` on the `k` th PC.
-If ``samples`` or ``individuals`` are provided, then this does the same thing,
-except with :math:`M_{ij}` either the relatedness between ``samples[i]``
-and ``samples[j]`` or the nodes of ``individuals[i]`` and ``individuals[j]``,
-respectively.
+If ``samples`` or ``individuals`` are provided, then this does the same
+thing, except with :math:`M_{ij}` either the relatedness between
+``samples[i]`` and ``samples[j]`` or the average relatedness between the
+nodes of ``individuals[i]`` and ``individuals[j]``, respectively.
+Factors are normalized to have L2 norm 1, i.e.,
Review comment (Contributor), suggested change:
  from: Factors are normalized to have L2 norm 1, i.e.,
  to:   Factors are normalized to have norm 1, i.e.,

+``(output.factors[:,k] ** 2).sum() == 1`` for any ``k``.

The parameters ``centre`` and ``mode`` are passed to
-:meth:`genetic_relatedness <.TreeSequence.genetic_relatedness>`;
-if ``windows`` are provided then PCA is carried out separately in each window.
-If ``time_windows`` is provided, then genetic relatedness is measured using only
-ancestral material within the given time window (see
-:meth:`decapitate <.TreeSequence.decapitate>` for how this is defined).
+:meth:`~.TreeSequence.genetic_relatedness`: the default ``centre=True`` results
+in factors whose elements sum to zero; ``mode`` currently only supports the
+``"branch"`` setting. If ``windows`` are provided then PCA is carried out
+separately in each genomic window. If ``time_windows`` is provided, then genetic
+relatedness is measured using only ancestral material within the given time
+window (see :meth:`decapitate <.TreeSequence.decapitate>` for how this is
+defined).

So that the method scales to large tree sequences, the underlying method
relies on a randomized SVD algorithm, using
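A toy example can make the docstring's definition of ``output.factors`` concrete. The pure-Python sketch below rests on several assumptions:

- ``M`` is a made-up positive semi-definite matrix built as ``G G^T``, not real ``genetic_relatedness`` output.
- ``centre=True`` is modelled, for illustration only, as double-centring ``M`` so that every row and column sums to zero.
- The top eigenvectors come from plain power iteration with deflation. This is a deliberately simpler substitute for the randomized SVD that ``pca`` uses.

The helper names (``double_centre``, ``top_eigenvectors``, ``mat_vec``, ``unit``) are hypothetical and not part of tskit.

```python
import math
import random


def double_centre(m):
    """Return m with row and column means removed (grand mean added back).

    Every row and every column of the result sums to zero.
    """
    n = len(m)
    row_means = [sum(row) / n for row in m]
    col_means = [sum(m[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [m[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def unit(v):
    """Scale v to Euclidean (L2) norm 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_eigenvectors(m, num_components, iterations=500, seed=1):
    """Top eigenpairs of a symmetric positive semi-definite matrix.

    Uses power iteration with deflation (NOT a randomized SVD). Each
    returned eigenvector has norm 1; its overall sign is arbitrary.
    """
    rng = random.Random(seed)
    n = len(m)
    work = [row[:] for row in m]
    vectors, values = [], []
    for _ in range(num_components):
        v = unit([rng.uniform(-1.0, 1.0) for _ in range(n)])
        for _ in range(iterations):
            v = unit(mat_vec(work, v))
        value = sum(a * b for a, b in zip(v, mat_vec(work, v)))  # Rayleigh quotient
        vectors.append(v)
        values.append(value)
        # Deflate: subtract value * v v^T so the next pass finds the next eigenpair.
        work = [
            [work[i][j] - value * v[i] * v[j] for j in range(n)]
            for i in range(n)
        ]
    return vectors, values


# Hypothetical 0/1 rows for four samples at three sites; M = G G^T is a small
# positive semi-definite stand-in for a relatedness matrix (made-up values).
G = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
M = [[sum(a * b for a, b in zip(gi, gj)) for gj in G] for gi in G]

factors, eigenvalues = top_eigenvectors(double_centre(M), num_components=2)
for k, f in enumerate(factors):
    # f[i] plays the role of output.factors[i, k]: norm 1, and entries sum
    # to (numerically) zero because M was double-centred first.
    print(k, eigenvalues[k], sum(x * x for x in f), sum(f))
```

The zero sums follow from the centring. The centred matrix is symmetric and its columns sum to zero. So for an eigenvector v with non-zero eigenvalue λ, we get λ·Σᵢ vᵢ = Σᵢ (Mv)ᵢ = Σⱼ vⱼ·(Σᵢ Mᵢⱼ) = 0, which forces Σᵢ vᵢ = 0.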
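When ``individuals`` is given, the docstring defines :math:`M_{ij}` as the average relatedness between the nodes of ``individuals[i]`` and ``individuals[j]``. The sketch below spells out that averaging under stated assumptions:

- A node-level relatedness matrix is already available; here it is simply made up.
- Each individual is described by a list of its node ids.
- ``individual_relatedness`` is a hypothetical helper. It illustrates one reading of the definition and does not show how tskit computes it.

```python
from statistics import fmean


def individual_relatedness(node_matrix, individual_nodes):
    """Average node-level relatedness over all node pairs of two individuals.

    node_matrix[a][b] is a (hypothetical) relatedness value between sample
    nodes a and b; individual_nodes[i] lists the node ids of individual i.
    """
    return [
        [
            fmean(node_matrix[a][b] for a in nodes_i for b in nodes_j)
            for nodes_j in individual_nodes
        ]
        for nodes_i in individual_nodes
    ]


# Made-up relatedness between four sample nodes, and two diploid individuals
# owning nodes (0, 1) and (2, 3) respectively.
node_matrix = [
    [2.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 2.0],
]
individual_nodes = [[0, 1], [2, 3]]
print(individual_relatedness(node_matrix, individual_nodes))
# → [[1.25, 0.25], [0.25, 1.25]]
```

Giving each "individual" a single node reproduces the node-level matrix. This matches the ``samples`` case in the docstring.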
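The docstring takes :math:`M` to be non-span-normalised. Assume a single genomic window of span :math:`L`, and assume that ``span_normalise=True`` divides every entry by :math:`L`. Under those assumptions the two conventions differ only by a positive scalar, and a one-line calculation shows what that does to an eigenpair:

```latex
\tilde{M} = \frac{1}{L} M, \qquad
M v = \lambda v \;\Longrightarrow\; \tilde{M} v = \frac{1}{L} M v = \frac{\lambda}{L}\, v,
\qquad \lVert v \rVert_2 = 1 .
```

For distinct eigenvalues, the unit-norm factors are therefore the same under either convention, up to sign. The eigenvalues are rescaled by :math:`1/L`, and their ordering is unchanged.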