
Implementation of sc.tl.score_genes doesn't match reference #3845

@MLubetzki

Description

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

Hi everyone!
I had a look at the implementation of sc.tl.score_genes and compared it to what the reference (see #2609 ) is doing and what the corresponding code in Seurat is doing here. My impression is that Seurat is consistent with the paper, but we are deviating from that implementation. I'll detail this below.

This is my understanding of how Seurat/the reference suggest it:
If we want to calculate a score for a set of genes, we have to supply a list of those genes and, in addition, a list of 'reference genes'. The purpose of the reference genes is to "decrease the effect that the quality and complexity of each cell’s data might have" (see Tirosh 2016). In practice, I'd say users don't explicitly specify reference genes, but instead use the default behavior of letting Seurat select a random subset.
Now, it might be the case that our gene list is mostly a set of lowly expressed transcription factors. If we were to randomly select a reference gene set out of all genes, it would probably have a larger average expression than our transcription factors, which in turn would make the score negative even for cells with (comparably) high reads in those factors. To combat this, we don't select the reference genes purely at random, but instead bin all genes (into n_bins bins) by their average expression values. Then, for each gene in our genes of interest, we pick reference genes from the same bin, which avoids this effect and makes the score values center around zero. The way one can interpret this score is: "When comparing our gene list with random reference genes (of similar expression levels) - are our genes upregulated (positive score) or downregulated (negative score)?"
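To make concrete what I mean by per-gene control sampling, here is a minimal sketch, assuming a rank-based binning of average expression. The function name, its signature, and the binning details are hypothetical illustrations, not scanpy's or Seurat's actual API:

import numpy as np
import pandas as pd

def score_with_per_gene_controls(adata, gene_list, ctrl_size=50, n_bins=25, seed=0):
    """Hypothetical sketch: draw controls per gene of interest, from its own bin."""
    rng = np.random.default_rng(seed)
    # Average expression per gene, binned into roughly equal-size bins by rank
    gene_means = pd.Series(
        np.asarray(adata.X.mean(axis=0)).ravel(), index=adata.var_names
    )
    bin_of = pd.cut(gene_means.rank(method="min"), bins=n_bins, labels=False)
    # For EACH gene of interest, draw ctrl_size reference genes from its bin
    # (assumes every bin holds at least ctrl_size candidate genes)
    control = []
    for g in gene_list:
        candidates = bin_of.index[bin_of == bin_of[g]].difference(gene_list)
        control.extend(rng.choice(np.asarray(candidates), size=ctrl_size, replace=False))
    # Per-cell score: mean over genes of interest minus mean over controls
    gidx = adata.var_names.get_indexer(list(gene_list))
    cidx = adata.var_names.get_indexer(control)
    mean_list = np.asarray(adata.X[:, gidx].mean(axis=1)).ravel()
    mean_ctrl = np.asarray(adata.X[:, cidx].mean(axis=1)).ravel()
    return mean_list - mean_ctrl

The point of the sketch is only the sampling loop: the control pool's composition across bins mirrors the gene list's composition, so the control mean is comparable to the gene-list mean.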

In principle, scanpy does the same thing. However, here we only add one random selection per bin that occurs in our gene list, instead of one random selection per gene of interest. I would not have expected the np.unique() in the linked line.
Why can this be an issue? Suppose gene_list contains 10 lowly expressed genes (all falling into the same bin) and 2 highly expressed genes that fall into two different buckets. With the default ctrl_size of 50, this adds 50 lowly expressed genes and 100 highly expressed genes to the reference set, when we'd actually need the 10:2 composition of the gene list preserved (i.e. 500 vs. 100). This results in scores being biased towards negative values - and vice versa if we switch the compositions.
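To make the arithmetic explicit, here is the control-pool composition for that example under both sampling schemes (toy numbers only, assuming the default ctrl_size of 50):

ctrl_size = 50  # controls drawn per selection (scanpy's default)

# 10 genes of interest share one low-expression bin;
# 2 genes of interest sit in two distinct high-expression bins.

# Per-gene sampling (my reading of the reference/Seurat):
# the control pool mirrors the gene list's composition.
low_ref, high_ref = 10 * ctrl_size, 2 * ctrl_size   # 500 vs. 100

# Per-unique-bin sampling (scanpy, via the np.unique() call):
# one draw for the single low bin, one per high bin.
low_sc, high_sc = 1 * ctrl_size, 2 * ctrl_size      # 50 vs. 100

# The scanpy control pool is dominated by highly expressed genes,
# inflating the control mean and pushing scores negative.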
I've attached a reproducer for this behavior below.

Minimal code sample

import scanpy as sc
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Create genes with different expression levels:
# 25 expression levels, 400 genes each -> 10,000 genes
bins = np.repeat(np.linspace(1, 100, 25), 400)
mat = np.random.choice([1, 0], size=(20000, 10000))

adata = sc.AnnData(mat * bins)


# Six lowly expressed genes (all from the lowest bin)
genes_of_interest = ['0', '1', '2', '3', '4', '5']
# Two genes with high expression from different buckets
genes_of_interest += ['9999', '8001']

sc.tl.score_genes(adata, genes_of_interest)

sns.histplot(adata.obs['score'])
plt.show()

# Score is almost completely negative, even though we'd expect it to be approximately normal around zero

Error output

Versions

scipy	1.16.2
numpy	2.3.4
scanpy	0.1.0.dev3628+gcfab46942.d20251021 (0.1.0.dev3629+g095941443.d20251022)
seaborn	0.13.2
anndata	0.12.3
matplotlib	3.10.7
pandas	2.3.3
----	----
executing	2.2.1
patsy	1.0.2
pillow	12.0.0
prompt_toolkit	3.0.52
PyYAML	6.0.3
scikit-learn	1.7.2
sparse	0.17.0
tblib	3.2.0
llvmlite	0.45.1
toolz	1.1.0
pure_eval	0.2.3
Deprecated	1.2.18
cloudpickle	3.1.1
pluggy	1.6.0
joblib	1.5.2
dask	2025.10.0
python-dateutil	2.9.0.post0
MarkupSafe	3.0.3
stack-data	0.6.3
asttokens	3.0.0
traitlets	5.14.3
zappy	0.2.0
wcwidth	0.2.14
psutil	7.1.1
legacy-api-wrap	1.4.1
cycler	0.12.1
jaraco.functools	4.0.1
packaging	25.0
jaraco.collections	5.1.0
six	1.17.0
Pygments	2.19.2
kiwisolver	1.4.9
wrapt	1.17.3
session-info2	0.2.3
decorator	5.2.1
asciitree	0.3.3
setuptools-scm	9.2.2
coverage	7.11.0
numcodecs	0.15.1
natsort	8.4.0
threadpoolctl	3.6.0
setuptools	80.9.0
h5py	3.15.1
statsmodels	0.14.5
jaraco.context	5.3.0
jedi	0.19.2
msgpack	1.1.2
parso	0.8.5
pathspec	0.12.1
typing_extensions	4.15.0
pytz	2025.2
hatchling	1.27.0
numba	0.62.1
zarr	2.18.7
jaraco.text	3.12.1
pyparsing	3.2.5
Jinja2	3.1.6
ipython	9.6.0
hatch-vcs	0.5.0
pyarrow	21.0.0
fast-array-utils	1.2.5
more-itertools	10.3.0
----	----
Python	3.12.7 | packaged by Anaconda, Inc. | (main, Oct  4 2024, 08:22:19) [Clang 14.0.6 ]
OS	macOS-15.7.1-arm64-arm-64bit
CPU	8 logical CPU cores, arm
GPU	No GPU found
Updated	2025-10-22 16:13
