Recommended order for sc.pp.filter_genes and sc.pp.normalize_total in preprocessing pipeline

Description
In Scanpy preprocessing workflows, it is unclear in which order to apply sc.pp.filter_genes (basic gene filtering based on min_cells/min_counts) and sc.pp.normalize_total (library size normalization).
If filter_genes is applied before normalize_total:

Removing lowly expressed genes changes the total counts per cell slightly.
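This alters the per-cell size factors computed in normalize_total, which in turn rescales the normalized values of all remaining genes in each cell. Ratios between genes within a cell are unchanged, but the scaling differs from cell to cell, so comparisons across cells shift.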
While the effect is often minor (especially if filtered genes have very low counts), it can propagate to downstream steps like HVG selection, scaling, PCA, clustering, and differential expression.
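
To make the effect concrete, here is a minimal pure-Python sketch on a made-up two-cell count matrix. The helper functions `normalize_total` and `keep_genes` are hypothetical stand-ins written for this illustration, not Scanpy's implementations; they only mimic total-count scaling and a min_cells filter.

```python
# Toy count matrix: rows are cells, columns are genes g0..g3 (made-up numbers).
counts = [
    [90, 10, 0, 0],   # cell A: no counts in the rare genes g2, g3
    [60, 20, 15, 5],  # cell B: 20% of its counts fall in g2, g3
]
TARGET_SUM = 100.0  # plays the role of target_sum in sc.pp.normalize_total


def normalize_total(matrix, target_sum):
    """Scale each row so that its entries sum to target_sum."""
    return [[value * target_sum / sum(row) for value in row] for row in matrix]


def keep_genes(matrix, min_cells):
    """Return indices of genes with a nonzero count in at least min_cells cells."""
    n_genes = len(matrix[0])
    return [
        j for j in range(n_genes)
        if sum(1 for row in matrix if row[j] > 0) >= min_cells
    ]


kept = keep_genes(counts, min_cells=2)  # g2 and g3 are detected in one cell only

# Order 1: filter genes first, then normalize on the reduced totals.
filtered = [[row[j] for j in kept] for row in counts]
filter_then_norm = normalize_total(filtered, TARGET_SUM)

# Order 2: normalize on the full totals, then drop the same genes.
normalized = normalize_total(counts, TARGET_SUM)
norm_then_filter = [[row[j] for j in kept] for row in normalized]

print(filter_then_norm)  # [[90.0, 10.0], [75.0, 25.0]]
print(norm_then_filter)  # [[90.0, 10.0], [60.0, 20.0]]
```

In this toy case cell A is identical under both orders, while cell B's g0 value is 75 when filtering first and 60 when normalizing first. The g0:g1 ratio within cell B stays 3:1 either way; only the per-cell scale, and therefore the comparison between cells A and B, changes.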

If filter_genes is applied after normalization, the size factors are computed on the full (unfiltered) gene set, so each gene's normalized value reflects its share of the cell's complete library.

Question
Is there a clearly recommended or standard order for these steps in the Scanpy preprocessing pipeline?
From reviewing the official tutorials and documentation:

Basic filtering (filter_cells and filter_genes) is typically done early, right after QC and before normalization.
Normalization follows filtering.
Highly variable gene (HVG) selection and further filtering (e.g., subsetting to HVGs) happen after normalization and log-transform.

However, given the subtle impact on per-cell scaling described above, it would be helpful to have explicit guidance or a best-practice recommendation on whether basic filter_genes should strictly precede normalize_total or may follow it.

Recommended order for sc.pp.filter_genes and sc.pp.normalize_total in preprocessing pipeline #3925
