Sort layers alphabetically for deterministic h5ad output #12

Merged
ygao61 merged 1 commit into COMBINE-lab:main from an-altosian:main on Feb 19, 2026

@an-altosian (Contributor)

Summary

  • Sort AnnData layers alphabetically (via OrderedDict) before writing h5ad files to ensure byte-level reproducibility
  • The bytes of an HDF5 file depend on the order in which group children are written, and Python set iteration order (used when populating layers from output_assays) is non-deterministic across runs because string hashing is randomized per interpreter process. As a result, the layers (spliced, unspliced, ambiguous) were written in different orders across runs, producing different md5 checksums even though the data was identical.
  • Applied to both the full quants.h5ad and the filtered filtered_quants.h5ad outputs
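
The mechanism described above can be reproduced with the standard library alone. The following is a minimal sketch rather than the qcatch code. It assumes the layers are keyed by name in a plain dict before being handed to AnnData, and `set_order_under_seed` and `sorted_layers` are hypothetical names. Each subprocess runs with a different `PYTHONHASHSEED`, which sets the string hash salt and can therefore change set iteration order. Sorting the keys removes that dependence.

```python
import os
import subprocess
import sys
from collections import OrderedDict

# Layer names from the PR description. Collecting them in a set mimics how the
# PR says layers were populated from output_assays (the exact code is assumed).
SNIPPET = "print(','.join({'spliced', 'unspliced', 'ambiguous'}))"


def set_order_under_seed(seed: int) -> list:
    """Run a fresh interpreter with the given PYTHONHASHSEED and return the
    order in which it iterates over a set of the three layer names."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().split(",")


def sorted_layers(layers: dict) -> OrderedDict:
    """Hypothetical helper: rebuild a layer mapping with keys in alphabetical
    order, so write order no longer depends on how the keys were collected."""
    return OrderedDict(sorted(layers.items(), key=lambda item: item[0]))


if __name__ == "__main__":
    # Each seed stands in for a separate run with its own hash salt.
    orders = {tuple(set_order_under_seed(seed)) for seed in range(10)}
    print(f"distinct set iteration orders over 10 seeds: {len(orders)}")

    # Whatever order the keys arrive in, the sorted mapping is the same.
    for order in orders:
        layers = {name: None for name in order}  # placeholder for layer matrices
        print(list(sorted_layers(layers)))
```

Since Python 3.7 a plain dict also preserves insertion order, so `dict(sorted(...))` would behave the same way. `OrderedDict` simply makes the ordering intent explicit, which matches the PR description.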

Test plan

  • Ran qcatch twice on the same input (1k_pbmc_v3 test data) and verified both quants.h5ad and filtered_quants.h5ad now produce identical md5 checksums across runs
  • Confirmed CSV and HTML outputs are unaffected by the change
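
The md5 comparison in the test plan could be scripted roughly as follows. This sketch assumes each qcatch run wrote its outputs into its own directory. The directory paths and the `compare_runs` helper are hypothetical, and the qcatch invocation itself is not shown.

```python
import hashlib
from pathlib import Path

# Output files named in the PR's test plan.
OUTPUTS = ("quants.h5ad", "filtered_quants.h5ad")


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(run_a: Path, run_b: Path, names=OUTPUTS) -> dict:
    """Map each output file name to True if both runs produced identical bytes."""
    return {
        name: md5sum(Path(run_a) / name) == md5sum(Path(run_b) / name)
        for name in names
    }
```

For example, `compare_runs(Path("run1"), Path("run2"))` on two output directories from separate runs should map both file names to `True` once the layer order is fixed.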

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@ygao61 (Collaborator) left a comment

Looks good

@ygao61 merged commit 7eb51f4 into COMBINE-lab:main on Feb 19, 2026
1 check passed