-
Notifications
You must be signed in to change notification settings - Fork 211
fix: get sra download to work and test it #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: get sra download to work and test it #102
Conversation
…m the rna-seq-kallisto-sleuth workflow
WalkthroughUpdate CI workflows and bump several action versions; add a new SRA test config; bump Ensembl release and enable trimming in main config; change SRA FASTQ read suffixes from Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (5 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/workflows/main.yml(1 hunks)config/config.yaml(1 hunks)workflow/rules/align.smk(1 hunks)workflow/rules/common.smk(1 hunks)workflow/rules/ref.smk(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 95
File: workflow/rules/align.smk:0-0
Timestamp: 2025-09-05T10:23:31.972Z
Learning: User dlaehnemann prefers simpler, more readable code solutions over complex ones when the complexity doesn't add functional value, as demonstrated when they chose a simple list over a list comprehension with filtering for the star_align rule's extra parameters.
📚 Learning: 2025-09-05T10:13:14.769Z
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 95
File: workflow/rules/ref.smk:66-68
Timestamp: 2025-09-05T10:13:14.769Z
Learning: In Snakemake workflows, `lookup()` is a built-in semantic helper function that provides clean access to nested configuration values with default fallbacks. The syntax `lookup(within=config, dpath="params/star/index", default="")` is the recommended approach for accessing nested config parameters safely.
Applied to files:
workflow/rules/ref.smk
🪛 actionlint (1.7.8)
.github/workflows/main.yml
15-15: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
29-29: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
44-44: the runner of "actions/checkout@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🔇 Additional comments (7)
workflow/rules/common.smk (1)
52-61: LGTM! This fix correctly addresses issue #101.The change from
read=["R1", "R2"]toread=["1", "2"]aligns the expected SRA file naming with the actual output produced by the get_sra rule, which generates files with_1.fastqand_2.fastqsuffixes.config/config.yaml (3)
9-38: Excellent documentation improvements for Ensembl configuration.The expanded guidance for species, release, and build parameters with links to Ensembl resources will help users configure the workflow correctly.
45-45: Verify that default trimming activation is intentional.Setting
trimming.activate: Truein the default config means users who copy this configuration will have trimming enabled by default. This is a behavioral change that could affect reproducibility of existing analyses.Is this change intentional for the default configuration? If so, consider documenting this change in the PR description or commit message.
29-29: No changes required.The Snakemake-wrappers ensembl-sequence wrapper accepts release as either an integer or a numeric string, and internally converts it via int(snakemake.params.release). The change from numeric
100to string"115"is compatible. Additionally, the existing test configuration.test/config_sra/config.yamlalready usesrelease: "115"in this exact format, confirming the workflow handles this configuration correctly..github/workflows/main.yml (1)
71-82: LGTM! SRA test addition validates the fix.The new test workflow for SRA file download directly tests the functionality fixed in this PR and ensures the changes in
workflow/rules/common.smkwork as expected.workflow/rules/ref.smk (1)
70-70: Clarify the intentional downgrade from v7.2.0 to v3.3.7.This change downgrades the STAR index wrapper from v7.2.0 (newer, current stable) to v3.3.7 (older). v7.2.0 uses STAR 2.7.11b with updated wrapper code and improvements, while v3.3.7 uses the older STAR ~2.7.11a. Confirm this downgrade is intentional and document the reason (e.g., compatibility with other components, reproducibility needs).
workflow/rules/align.smk (1)
25-25: Verify intention and compatibility of STAR wrapper version downgrade.Web verification confirms both v7.2.0 and v3.3.7 are valid Snakemake Wrappers tags. However, the change from v7.2.0 to v3.3.7 is a downgrade to an older version with confirmed behavioral differences, including different implementation logic, temp-directory handling, optional output handling, and decompression strategies. Additionally, v7.2.0 pulls STAR v2.7.11b via conda, while v3.3.7 may pull a different STAR version.
Please confirm this downgrade is intentional and verify that the behavioral changes between wrapper versions are compatible with your workflow requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/conventional-prs.yaml(1 hunks).github/workflows/main.yml(1 hunks).github/workflows/release-please.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 95
File: workflow/rules/align.smk:0-0
Timestamp: 2025-09-05T10:23:31.972Z
Learning: User dlaehnemann prefers simpler, more readable code solutions over complex ones when the complexity doesn't add functional value, as demonstrated when they chose a simple list over a list comprehension with filtering for the star_align rule's extra parameters.
🔇 Additional comments (7)
.github/workflows/conventional-prs.yaml (1)
16-16: Action version update is consistent with PR objectives.The bump from
v5tov6of the semantic PR linter aligns with the broader CI action upgrades mentioned in the PR summary. The configuration remains unchanged and compatible.Please confirm that
amannn/action-semantic-pull-request@v6has no breaking changes affecting this workflow's PR title validation logic..github/workflows/main.yml (6)
71-82: SRA test configuration addition aligns with PR objectives.The new SRA test steps (lines 71–82) properly extend the CI matrix to cover SRA downloads, addressing the PR objective to test the fixed SRA functionality. Ensure that
.test/config_sra/config.yamlis included in this PR and properly configured with SRA sample identifiers (e.g., from issue #101) to validate the read naming fix.Confirm that
.test/config_sra/config.yamlexists in the PR and contains appropriate SRA samples for testing.
71-82: SRA test configuration addition aligns with PR objectives.The new SRA test steps properly extend the CI matrix to cover SRA downloads, addressing the PR goal to test the fixed SRA functionality. Ensure
.test/config_sra/config.yamlis included in this PR with appropriate SRA sample identifiers (e.g., ERR5043451, ERR5043769 from issue #101) to validate the read naming fix (1/2 instead of R1/R2).Confirm that
.test/config_sra/config.yamlexists in this PR with proper SRA test samples.
14-25: No action required—v6 is the latest stable version.The upgrade to
actions/checkout@v6is correct and reflects the current best practice. Verification confirms v6 is the latest stable version maintained by GitHub Actions as of November 24, 2025. The code already implements the recommended version.
31-31: snakemake/snakemake-github-action@v2 upgrade verified as current and compatible.The latest stable release is v2.0.1 (released April 7, 2025), and all action parameters used in the workflow (directory, snakefile, args, submodules) are compatible with v2. All referenced config files exist and are properly configured (.test/config_basic/config.yaml, .test/config_complex/config.yaml, .test/config_sra/config.yaml).
31-31: No issues found—all verification checks passed.The latest stable release is v2.0.1 (released Apr 7, 2025), and the code's use of
@v2is appropriate as it pulls the latest v2.x patch. The action's API parameters (directory, snakefile, args) are correctly configured across all workflow steps, and all three referenced config files (.test/config_basic/config.yaml, .test/config_complex/config.yaml, .test/config_sra/config.yaml) exist and are properly specified in the workflow.
14-25: No issues found. The workflow configuration is current and properly structured.The latest stable version of actions/checkout is v6, which is correctly specified in the workflow. Additionally, snakemake-github-action v2.0.1 is the latest stable release, and the SRA test configuration file (.test/config_sra/config.yaml) exists in the repository, confirming the test setup is properly structured to provide CI coverage for SRA downloads as intended.
…, to make report creation work" This reverts commit 9854d59.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
config/config.yaml (1)
9-17: Extensive documentation improvements are valuable but may be scope creep.The added comments throughout the Ensembl reference section are thorough and helpful (detailing species sources, release strategy, and build selection). However, for a focused bug-fix PR addressing SRA filename resolution, these additions seem orthogonal to the core objective.
Consider whether this documentation should be deferred to a separate PR for maintainability clarity or deliberately bundled as part of improving SRA testing setup.
Also applies to: 19-28, 30-37
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
config/config.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 95
File: workflow/rules/align.smk:0-0
Timestamp: 2025-09-05T10:23:31.972Z
Learning: User dlaehnemann prefers simpler, more readable code solutions over complex ones when the complexity doesn't add functional value, as demonstrated when they chose a simple list over a list comprehension with filtering for the star_align rule's extra parameters.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-workflow
🔇 Additional comments (2)
config/config.yaml (2)
45-45: Enabling trimming aligns with PR objectives.Setting
trimming.activate: Trueis consistent with the PR goal to enable and test the SRA workflow end-to-end, including adapter trimming via fastp.
29-29: Review comment is factually incorrect: species field remainshomo_sapiens, not changed toGallus gallus.However, a real configuration error exists: the
buildfield containsbGalGal1.mat.broiler.GRCg7b(Gallus gallus) whilespeciesis stillhomo_sapiens. This mismatch will cause workflow failures since these are incompatible.Verify whether:
- The
speciesfield should also be changed togallus_gallus(incomplete migration), or- The
buildfield should be reverted toGRCh38(accidental edit), or- This config should be moved to a separate test configuration file with both fields aligned.
The actual SRA filename fix (changing
read=["R1", "R2"]to["1", "2"]) is located inworkflow/rules/common.smkline 95 and appears unrelated to these config changes.Likely an incorrect or invalid review comment.
…ome too big for GitHub Actions based star indexing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/main.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 95
File: workflow/rules/align.smk:0-0
Timestamp: 2025-09-05T10:23:31.972Z
Learning: User dlaehnemann prefers simpler, more readable code solutions over complex ones when the complexity doesn't add functional value, as demonstrated when they chose a simple list over a list comprehension with filtering for the star_align rule's extra parameters.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-workflow
🔇 Additional comments (2)
.github/workflows/main.yml (2)
62-65: SRA test configuration is properly set up and ready for use.Verification confirms all three concerns:
Config file exists and is properly configured:
.test/config_sra/config.yamlis present and correctly referencesconfig_sra/samples.tsvandconfig_sra/units.tsv.Test data files are in place: Both
samples.tsv(4 samples with treatment metadata) andunits.tsv(4 units with valid SRA accessions: SRR35987766-SRR35987769) are properly structured and ready for the workflow.SRA rule structure matches expectations: The
get_srarule inworkflow/rules/trim.smkoutputs files with the correct_1and_2suffix pattern for paired-end reads via the fasterq-dump wrapper.The
stageincommand correctly exportsTMPDIR=${{ runner.temp }}to handle large SRA file downloads, and the workflow pattern (test execution followed by report generation) is consistent with existing test configurations.
29-35: snakemake-github-action@v2 is current and properly configured.Verification confirms v2.0.1 (released April 7, 2025) is the latest stable version. All three configuration arguments used in the workflow—
directory,snakefile, andargs—are fully supported in v2, as documented in the action's README. The configuration in the workflow is correct and properly formatted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
workflow/rules/trim.smk(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 95
File: workflow/rules/align.smk:0-0
Timestamp: 2025-09-05T10:23:31.972Z
Learning: User dlaehnemann prefers simpler, more readable code solutions over complex ones when the complexity doesn't add functional value, as demonstrated when they chose a simple list over a list comprehension with filtering for the star_align rule's extra parameters.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: run-workflow
…n report generation, and increase cores to 8 (GitHub Action runners should currently have 4: https://docs.github.com/en/actions/reference/runners/github-hosted-runners#standard-github-hosted-runners-for-public-repositories - snakemake will restrict to what's available)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/main.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 95
File: workflow/rules/align.smk:0-0
Timestamp: 2025-09-05T10:23:31.972Z
Learning: User dlaehnemann prefers simpler, more readable code solutions over complex ones when the complexity doesn't add functional value, as demonstrated when they chose a simple list over a list comprehension with filtering for the star_align rule's extra parameters.
📚 Learning: 2025-12-04T14:47:42.368Z
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 102
File: .github/workflows/main.yml:14-25
Timestamp: 2025-12-04T14:47:42.368Z
Learning: Always verify that a specific version exists at the exact repository path before suggesting version upgrades. Check both the repository organization/path and the available version tags, as projects may move between organizations (e.g., super-linter moved from github/super-linter to super-linter/super-linter).
Applied to files:
.github/workflows/main.yml
🔇 Additional comments (2)
.github/workflows/main.yml (2)
92-104: SRA test configuration is well-structured.The new SRA test stages are appropriately configured with:
stageinstep to clean.test/results/and setTMPDIRfor adequate disk space during SRA downloads- Proper configuration file reference (
.test/config_sra/config.yaml)- Consistent flag formatting across both test and report steps
14-65: Modern action versions and workflow structure look good.The workflow correctly uses:
actions/checkout@v6(current, updated from v3)super-linter/super-linter@v8at the correct organization pathsnakemake-github-action@v2jlumbroso/[email protected](valid release)Disk space and apt-get setup are appropriate for the Ubuntu runner environment.
…he mamba install failure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/main.yml(1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 95
File: workflow/rules/align.smk:0-0
Timestamp: 2025-09-05T10:23:31.972Z
Learning: User dlaehnemann prefers simpler, more readable code solutions over complex ones when the complexity doesn't add functional value, as demonstrated when they chose a simple list over a list comprehension with filtering for the star_align rule's extra parameters.
📚 Learning: 2025-12-04T14:47:42.409Z
Learnt from: dlaehnemann
Repo: snakemake-workflows/rna-seq-star-deseq2 PR: 102
File: .github/workflows/main.yml:14-25
Timestamp: 2025-12-04T14:47:42.409Z
Learning: Always verify that a specific version exists at the exact repository path before suggesting version upgrades. Check both the repository organization/path and the available version tags, as projects may move between organizations (e.g., super-linter moved from github/super-linter to super-linter/super-linter).
Applied to files:
.github/workflows/main.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: test basic model, no batch_effects
- GitHub Check: test sra file download, no batch_effects
- GitHub Check: test multiple variables_of_interest, include batch_effects
🔇 Additional comments (3)
.github/workflows/main.yml (3)
14-25: Workflow modernization looks solid.The formatting job updates checkout to v6 and super-linter to super-linter/super-linter@v8 (after the organization move). Both versions are current and appropriate. Based on learnings, the org/path verification has been correctly applied here.
29-35: Linting job updates are correct.The upgrade to actions/checkout@v6 and snakemake-github-action@v2 is appropriate. The lint args format is valid (corrects the previous malformed flags issue).
68-116: Matrix-driven testing structure is well-designed.The new matrix strategy enables testing three configurations (basic, complex, sra) with dynamic resource allocation. Conditional disk-space and reporting steps are well-implemented. The apt-get update and free-disk-space actions are appropriately placed and conditioned.
This pull request is closes #101 .
In addition to fixing the issue there, it actually tests downloading some SRA data and running the workflow on it. To keep this manageable, I had to abandon the idea of using a
gallus_gallusdataset. Instead, we end up usingsacharomyces_cerevisiaeagain, and using a subset of a dataset.To further increase disk space available in tests that need it, we now:
Finally, this updates all the GitHub Actions used and I had to track down the following bug to get our CIs working again:
conda/constructor#1135
We are currently working around this in the snakemake github action, by pinning to an older
miniforgeversion:conda/constructor#1135
Summary by CodeRabbit
New Features
Chores
Configuration
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.