Refactor iatlas to cbioportal pipeline #119

rxu17 · 2025-08-26T05:29:00Z

Problem:

Our current workflow is:

Intake clinical data from the Synapse folder and process it (all clinical data is available), then run the cBioPortal validator.
Intake MAFs from the Synapse folder and process them (availability is staggered, so not every dataset has MAFs yet), then run the cBioPortal validator.
Then run the cBioPortal validator on the neoantigen, gene expression, and gene signature data (provided by a collaborator, so it is not on Synapse).

However, the pipeline was built so that MAFs and clinical data could be uploaded separately, and the clinical data always has to go through processing before the validator can run on all files. This order of operations doesn't fit our current intake workflow.

Depends on: #122

Solution:

Have the pipeline process the MAF and clinical files and save them locally.
Have a dedicated script run the cBioPortal validator and then upload all available files to Synapse.

This is because we always want to validate the data as a group, never file by file, when using the cBioPortal validator. It is also the final check that the data is ready to upload.
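The validate-as-a-group rule can be sketched roughly as below. This is an illustration only, not the code in this PR: `validate_then_upload`, `validator_cmd`, `upload`, and `ok_exit_codes` are hypothetical names. `validator_cmd` stands in for however the cBioPortal validator is invoked in the environment, and `upload` stands in for whatever call stores a file on Synapse.

```python
import subprocess
from pathlib import Path
from typing import Callable, Iterable, Sequence


def validate_then_upload(
    study_dir: Path,
    validator_cmd: Sequence[str],
    upload: Callable[[Path], None],
    ok_exit_codes: Iterable[int] = (0,),
) -> bool:
    """Validate the whole study folder once, then upload every file in it.

    Assumption: the validator command takes the study folder as its last
    argument and signals success through its exit code.
    """
    # One validator run over the full folder, never one run per file.
    result = subprocess.run([*validator_cmd, str(study_dir)], check=False)
    if result.returncode not in set(ok_exit_codes):
        # Validation failed: nothing is uploaded.
        return False
    # Validation passed: upload everything that was saved locally.
    for path in sorted(p for p in study_dir.iterdir() if p.is_file()):
        upload(path)
    return True
```

Passing the validator command and the upload function in as arguments keeps the ordering logic testable without Synapse credentials or a cBioPortal checkout.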

This change also sets up the project environment with Docker, along with adjustments that let us use the datahub-study-curation-tools repo here. The PR to the curation tools is here: cBioPortal/datahub-study-curation-tools#67

Testing:

Tested regular processing for clinical and MAF data; the results match.
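One way such a check could be done, assuming "results match" means the processed files are identical to those from an earlier run, is a byte-for-byte comparison of two output folders. The function name `diff_outputs` and the folder layout are hypothetical; this is a sketch, not the test that was actually run.

```python
import filecmp
from pathlib import Path


def diff_outputs(before: Path, after: Path) -> dict:
    """Compare same-named files in two output folders byte for byte."""
    before_names = {p.name for p in before.iterdir() if p.is_file()}
    after_names = {p.name for p in after.iterdir() if p.is_file()}
    common = sorted(before_names & after_names)
    # shallow=False forces a content comparison instead of an os.stat() check.
    match, mismatch, errors = filecmp.cmpfiles(before, after, common, shallow=False)
    return {
        "match": sorted(match),
        "mismatch": sorted(mismatch + errors),
        "only_before": sorted(before_names - after_names),
        "only_after": sorted(after_names - before_names),
    }
```

An empty `mismatch`, `only_before`, and `only_after` would indicate that the refactored pipeline reproduced the earlier files exactly.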

rxu17 · 2025-08-27T06:27:57Z

local/iatlas/cbioportal_export/Dockerfile

+WORKDIR /root/
+
+# clone dep repos
+RUN git clone https://github.com/rxu17/datahub-study-curation-tools.git -b upgrade-to-python3


This will need to be updated once the PR to this repo is merged: https://github.com/cBioPortal/datahub-study-curation-tools

Maybe we can ask Ramya and Ritika whether the code can be reviewed, but this should probably be a Dockerfile comment rather than a PR comment. I'm unsure when that PR will be reviewed and merged.

local/iatlas/README.md

local/iatlas/cbioportal_export/clinical.py

rxu17 · 2025-09-02T20:33:52Z

@danlu1 sorry, I would hold off on a review: I discovered something last week and still need to make updates and investigate. Thanks for the review so far!

rxu17 · 2025-09-18T02:22:27Z

@danlu1 Feel free to review; it looks like there wasn't any issue.

* add anders dataset specific filtering, convert lens map to be string vals
* address PR comments

* initial commit for incorporating neoantigen data
* rearrange code to have a general validation script
* add tests
* remove unused code
* remove unused code
* add unit tests and docstring
* update docstring order of ops
* add indicator in logs for any error that study failed, address PR comments

sonarqubecloud · 2025-09-24T20:35:19Z

Quality Gate passed

Issues
180 New issues
0 Accepted issues

Measures
3 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

rxu17 added 4 commits August 25, 2025 22:21

refactor order of ops

e0fb47c

adjust code to remove .metadata

f448fc0

add docker and dep

d643bcb

Merge branch 'main' into refactor_pipeline

a5d26e1

rxu17 marked this pull request as ready for review August 27, 2025 06:26

rxu17 requested a review from a team as a code owner August 27, 2025 06:26

rxu17 commented Aug 27, 2025


rxu17 added 2 commits August 28, 2025 15:10

move all and sequenced caselist generation to load.py

f142ece

remove not needed code

f49ae23

danlu1 reviewed Sep 2, 2025


local/iatlas/README.md (outdated, resolved)

local/iatlas/README.md (outdated, resolved)

local/iatlas/README.md (outdated, resolved)

local/iatlas/cbioportal_export/clinical.py (resolved)

update to current workflow

f356414

rxu17 requested a review from danlu1 September 18, 2025 02:22

rxu17 added 3 commits September 17, 2025 19:34

add docker setup and tests to README

229914c

[DPE-1453] Process ANDERS clinical dataset (#122)

60bc82a

* add anders dataset specific filtering, convert lens map to be string vals
* address PR comments

rxu17 merged commit 7d91b10 into main Sep 29, 2025
3 checks passed

rxu17 deleted the refactor_pipeline branch September 29, 2025 01:44


thomasyu888 Sep 6, 2025


4 participants
