Refactor iatlas to cbioportal pipeline #119
```dockerfile
WORKDIR /root/

# clone dep repos
RUN git clone https://github.com/rxu17/datahub-study-curation-tools.git -b upgrade-to-python3
```
This will need to be updated once the PR to https://github.com/cBioPortal/datahub-study-curation-tools is merged.
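One way to keep that caveat visible in the image itself is a Dockerfile comment next to the clone; a sketch (the target directory and TODO wording are illustrative, not from the PR):

```dockerfile
# Temporary: clone the fork until the upstream PR to
# cBioPortal/datahub-study-curation-tools is merged.
# TODO: switch the URL back to the upstream repo once the PR lands;
# pinning a specific commit instead of a branch would make builds reproducible.
RUN git clone https://github.com/rxu17/datahub-study-curation-tools.git \
    -b upgrade-to-python3 /root/datahub-study-curation-tools
```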
Maybe we can ask Ramya and Ritika to see whether the code can be reviewed, but this should probably be a Dockerfile comment rather than a PR comment. I'm unsure when that PR will be reviewed and merged.
@danlu1 sorry, I would hold off on a review, as I discovered something last week and still need to make updates and investigate. Thanks for the review so far!
@danlu1 Feel free to review; it looks like there wasn't any issue.
* add anders dataset specific filtering, convert lens map to be string vals
* address PR comments
* initial commit for incorporating neoantigen data
* rearrange code to have a general validation script
* add tests
* remove unused code
* remove unused code
* add unit tests and docstring
* update docstring order of ops
* add indicator in logs for any error that study failed, address PR comments
Problem:
Our current workflow is:
But the pipeline was built so that MAFs and clinical data could be uploaded separately, and clinical data always runs through processing before validation can run on all files. This order of operations doesn't make sense for our current intake workflow.
Depends on: #122
Solution:
This is because we always want to validate all of the data as a group, never individually, when using the cbioportal validator (it's the final check that the data is good to go for upload).
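Validating as a group means pointing the cBioPortal validator at the whole study directory rather than at individual files; a sketch of what that invocation looks like (the study path is illustrative, and the flags assume the standard `validateData.py` script from the cbioportal core repo):

```shell
# Run the cBioPortal study validator on the complete study folder so all
# files (MAFs, clinical data, meta files) are checked together as one study.
python importer/validateData.py \
    -s /path/to/study_directory \
    -n   # offline mode: skip checks that require a running portal instance
```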
This change also includes setting up the project environment with Docker, along with adjustments that allow us to use the datahub-study-curation-tools repo here. The PR to the curation tools is cBioPortal/datahub-study-curation-tools#67.

Testing: