@rxu17 rxu17 commented Aug 27, 2025

Problem:

For the iAtlas to cBioPortal project, our processing pipeline for the clinical and MAF datasets must run on Python 3. To avoid rewriting code on our end, we rely on several scripts from the datahub-study-curation-tools repo, namely:

  • OncoTree mapping (maps our clinical files' ONCOTREE_CODE values to CANCER_TYPE and CANCER_TYPE_DESCRIPTION)
  • add clinical header (the format cBioPortal requires when ingesting clinical files)
  • generate metadata files (files cBioPortal requires when ingesting clinical files)
  • generate case lists (files cBioPortal requires when ingesting clinical files)

However, these scripts are written for Python 2.

Solution:

This PR ports these scripts from Python 2 to Python 3 so that we can use them in our pipeline.
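
For illustration only, here is a minimal sketch of the kind of change a Python 2 to Python 3 port usually involves. It is not taken from the scripts in this repo: the `count_codes` helper and the sample data are hypothetical, and only the `ONCOTREE_CODE` column name comes from the description above. It shows two common fixes: `print` becomes a function, and `dict.iteritems()` becomes `dict.items()`.

```python
import io


def count_codes(handle):
    """Count ONCOTREE_CODE values in a tab-delimited clinical file.

    Hypothetical helper, used only to illustrate Python 3 idioms.
    """
    header = handle.readline().rstrip("\n").split("\t")
    idx = header.index("ONCOTREE_CODE")
    counts = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        code = fields[idx]
        # Python 2 code often used `if counts.has_key(code)`, which was
        # removed in Python 3; `dict.get` works in both versions.
        counts[code] = counts.get(code, 0) + 1
    return counts


# Made-up sample data standing in for a clinical file.
sample = io.StringIO(
    "SAMPLE_ID\tONCOTREE_CODE\n"
    "S1\tLUAD\n"
    "S2\tBRCA\n"
    "S3\tLUAD\n"
)
counts = count_codes(sample)

# Python 2:  for code, n in counts.iteritems(): print code, n
# Python 3:  items() replaces iteritems(), and print is a function.
for code, n in sorted(counts.items()):
    print(code, n)
```

Real ports often also need explicit text or binary file modes and updated imports, depending on what each script does.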

Main changes are the following:

Testing:

  • Tested on the iAtlas to cBioPortal project data; the results were successfully ingested into cBioPortal and validated.

datatypes = []
attribute_types = []
priorities = []
"""
indentation/spacing fix

@rxu17 rxu17 marked this pull request as ready for review August 27, 2025 18:49