Upgrade scripts used for clinical data ingestion to python3 #67

rxu17 · 2025-08-27T06:50:43Z

Problem:

For the iAtlas to cBioPortal project, we need to run our processing pipeline for the clinical and MAF datasets under Python 3. To avoid rewriting code on our end, we reuse several scripts from the datahub-study-curation-tools repo, namely:

oncotree mapping (maps our clinical files' ONCOTREE_CODE values to CANCER_TYPE and CANCER_TYPE_DESCRIPTION)
add clinical header (adds the header format required for cBioPortal ingestion of clinical files)
generate metadata files (files required for cBioPortal ingestion of clinical files)
generate caselists (files required for cBioPortal ingestion of clinical files)

However, these scripts are written for Python 2.

Solution:

This PR ports these scripts from Python 2 to Python 3 so that we can use them in our pipeline.

Main changes are the following:

Updated print statements to the print() function syntax required by Python 3
Removed the U mode flag from open() calls; universal newline handling is the default for text mode in Python 3
Replaced urllib2 with urllib.request from the Python 3 standard library, see https://docs.python.org/2/library/urllib2.html and https://docs.python.org/3/library/urllib.request.html#module-urllib.request
Removed calls to the built-in unicode() function, which does not exist in Python 3, where all strings are Unicode by default
Fixed some mixed indentation in the code
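
For reviewers less familiar with these changes, here is a minimal sketch of the Python 3 forms of the first four items. It is not code from this PR: the function names (read_lines, build_request, to_text, demo), the sample file contents, and the example.org URL are all hypothetical and only illustrate the pattern.

```python
import os
import tempfile
import urllib.request  # Python 3 home of the classes urllib2 provided in Python 2


def read_lines(path):
    # Python 2: open(path, "rU"). In Python 3, text mode already translates
    # "\r\n" and "\r" to "\n", so the "U" flag is simply dropped.
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in handle]


def build_request(url):
    # Python 2: urllib2.Request(url), fetched with urllib2.urlopen(...).
    # Python 3: the same names live in urllib.request.
    # The request is only built here, not sent, so no network is needed.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def to_text(value):
    # Python 2: unicode(value, "utf-8"). Python 3 has no unicode();
    # bytes are decoded explicitly and str is already Unicode.
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def demo():
    # Hypothetical sample data: one Windows line ending, one Unix line ending.
    with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
        tmp.write(b"PATIENT_ID\tONCOTREE_CODE\r\nP1\tBRCA\n")
        path = tmp.name
    try:
        lines = read_lines(path)
    finally:
        os.remove(path)
    # Python 2: print "read %d lines" % len(lines)
    # Python 3: print is a function and needs parentheses.
    print("read %d lines" % len(lines))
    return lines


if __name__ == "__main__":
    demo()
    print(build_request("https://example.org/placeholder").full_url)
    print(to_text(b"caf\xc3\xa9"))
```

Both line endings in the sample file come back as clean lines without passing any newline flag, which is why dropping U is behavior-preserving for text files.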

Testing:

Tested on the iAtlas to cBioPortal project data; the results were successfully ingested into cBioPortal and validated.

rxu17 · 2025-08-27T18:47:42Z

add-clinical-header/insert_clinical_metadata.py

-	datatypes = []
-	attribute_types = []
-	priorities = []
+    """ 


indentation/spacing fix

rxu17 added 10 commits August 26, 2025 12:28

lint and upgrade to python3 syntax

0bce95c

Merge branch 'cBioPortal:master' into upgrade-to-python3

859dbe3

remove non-essential python3 syntax

63f3fb8

remove non-essential python3 syntax

0ddefb6

remove non-essential python3 syntax

d7517c5

remove non-essential python3 syntax

2dcd428

remove deprecated

5acfeef

remove deprecated

01e1bfb

undo change of clinical_attributes_metadata.txt

538f5e8

remove encoding

993c493

rxu17 mentioned this pull request Aug 27, 2025

Refactor iatlas to cbioportal pipeline Sage-Bionetworks-Workflows/orca-recipes#119

Merged

rxu17 marked this pull request as ready for review August 27, 2025 18:49
