myeloma_imc

Repo for reproducing the spatial analysis on the myeloma cohort from NTNU (Standal Group) for the manuscript: [TBD]

Raw data

Raw data deposited to Zenodo: 10.5281/zenodo.17093203

Raw data includes:

Images, segmentation masks, bone masks, steinbock raw outputs, patient metadata
Intermediary files for phenotyping (e.g. scimap gates, scimap outputs, manually annotated and aggregated batch-corrected intensities CSVs, XGB models including performances and parameters, holdout test data for xgboost) and other models like cellcharter AutoK and trVAE models
Processed single-cell table as .csv and .h5ad (this can be used to reproduce the Figures from the manuscript)

Processing workflow

Note: All paths need to be adjusted to the github repo and locally stored files

Image processing with Steinbock

Steinbock has been used with standard parameters:

steinbock preprocess imc panel

steinbock preprocess imc images --hpf 50

steinbock segment deepcell --minmax --verbosity INFO

steinbock measure intensities

steinbock measure regionprops

steinbock export anndata --intensities intensities --data regionprops -o cells.h5ad

Further preprocessing

Artifact removal with subsequent cleaning of quantification tables

Bone labeling from geojson using this script and adjusting quantification files accordingly

Phenotyping

For detailed information on the different steps taken see our manuscript at: [TBD]

Ground Truth

Ground truth was generated with visual manual annotation using scimap's prior knowledge approach. Note: Scimap version 2.0.5 was used, many functions for hierarchical prior knowlege driven annotation have been updated in more recent versions.

Single images were annotated. Intermediary output files can be found here with one annotated anndata object per image and the stored gates. Aggregating single outputs into csv can be performed here. Adjusting existing files to a new hierarchical assignment table can be found in the first part of this script

Phenotyping has been performed with scanorama batch corrected (scanorama_corrected) data, however uncorrected data (uncorrected) is also present, therefore data paths in scripts have to be set accordingly. As Scanorama corrected data yielded superior performance in our model, annotations soley rely on scanorama-corrected data.

Batch correction has been perforned with subsequent transfer to csvs. If uncorrected values should be used, these can be found in the data repository (not recommended)
- For adjusting the per-image anndata annotations from scimap with batch corrected values, refer to the second part of this script

Training of XGBoost model

Ground Truth aggregated data was used to train an xgboost model. The script integrates optuna for hyperparameter tuning.
Classifier was applied on all data. Models are uploaded to the data repository (see above)
Running the model on holdout data was performed here. Holdout data annotations can be downloaded form the data repository

Downstream analysis

Preprocessing

Transfering the classifier annotations (csv format) to the quantification tables from Steinbock output and creating an updated anndata object with comprehensive information
Further preprocessing steps as removing/relabeling patients can be found here
Phenotyping refinemed by:
- subclustering unknown cells to find celltypes missed by xgboost
- refinement (last part) of macrophages annotations in tumor aggregates. Find the thresholding data needed for this in the public data repository and here
Spatial neighbor graph and Marker normalization can be found here. The script further includes some renamings for celltypes and neighborhoods to fit to the mansucript

CellCharter

To build the trVAE model and run autoK-Clustering to find the best cluster sizes for cellcharter neighborhoods, refer to this. The cellcharter pipeline to extract neighborhood labels can be found here. The trVAE model is uploaded in the same directory, autoK models are uploaded to the data repository

COZI

Running Cozi to infer NEP scores and saving results can be found here

Preprocessing of clinical data and associated spatial scores

Raw Metadata is available as excel sheet on the data repository, to transform into python format and preprocess, use this script
To get neighborhod enrichment scores per patients and saving them, refer to this script
To connect COZI scores to patients, refer to this script

Downstream analysis of the manuscript

All figures can be recreated directly without running the processing workflowe using the anndata object uploaded to the data repository and the interaction scores saved as csv files here All scripts for creating the figures can be found here

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
GeoJSON files_ bone masks		GeoJSON files_ bone masks
phenotyping		phenotyping
src		src
steinbock		steinbock
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Visual abstract.png		Visual abstract.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

myeloma_imc

Raw data

Processing workflow

Image processing with Steinbock

Further preprocessing

Phenotyping

Ground Truth

Training of XGBoost model

Downstream analysis

Preprocessing

CellCharter

COZI

Preprocessing of clinical data and associated spatial scores

Downstream analysis of the manuscript

About

Uh oh!

Releases

Packages

Languages

License

SchapiroLabor/myeloma_imc

Folders and files

Latest commit

History

Repository files navigation

myeloma_imc

Raw data

Processing workflow

Image processing with Steinbock

Further preprocessing

Phenotyping

Ground Truth

Training of XGBoost model

Downstream analysis

Preprocessing

CellCharter

COZI

Preprocessing of clinical data and associated spatial scores

Downstream analysis of the manuscript

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages