Classification of organotropic metastases

Metastatic Tropism Overview

This repository is the code base for the classification of organotropic metastases. Transcriptomic profiles of 7,011 cancer patients in the TCGA database were used to classify and analyze the seeding location of primary tumors. The sequencing data and clinicopathologic reports for all of these patients are publicly available for bulk data mining through TCGAbiolinks.

Installation

We used multiple programming languages (Java, Python, and R) to construct learning models and to perform biological analyses. Because this introduces many dependencies, we provide two ways to install our framework: a manual installation and a Docker installation.

Docker Installation

The docker image for this project can be pulled from the online Docker Hub repository or can be built using the Dockerfile included in the base directory of this project.

To pull the image from the Docker Hub repo, run the following command:

docker pull mskaro1/mot

To build the image using the Dockerfile, run the following command in the base directory of this project:

docker build --tag mskaro1/mot .

Manual Installation

For those seeking to manually install the project, all of the following dependencies must be satisfied prior to attempting the installation:

Python

Version: Python >= 3.5
Packages: All required Python packages are listed in the requirements.txt.

Java

Version: Java >= 8
Packages:
- Weka >= 3.8.3 (Note: The jar file is already included here)

R

Version: R >= 4.0
Packages: All required R packages (the sessionInfo() output) are listed at the bottom of Enriched_features_Fisher_weighted_simulation.R.

Once all of the required dependencies are satisfied, run the following command in the base directory to install the project as a Python package:

pip install .

Metastatic Classification Demo:

We have provided a sample dataset of TCGA data to demonstrate the effectiveness of our metastatic classification approach. The sample dataset consists of Colon Adenocarcinoma (COAD) tumors that metastasized to the colon, liver, or lung.

Recommended: Docker Approach

docker run --rm -it -v <output-directory>:/demo-outputs mskaro1/mot

Note: <output-directory> should be replaced with the path of a directory on the user's local machine, and it is where the outputs of the demo will be stored.

Manual Approach

 python3 -m mot.metastasis_pipeline -i ./samples/metastasis-demo/ -o <output-directory> -w ./lib/weka.jar -c ./classes -j /src/GainRatio.java

Note: This command should be run in the base directory of the project, and <output-directory> should be replaced with the path to a directory for the outputs to be stored.
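
For those who prefer to launch the demo from a Python script, the manual command above can be wrapped as in the minimal sketch below. The helper names build_demo_command and run_demo are hypothetical and not part of the mot package; the arguments are copied verbatim from the command above. Creating the output directory up front is a precaution, since we do not assume here that the pipeline creates it.

```python
import subprocess
from pathlib import Path


def build_demo_command(output_dir):
    """Return the manual demo command as an argument list.

    Hypothetical helper: the flags and paths are copied verbatim from the
    README command; only the output directory is a parameter.
    """
    return [
        "python3", "-m", "mot.metastasis_pipeline",
        "-i", "./samples/metastasis-demo/",
        "-o", str(output_dir),
        "-w", "./lib/weka.jar",
        "-c", "./classes",
        "-j", "/src/GainRatio.java",
    ]


def run_demo(output_dir, project_root="."):
    """Create the output directory, then run the demo from the project root."""
    # Precaution (assumption): make sure the output directory exists.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the pipeline exits with an error.
    subprocess.run(build_demo_command(output_dir), cwd=project_root, check=True)
```

Calling run_demo("/path/to/outputs") from a script located in the base directory of the project is then equivalent to typing the command by hand. Passing an absolute output path avoids ambiguity about which directory a relative path is resolved against.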

Demo Outputs

<output-directory>/
├── binary-datasets/
├── oversampled-datasets/
├── important-features/
├── feature-selected-datasets/
└── classification-results/

binary-datasets: The multilabel COAD dataset is split into multiple binary datasets, and the binary datasets are stored in this directory.
oversampled-datasets: The training and testing data generated from the binary datasets. The training data uses synthetic data generated by the SMOTE algorithm, while the testing data uses only real TCGA data.
important-features: The top 1000 features (i.e. genes) of each training dataset ranked by their information gain ratio score.
feature-selected-datasets: The training and testing datasets that only contain the top 1000 selected features.
classification-results: The classification results of our Random Forest model on the feature-selected datasets.
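
To make the binary-datasets and important-features steps more concrete, here is a minimal, standard-library-only sketch of a one-vs-rest label split followed by a gain ratio ranking. It is an illustration under stated assumptions, not the code used in this repository: the pipeline itself relies on Weka and GainRatio.java, the function names below are hypothetical, the one-vs-rest scheme is our assumption about how the binary datasets are formed, and the features are assumed to be already discretized (for example "high" and "low" expression).

```python
import math
from collections import Counter


def one_vs_rest(labels, positive):
    """Relabel a multiclass label list as `positive` versus "other"."""
    return [label if label == positive else "other" for label in labels]


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting the samples on the feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A constant feature has zero split information; score it as 0.
    return info_gain / split_info if split_info > 0 else 0.0


def top_features(columns, labels, k):
    """Return the names of the k highest-scoring features, best first."""
    scores = {name: gain_ratio(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: four samples, two discretized genes (hypothetical names).
sites = ["liver", "liver", "lung", "colon"]
binary = one_vs_rest(sites, "liver")
genes = {
    "GENE_A": ["high", "high", "low", "low"],   # tracks the liver label
    "GENE_B": ["high", "low", "high", "low"],   # unrelated to the label
}
best = top_features(genes, binary, k=1)
```

In this toy example GENE_A separates the liver samples from the rest perfectly, so it is the one selected. In the demo, the same idea is applied with k = 1000 to each training dataset.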

General Usage:

The entire metastatic pipeline can be run using the metastasis_pipeline script. This script can be called from the command line with the following command:

python -m mot.metastasis_pipeline

Use the -h flag to list all available options.

Additionally, each component of the pipeline can be called individually from the command line. For more information, read our wiki for a breakdown of each script's role in the pipeline.

Note: For those seeking to use the docker image to interact with our framework, run the following command to gain access to the shell of the docker image:

docker run --rm -it --entrypoint="" mskaro1/mot bash

Reviewing:

The Docker image is now available! Check out our wiki for implementation details. Thanks!

Using our code or our model? Consider citing us
