UTMOST (unified test for molecular signatures) is a method for cross-tissue gene expression imputation for transcriptome-wide association analyses.


UTMOST

UTMOST (Unified Test for MOlecular SignaTures) is a principled method to perform cross-tissue expression imputation and gene-level association analysis. The preprint can be found at: A statistical framework for cross-tissue transcriptome-wide association analysis.

Update History

2025/03/01: Weights trained with GTEx v8 are available here.

2023/06/09: For weights trained with GTEx v8, please visit https://zhaocenter.org/UTMOST.

2018/04/24: Pre-calculated covariance matrices for single-tissue and joint tests are now available for download; the pipeline for single-tissue/joint tests was updated to use 44 GTEx tissues + STARNET liver + BLUEPRINT 3 cell types (eQTL/sQTL).

Prerequisites

The software is developed and tested in Linux and Mac OS environments.

  • Python 2.7

  • numpy (>=1.11.1)

  • scipy (>=0.18.1)

  • pandas (>=0.18.1)

  • rpy2 (==2.8.6)

  • R is needed for GBJ testing.

  • GBJ (0.5.0)

## Install Python modules with pip
$ pip install numpy --user
$ pip install scipy --user
$ pip install pandas --user
$ pip install -Iv rpy2==2.8.6 --user

## GBJ can be installed from within R
install.packages('GBJ')
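
After installing, it can help to confirm that the Python modules meet the minimum versions listed above. The following is a minimal sketch, not part of UTMOST; the helper names (version_tuple, meets_minimum, check_environment) are hypothetical, and rpy2 is left out because it is pinned to an exact version rather than a minimum.

```python
import re

# Minimum versions listed under Prerequisites in this README.
MINIMUMS = {"numpy": "1.11.1", "scipy": "0.18.1", "pandas": "0.18.1"}


def version_tuple(version):
    """Turn '1.11.1' into (1, 11, 1); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, required):
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


def check_environment(minimums=MINIMUMS):
    """Map each package name to True (ok), False (too old) or None (not importable)."""
    report = {}
    for name, required in sorted(minimums.items()):
        try:
            module = __import__(name)
        except ImportError:
            report[name] = None
            continue
        report[name] = meets_minimum(module.__version__, required)
    return report


labels = {True: "ok", False: "too old", None: "missing"}
for name, status in sorted(check_environment().items()):
    print("%-8s %s" % (name, labels[status]))
```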

Project Layout

  • single_tissue_covariance.py

  • single_tissue_association_test.py

  • joint_covariance.py

  • joint_GBJ_test.py

  • test_tool

  • metax module

The following example assumes that you have Python 2.7, numpy, pandas, scipy, rpy2, R and GBJ installed. Each script takes a different set of command-line parameters; run it with the --help or -h option to see them. Code for training cross-tissue gene-expression imputation models is maintained in a separate repo.

Quick start

This section demonstrates how to apply UTMOST with imputation models jointly trained in 44 tissues with GTEx data. The sample_data.zip archive contains pre-calculated imputation models and covariance matrices for the single-tissue and joint-tissue GBJ tests. The pipeline for generating covariance matrices with your own imputation models and for incorporating other eQTL/sQTL data (e.g. from STARNET and BLUEPRINT, ftp://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/qtl_as/), i.e. the analysis pipeline used in the manuscript, can be found in the section "Incorporating external eQTL/sQTL datasets".

1. Clone the UTMOST repository

$ git clone https://github.com/Joker-Jerome/UTMOST

2. Go to the software directory

$ cd ./UTMOST

3.1 Download imputation model (weights) data (1.9GB for zipped file, 3.4GB after unzipping)

$ wget --load-cookies /tmp/cookies.txt "https://drive.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies  /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://drive.google.com/uc?export=download&id=1u8CRwb6rZ-gSPl89qm3tKpJArUT8XrEe' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1u8CRwb6rZ-gSPl89qm3tKpJArUT8XrEe" -O sample_data.zip && rm -rf /tmp/cookies.txt
$ unzip sample_data.zip

The unzipped sample_data folder contains the following files/folders:

weight_db_GTEx/ ## jointly trained imputation models for 44 GTEx tissues 
weight_db_external/ ## imputation models for STARNET liver tissue and BLUEPRINT 3 cell-type eQTL/sQTL data
dosage/ ## a reference genotype panel for calculating covariance matrices
GWAS/ ## a simulated GWAS summary stats file as an example
covariance.txt.gz and DGN-WB_0.5.db ## toy example for demonstrating single-tissue test
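
To confirm that the archive unpacked as described, the listing above can be checked programmatically. This is a minimal sketch under the assumption that all six entries sit directly under sample_data/; the helper name missing_entries is hypothetical.

```python
import os

# Entries that this README lists for the unzipped sample_data folder.
EXPECTED_ENTRIES = [
    "weight_db_GTEx",
    "weight_db_external",
    "dosage",
    "GWAS",
    "covariance.txt.gz",
    "DGN-WB_0.5.db",
]


def missing_entries(root, expected=EXPECTED_ENTRIES):
    """Return the expected files/folders that are not present under root."""
    return [name for name in expected
            if not os.path.exists(os.path.join(root, name))]


# Run from the UTMOST/ directory after unzipping.
missing = missing_entries("sample_data")
if missing:
    print("missing: " + ", ".join(missing))
else:
    print("all listed entries are present")
```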

To run the single-tissue and joint GBJ tests with these imputation models, you need to either generate covariance matrices from a reference genotype panel (for details, see the Methods section of the manuscript) or download the pre-calculated covariance matrices for 44 GTEx tissues. Instructions for calculating covariance matrices can be found in Section 5 of this tutorial.

3.2 Download pre-calculated covariance matrices for single-tissue/joint test (large files: 28GB zipped, 45GB after unzipping)

$ cd sample_data
$ wget --load-cookies /tmp/cookies.txt "https://drive.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies  /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://drive.google.com/uc?export=download&id=1Kh3lHyTioKIXqCsREmsAyC-dS49KVO9G' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1Kh3lHyTioKIXqCsREmsAyC-dS49KVO9G" -O covariance_tissue.tar.gz && rm -rf /tmp/cookies.txt
$ wget --load-cookies /tmp/cookies.txt "https://drive.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies  /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://drive.google.com/uc?export=download&id=1tqIW5Ms8p1StX7WXXWVa4TGKb5q58TPA' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1tqIW5Ms8p1StX7WXXWVa4TGKb5q58TPA" -O covariance_joint.zip && rm -rf /tmp/cookies.txt
$ tar -zxvf covariance_tissue.tar.gz
$ unzip covariance_joint.zip

covariance_tissue/ and covariance_joint/ contain covariance matrices required for single-tissue and joint gene-trait association tests, respectively.

3.3 Download example GWAS summary statistics (GIANT GWAS Anthropometric 2015 BMI data)

cd GWAS
wget https://portals.broadinstitute.org/collaboration/giant/images/1/15/SNP_gwas_mc_merge_nogc.tbl.uniq.gz
gunzip SNP_gwas_mc_merge_nogc.tbl.uniq.gz
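
Before running the association test, it may be worth checking that the column names passed in Section 4.1 (SNP, A1, A2, b, p) actually appear in the header of the summary statistics file. The sketch below assumes a whitespace-delimited file with the header on its first line; missing_columns and read_header are hypothetical helpers, and the example header string is made up for illustration.

```python
# Column names passed to single_tissue_association_test.py in Section 4.1.
REQUIRED = ["SNP", "A1", "A2", "b", "p"]


def missing_columns(header_line, required=REQUIRED):
    """Return the required column names absent from a whitespace-delimited header."""
    present = set(header_line.split())
    return [name for name in required if name not in present]


def read_header(path):
    """Return the first line of a text file (assumed to be the header)."""
    with open(path) as handle:
        return handle.readline()


# Illustrative header only; for the real file use, from the UTMOST/ directory:
#   read_header("sample_data/GWAS/SNP_gwas_mc_merge_nogc.tbl.uniq")
example_header = "SNP A1 A2 b se p N"
print(missing_columns(example_header))
```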

4. Run UTMOST with cross-tissue imputation models trained in 44 GTEx tissues

4.1. Run single tissue association test for 44 tissues

cd ../.. ## at UTMOST/
mkdir sample_data/results
TISSUE_GTEx=(Adipose_Subcutaneous Adipose_Visceral_Omentum Adrenal_Gland Artery_Aorta Artery_Coronary Artery_Tibial Brain_Anterior_cingulate_cortex_BA24 Brain_Caudate_basal_ganglia Brain_Cerebellar_Hemisphere Brain_Cerebellum Brain_Cortex Brain_Frontal_Cortex_BA9 Brain_Hippocampus Brain_Hypothalamus Brain_Nucleus_accumbens_basal_ganglia Brain_Putamen_basal_ganglia Breast_Mammary_Tissue Cells_EBV-transformed_lymphocytes Cells_Transformed_fibroblasts Colon_Sigmoid Colon_Transverse Esophagus_Gastroesophageal_Junction Esophagus_Mucosa Esophagus_Muscularis Heart_Atrial_Appendage Heart_Left_Ventricle Liver Lung Muscle_Skeletal Nerve_Tibial Ovary Pancreas Pituitary Prostate Skin_Not_Sun_Exposed_Suprapubic Skin_Sun_Exposed_Lower_leg Small_Intestine_Terminal_Ileum Spleen Stomach Testis Thyroid Uterus Vagina Whole_Blood)
for tissue in ${TISSUE_GTEx[@]}
do
python2 ./single_tissue_association_test.py \
--model_db_path sample_data/weight_db_GTEx/${tissue}.db \
--covariance sample_data/covariance_tissue/${tissue}.txt.gz \
--gwas_folder sample_data/GWAS \
--gwas_file_pattern SNP_gwas_mc_merge_nogc.tbl.uniq \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column b \
--pvalue_column p \
--output_file sample_data/results/${tissue}.csv
done

The example command parameters:

  • --model_db_path

    Path to gene expression imputation model (estimated weights/effect sizes of cis-eQTLs).

  • --covariance

    Path to file containing covariance information (used to estimate the variance of gene-level effect size estimator, see Gene-level association test in Methods section of manuscript for details).

  • --gwas_folder

    Folder containing GWAS summary statistics data.

  • --gwas_file_pattern

    The file pattern of the GWAS file (the file name of the summary statistics if they are not split by chromosome).

  • --snp_column

    Argument with the name of the column containing the RSIDs.

  • --effect_allele_column

    Argument with the name of the column containing the effect allele.

  • --non_effect_allele_column

    Argument with the name of the column containing the non-effect allele.

  • --beta_column

    Name of the column containing the effect size estimate for each SNP in the input GWAS files.

  • --pvalue_column

    Name of the column containing the p-value for each SNP in the input GWAS files.

  • --output_file

    Path where results will be saved to.
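
If you prefer to drive the loop from Python rather than bash, the same command can be assembled as an argument list. This is a sketch only: single_tissue_command is a hypothetical helper, it uses just the flags and paths shown in the example above, and it prints the commands instead of executing them.

```python
def single_tissue_command(tissue, data_dir="sample_data",
                          gwas_file="SNP_gwas_mc_merge_nogc.tbl.uniq"):
    """Return the argument list for one tissue, mirroring the shell loop in 4.1."""
    return [
        "python2", "./single_tissue_association_test.py",
        "--model_db_path", "%s/weight_db_GTEx/%s.db" % (data_dir, tissue),
        "--covariance", "%s/covariance_tissue/%s.txt.gz" % (data_dir, tissue),
        "--gwas_folder", "%s/GWAS" % data_dir,
        "--gwas_file_pattern", gwas_file,
        "--snp_column", "SNP",
        "--effect_allele_column", "A1",
        "--non_effect_allele_column", "A2",
        "--beta_column", "b",
        "--pvalue_column", "p",
        "--output_file", "%s/results/%s.csv" % (data_dir, tissue),
    ]


# Printed rather than executed; two tissues are enough to show the pattern.
for tissue in ["Liver", "Whole_Blood"]:
    print(" ".join(single_tissue_command(tissue)))
```

To actually run a command, the list can be passed to subprocess.check_call from the UTMOST/ directory.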

4.2. Combine gene-trait associations in 44 tissues by joint GBJ test

mkdir sample_data/results_GTEx ## save association results for cross-tissue joint test
UTMOST_path=/absolute/path/to/UTMOST/
python2 joint_GBJ_test.py \
--weight_db $UTMOST_path/sample_data/weight_db_GTEx/ \
--output_dir $UTMOST_path/sample_data/results_GTEx/ \
--cov_dir $UTMOST_path/sample_data/covariance_joint/ \
--input_folder $UTMOST_path/sample_data/results/ \
--gene_info $UTMOST_path/intermediate/gene_info.txt \
--output_name GIANT_BMI_2015_GTEx_44_joint \
--start_gene_index 1 \
--end_gene_index 17290

The example command parameters:

  • --verbosity

    Log verbosity level. 1 means everything will be logged. 10 means high level messages will be logged.

  • --weight_db

    Name of weight db in data folder (imputation models).

  • --input_folder

    Name of folder containing single-tissue association results (generated in Section 4.1).

  • --cov_dir

    Path where covariance results are (covariance matrix for gene-level test statistics across tissues, see Gene-level association test in Methods section of manuscript for details).

  • --output_dir

    Path where results will be saved to.

  • --gene_info

    File containing all the genes tested.

  • --start_gene_index

    Index of the starting gene in intermediate/gene_info.txt (for parallel computing: several gene ranges can be tested at the same time to reduce computation time).

  • --end_gene_index

    Index of the ending gene in intermediate/gene_info.txt (for parallel computing: several gene ranges can be tested at the same time to reduce computation time).
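
To use --start_gene_index and --end_gene_index for parallel runs, the full gene range has to be divided into non-overlapping pieces. The sketch below shows one way to do that; gene_index_chunks is a hypothetical helper, and it assumes the indices are 1-based and inclusive at both ends, which this README does not state explicitly.

```python
def gene_index_chunks(start, end, n_chunks):
    """Split the inclusive range [start, end] into contiguous (start, end) pairs.

    Chunk sizes differ by at most one gene. Fewer than n_chunks pairs are
    returned when the range holds fewer genes than n_chunks.
    """
    total = end - start + 1
    if total <= 0 or n_chunks <= 0:
        raise ValueError("need start <= end and n_chunks >= 1")
    n_chunks = min(n_chunks, total)
    base, extra = divmod(total, n_chunks)
    chunks = []
    low = start
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        chunks.append((low, low + size - 1))
        low += size
    return chunks


# Gene range from the example command above, split across four jobs.
for first, last in gene_index_chunks(1, 17290, 4):
    print("--start_gene_index %d --end_gene_index %d" % (first, last))
```

Each pair would go to a separate joint_GBJ_test.py invocation, presumably with its own --output_name so that the jobs do not write to the same file.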

Output format:

Gene | Test score | P value
Gene A | test score A | P value A
Gene B | test score B | P value B

Incorporating external eQTL/sQTL datasets

This section uses STARNET and BLUEPRINT (ftp://ftp.ebi.ac.uk/pub/databases/blueprint/blueprint_Epivar/qtl_as/) as an example. For details, please see the Results and Methods sections of the manuscript.

Note: this part also requires the data in the sample_data folder.

5.1. Calculate the single tissue covariance

TISSUE_external=(Liver_STARNET1 mono_eqtl mono_sqtl neut_eqtl neut_sqtl tcel_eqtl tcel_sqtl)
mkdir sample_data/covariance_external
for tissue in ${TISSUE_external[@]}
do
python2 ./single_tissue_covariance.py \
--weight_db sample_data/weight_db_external/${tissue}.db \
--input_folder sample_data/dosage/ \
--covariance_output sample_data/covariance_external/${tissue}.txt.gz
done

The example command parameters:

  • --weight_db

    Path to tissue transcriptome model.

  • --input_folder

    Folder containing dosage data (the reference genotype panel, sample_data/dosage/ in the example above).

  • --covariance_output

    Path where covariance will be saved to.
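
As a rough illustration of why SNP-SNP covariance from a reference panel is needed: if imputed expression is a weighted sum of SNP dosages, its variance is the quadratic form w' Cov w, where w holds the model weights. The sketch below computes both pieces in plain Python. It is not the code in single_tissue_covariance.py; the function names, the n - 1 denominator, and the toy dosages are assumptions made for the example.

```python
def covariance_matrix(dosages):
    """Sample covariance (n - 1 denominator) between SNP dosage vectors.

    dosages: list of per-SNP lists, each holding one dosage per individual.
    """
    n = len(dosages[0])
    means = [sum(column) / float(n) for column in dosages]
    cov = []
    for i, col_i in enumerate(dosages):
        row = []
        for j, col_j in enumerate(dosages):
            cross = sum((a - means[i]) * (b - means[j])
                        for a, b in zip(col_i, col_j))
            row.append(cross / (n - 1))
        cov.append(row)
    return cov


def predicted_expression_variance(weights, cov):
    """Quadratic form w' * Cov * w: variance of the weighted sum of dosages."""
    size = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(size) for j in range(size))


# Toy panel: two SNPs in three individuals (made-up values).
panel = [[0, 1, 2], [2, 1, 0]]
weights = [0.5, 0.5]
cov = covariance_matrix(panel)
print(cov)
print(predicted_expression_variance(weights, cov))
```

In this toy panel the two SNPs are perfectly anti-correlated, so with equal weights their contributions cancel and the variance of the weighted sum is zero; ignoring the off-diagonal covariance would miss that.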

5.2. Run the single tissue association test

for tissue in ${TISSUE_external[@]}
do
python2 ./single_tissue_association_test.py \
--model_db_path sample_data/weight_db_external/${tissue}.db \
--covariance sample_data/covariance_external/${tissue}.txt.gz \
--gwas_folder sample_data/GWAS \
--gwas_file_pattern SNP_gwas_mc_merge_nogc.tbl.uniq \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column b \
--pvalue_column p \
--output_file sample_data/results/${tissue}.csv
done

5.3. Calculate the joint tissue covariance

mkdir covariance_GTEx_external ## path for saving new covariance matrix (could take ~25GB space)
mkdir sample_data/weight_db_GTEx_external ## path for saving imputation models across different tissues
cp sample_data/weight_db_GTEx/* sample_data/weight_db_GTEx_external/
cp sample_data/weight_db_external/* sample_data/weight_db_GTEx_external/
python2 ./joint_covariance.py \
--weight_db sample_data/weight_db_GTEx_external/ \
--input_folder sample_data/dosage/ \
--covariance_output covariance_GTEx_external/

The example command parameters:

  • --verbosity

    Log verbosity level. 1 means everything will be logged. 10 means high level messages will be logged.

  • --weight_db

    Name of weight db in data folder.

  • --input_folder

    Name of folder containing dosage data.

  • --covariance_output

    Path where covariance results will be saved to.

  • --min_maf_filter

    Minimum minor allele frequency (MAF) used to filter SNPs.

  • --max_maf_filter

    Maximum minor allele frequency (MAF) used to filter SNPs.
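
The two MAF options can be pictured as a window applied to each SNP's minor allele frequency. The following sketch shows that idea on a dosage vector; it is not taken from joint_covariance.py. It assumes dosages are coded as 0 to 2 copies of one allele and that both bounds are inclusive, and the function names are hypothetical.

```python
def minor_allele_frequency(dosages):
    """MAF from dosages coded as 0-2 copies of one allele per individual."""
    freq = sum(dosages) / (2.0 * len(dosages))
    # Fold so that the rarer of the two alleles is reported.
    return min(freq, 1.0 - freq)


def passes_maf_filter(dosages, min_maf=None, max_maf=None):
    """Return True if the SNP's MAF lies within the optional [min_maf, max_maf] window."""
    maf = minor_allele_frequency(dosages)
    if min_maf is not None and maf < min_maf:
        return False
    if max_maf is not None and maf > max_maf:
        return False
    return True


# Made-up dosages for two SNPs in four individuals.
common_snp = [0, 1, 2, 0]
rare_snp = [0, 0, 0, 0]
print(minor_allele_frequency(common_snp), passes_maf_filter(common_snp, min_maf=0.01))
print(minor_allele_frequency(rare_snp), passes_maf_filter(rare_snp, min_maf=0.01))
```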

5.4. Combine gene-trait associations in 44 tissues + STARNET liver eQTL + BLUEPRINT eQTL/sQTL by joint GBJ test

## note after 5.2, sample_data/results/ now contains 44 + 1 + 3*2 single-tissue association results
UTMOST_path=/absolute/path/to/UTMOST/
mkdir results_GTEx_external
python2 joint_GBJ_test.py \
--weight_db $UTMOST_path/sample_data/weight_db_GTEx_external/ \
--output_dir $UTMOST_path/results_GTEx_external/ \
--cov_dir $UTMOST_path/covariance_GTEx_external/ \
--input_folder $UTMOST_path/sample_data/results/ \
--gene_info $UTMOST_path/intermediate/gene_info.txt \
--output_name test_GTEx_external
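
Before launching the joint test, a quick count can confirm that sample_data/results/ holds the 44 + 1 + 3*2 = 51 single-tissue result files mentioned in the note above. This is a minimal sketch; result_files is a hypothetical helper, and it assumes one .csv file per tissue, as produced by the --output_file arguments in Sections 4.1 and 5.2.

```python
import os

N_GTEX = 44          # GTEx tissues (Section 4.1)
N_STARNET = 1        # STARNET liver
N_BLUEPRINT = 3 * 2  # 3 cell types x (eQTL, sQTL)
EXPECTED_RESULTS = N_GTEX + N_STARNET + N_BLUEPRINT


def result_files(folder):
    """Return the sorted .csv file names found directly in folder."""
    return sorted(name for name in os.listdir(folder) if name.endswith(".csv"))


def results_complete(folder, expected=EXPECTED_RESULTS):
    """Return True if folder holds exactly the expected number of result files."""
    return len(result_files(folder)) == expected


# Usage, from the UTMOST/ directory once Sections 4.1 and 5.2 have finished:
#   results_complete("sample_data/results")
print("expected result files: %d" % EXPECTED_RESULTS)
```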

Conditional Analysis

python2 conditional_test_geneset.py \
--utmost_dir $UTMOST_path \
--weight_db $UTMOST_path/sample_data/weight_db_GTEx/ \
--input_folder $UTMOST_path/sample_data/dosage/ \
--gwas_str gwas \
--gwas_folder /GWAS_data/ \
--gwas_file_pattern GWAS.txt \
--snp_column SNP \
--effect_allele_column A1 \
--non_effect_allele_column A2 \
--beta_column BETA \
--se_column SE \
--pvalue_column P \
--chr_idx 1 \
--gene_list  Gene1,Gene2,Gene3 \
--output_dir /output_dir/

The example command parameters:

  • --utmost_dir

    Path to UTMOST.

  • --weight_db

    Path to gene expression imputation model (estimated weights/effect sizes of cis-eQTLs).

  • --input_folder

    Name of folder containing dosage data.

  • --gwas_str

    GWAS name string.

  • --gwas_folder

    Folder containing GWAS summary statistics data.

  • --gwas_file_pattern

    The file pattern of the GWAS file (the file name of the summary statistics if they are not split by chromosome).

  • --snp_column

    Argument with the name of the column containing the RSIDs.

  • --effect_allele_column

    Argument with the name of the column containing the effect allele.

  • --non_effect_allele_column

    Argument with the name of the column containing the non-effect allele.

  • --beta_column

    Name of the column containing the effect size estimate for each SNP in the input GWAS files.

  • --se_column

    The column containing the standard error for effect size estimate.

  • --pvalue_column

    Name of the column containing the p-value for each SNP in the input GWAS files.

  • --chr_idx

    Chromosome number.

  • --gene_list

    Genes to be tested.

  • --output_dir

    Path where results will be saved to.
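
The example passes --gene_list as a comma-separated string (Gene1,Gene2,Gene3). When the genes come from a Python list, a small helper can build that string and catch obvious problems first. This is a sketch; format_gene_list is hypothetical, and dropping blanks and duplicates is a choice made here, not documented behavior of conditional_test_geneset.py.

```python
def format_gene_list(genes):
    """Join gene names into the comma-separated form shown for --gene_list.

    Surrounding whitespace is stripped, empty names and duplicates are dropped,
    and names containing a comma are rejected because they would be split.
    """
    cleaned = []
    for gene in genes:
        name = gene.strip()
        if not name:
            continue
        if "," in name:
            raise ValueError("gene name contains a comma: %r" % name)
        if name not in cleaned:
            cleaned.append(name)
    return ",".join(cleaned)


print(format_gene_list(["Gene1", " Gene2", "Gene3", "Gene1"]))
```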

Acknowledgement

Part of the code is modified from MetaXcan (https://github.com/hakyimlab/MetaXcan). We thank the authors for sharing the code.

Reference

Hu et al. (2018). A statistical framework for cross-tissue transcriptome-wide association analysis. bioRxiv, 286013.
