How does IdentifiHR work?

Ashley Weir edited this page Aug 30, 2024 · 7 revisions

The IdentifiHR R package provides several functions for predicting homologous recombination (HR) status, either in a single sample or across several samples. The model requires only a matrix or data frame of raw gene expression counts, with genes annotated with Ensembl, HGNC or Entrez identifiers.
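As a rough illustration of the expected input, the sketch below builds a toy matrix of raw counts keyed by Ensembl gene identifiers. The orientation (genes in rows, samples in columns), the sample names and the count values are assumptions made for this example only; the three identifiers are real Ensembl IDs chosen for familiarity, not because they are known to be in the model's gene set.

```r
# Toy raw (un-normalised) integer counts.
# Assumption: genes are rows and samples are columns.
counts <- matrix(
  c(120L, 340L,  95L,
     60L,  15L, 210L,
    800L, 650L, 720L),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("ENSG00000012048",   # BRCA1
      "ENSG00000139618",   # BRCA2
      "ENSG00000141510"),  # TP53
    c("sample1", "sample2", "sample3")  # hypothetical sample names
  )
)

# A data frame with the same layout would be an equally valid input.
countsDf <- as.data.frame(counts)

print(counts)
```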

The processCounts() function prepares the input matrix for prediction. It subsets the counts to the 2604 genes required for normalisation, then transforms them to log2 counts-per-million (CPM) to normalise for differences in library size. Each gene is then scaled to a z-score using the mean and standard deviation from our training dataset. Because these values come from our training cohort, the scaling must be performed by the processCounts() function, and not by an alternative z-score function written in R.
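The steps described above can be sketched in base R. This is a minimal illustration of the idea and not the code inside processCounts(): the gene names, the training means and standard deviations, and the pseudocount of 1 are all hypothetical stand-ins, and the real function should always be used for actual predictions.

```r
# Toy raw counts: genes in rows, samples in columns (orientation assumed).
counts <- matrix(
  c(10, 200,  35,
     0, 150,  60,
    25,  90,   5,
    40,  10, 300),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"),
                  c("s1", "s2", "s3"))
)

# Hypothetical stand-ins for the model's gene list and training statistics.
modelGenes <- c("geneA", "geneB", "geneC")
trainMean  <- c(geneA = 15.0, geneB = 14.5, geneC = 14.0)
trainSd    <- c(geneA = 1.2,  geneB = 0.8,  geneC = 1.5)

# 1. Subset to the genes needed for normalisation.
sub <- counts[modelGenes, , drop = FALSE]

# 2. log2 counts-per-million. The pseudocount of 1 is an assumption
#    that avoids log2(0); the package may use a different offset.
libSize <- colSums(sub)
logCpm  <- log2(sweep(sub, 2, libSize, "/") * 1e6 + 1)

# 3. z-score each gene with the FIXED training mean and sd,
#    not with statistics estimated from the new samples.
centred <- sweep(logCpm, 1, trainMean[modelGenes], "-")
zScored <- sweep(centred, 1, trainSd[modelGenes], "/")

# For contrast: scale() re-estimates mean and sd from the input itself,
# so its values differ from the training-based z-scores.
selfScaled <- t(scale(t(logCpm)))

print(round(zScored, 3))
```

The contrast with scale() shows why a generic z-score function is not a substitute: it centres and scales on the samples being processed, so the result depends on which samples happen to be in the input.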

Processed counts can then be used by the predictHr() function to infer HR status from the expression of only 209 genes.

The output of IdentifiHR is a data frame containing a discrete prediction of HR status, either HR deficient ("HRD") or HR proficient ("HRP"), together with the probability that a sample is HRD.

Package overview:

[Figure: IdentifiHR package overview]

IdentifiHR is a predictive model of HR status in high-grade serous ovarian carcinoma (HGSC) that uses only gene expression.
