This repository contains Matlab files and notebooks that reproduce the clustering analysis and classification accuracy results for various dimension reduction methods in the biomedical data experiments (see manuscript).
It also contains the main contrastive inverse regression (CIR) algorithm in both Matlab and Python. Because the clustering and classification analyses were conducted in Matlab, please use the Matlab file (CIR.m) when running the Matlab files in the experiments folder so that the results are reproducible. The Python package is available as an alternative.
Remark: the implementation of SIR involves the generalized eigen-decomposition, eig(), whose output (in particular the ordering of the eigenvalues) may vary across versions or languages. Please make sure the eigenvalues are sorted in descending order before taking the top d eigenvectors for SIR.
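As an illustration of the remark above, here is a minimal Python sketch that sorts generalized eigenvalues explicitly before selecting the top d eigenvectors. It assumes symmetric input matrices and uses scipy.linalg.eigh; the function name `top_d_generalized_eigvecs` and the matrices `A` and `B` are hypothetical placeholders for whatever pair the SIR implementation passes to eig(), not code from this repository.

```python
import numpy as np
from scipy.linalg import eigh


def top_d_generalized_eigvecs(A, B, d):
    """Return the d largest generalized eigenvalues of (A, B) and their eigenvectors.

    Assumes A is symmetric and B is symmetric positive definite.
    """
    # eigh solves A v = w B v for this symmetric-definite pair.
    eigvals, eigvecs = eigh(A, B)
    # Sort explicitly instead of relying on the solver's default ordering.
    order = np.argsort(eigvals)[::-1]
    return eigvals[order][:d], eigvecs[:, order][:, :d]


# Small synthetic check with made-up matrices (not data from the experiments).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T                # symmetric positive semi-definite
B = np.eye(5) + 0.1 * A    # symmetric positive definite
vals, vecs = top_d_generalized_eigvecs(A, B, d=2)
```

Sorting with an explicit index keeps the eigenvectors aligned with their eigenvalues, so the same selection applies regardless of the order in which a given solver returns them.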
Before running the Matlab files, install the functions listed below and add the paths of the installed files so that they are accessible from the directory where the Matlab files in the experiments folder are executed.
- dbindex (Davies Bouldin index): download here (download all files)
- UMAP (uniform manifold approximation and projection): download here (download all files)
Other Matlab files have been added to the repository for convenience. Their references are:
- CHI.m (Calinski-Harabasz Criterion)
- LDA (Linear Discriminant Analysis):
- SGPM (Oviedo, 2024)[2]
For Mouse Protein analysis
- 'Data_Cortex_Nuclear.csv' is available in the repository. Reference: abidlabs/contrastive on GitHub.
For Single Cell RNA Sequencing analysis
- 'pbmc_1_counts.csv' and 'pbmc_1_cell_type.csv' are both available in the repository
For Plasma Retinol analysis
- 'Retinol.txt' is available in the repository
For COVID-19 analysis
- The raw files 'PBMC_COVID.subsample500cells.covid.h5ad' (foreground) and 'PBMC_COVID.subsample500cells.ctrl.h5ad' (background) are available on figshare
- This dataset is referenced here
- The preprocessed files (e.g., covid_preprocessed_fg.csv.zip) are available in the repository.
contrastive-inverse-regression is a Python package that provides the contrastive inverse regression (CIR) algorithm for dimension reduction. CIR is used in a supervised setting to find a low-dimensional representation by solving a nonconvex optimization problem on the Stiefel manifold.
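For readers unfamiliar with the constraint, the Stiefel manifold is the set of p x d matrices V with orthonormal columns, i.e. V^T V = I_d. The sketch below only illustrates that constraint and the resulting projection X V. It does not call the package, and the helper names `random_stiefel_point` and `is_on_stiefel` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def random_stiefel_point(p, d, seed=0):
    """Draw a p x d matrix with orthonormal columns via a reduced QR factorization."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((p, d)))
    return Q


def is_on_stiefel(V, tol=1e-10):
    """Check the defining constraint V^T V = I_d up to a tolerance."""
    d = V.shape[1]
    return bool(np.allclose(V.T @ V, np.eye(d), atol=tol))


# Hypothetical data: 50 samples with 10 features, reduced to 3 dimensions.
V = random_stiefel_point(p=10, d=3)
X = np.random.default_rng(1).standard_normal((50, 10))
Z = X @ V  # low-dimensional representation of X
```

Here V is an arbitrary feasible point. An optimizer over the Stiefel manifold would search among such matrices for one that minimizes its objective.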
Make sure numpy, pandas, and scipy are installed beforehand and that their versions are compatible with cir. The easiest way to install is with pip:
pip install contrastive-inverse-regression
Alternatively, you can install directly from this repository:
pip install git+https://github.com/myueen/cir.git
- Python (>= 3.10.9)
- numpy (>= 1.24.3)
- pandas (>= 2.1.4)
- scipy (>= 1.9.3)
To run the example, matplotlib (>= 3.8.2) is required.
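To check an environment against the minimum versions listed above, a small script along the following lines may help. It is only a sketch: it relies on the standard-library importlib.metadata module, compares the first three numeric version components, and uses the hypothetical helper names `parse_version` and `check_requirements`.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list above.
MINIMUMS = {"numpy": "1.24.3", "pandas": "2.1.4", "scipy": "1.9.3"}


def parse_version(text):
    """Turn '1.24.3' into (1, 24, 3), padding with zeros and dropping suffixes."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements(minimums=MINIMUMS):
    """Return a dict mapping each package name to a short status string."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = parse_version(installed) >= parse_version(minimum)
        report[name] = "ok" if ok else f"found {installed}, need >= {minimum}"
    return report


python_ok = sys.version_info[:3] >= (3, 10, 9)
print("python:", "ok" if python_ok else "need >= 3.10.9")
for name, status in check_requirements().items():
    print(f"{name}: {status}")
```

To include matplotlib in the check when running the example, pass a dictionary that also contains `"matplotlib": "3.8.2"`.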
If you find this algorithm helpful in your research, please cite it using the following BibTeX entry.
@article{hawke2023contrastive,
title={Contrastive inverse regression for dimension reduction},
author={Hawke, Sam and Luo, Hengrui and Li, Didong},
journal={arXiv preprint arXiv:2305.12287},
year={2023}
}
.. [1] Hawke, S., Luo, H., & Li, D. (2023). Contrastive Inverse Regression for Dimension Reduction. Retrieved from https://arxiv.org/abs/2305.12287
.. [2] Harry Oviedo (2024). SGPM for minimization over the Stiefel Manifold (https://www.mathworks.com/matlabcentral/fileexchange/73505-sgpm-for-minimization-over-the-stiefel-manifold), MATLAB Central File Exchange. Retrieved January 12, 2024.