Model-based clustering in very high dimensions via adaptive projections.

btaschler/mcap

Latest commit: 47c7732 by Bernd Taschler, Feb 12, 2019 (49 commits)

MCAP

mcap provides model-based clustering for very high-dimensional data (especially when the number of features p is much larger than the number of samples n) via adaptive projections. Clustering is based on Gaussian mixture modelling with full covariance matrices in a lower-dimensional (projected) space. The projection dimension is set adaptively, in a data-driven manner, using a cluster stability criterion. The projection variants available so far are PCA and random projections (Gaussian as well as sparse methods).
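The projection step described above can be sketched in base R. This is only an illustration, not mcap's internal code: the fixed projection dimension `q`, the variable names, and the assumption that the sparse variant follows the "very sparse" scheme of Li, Hastie and Church are all ours. The Gaussian mixture fit and the stability-based choice of the projection dimension are not shown.

```r
set.seed(1)
n <- 50     # samples per group (hypothetical)
p <- 1000   # number of features
q <- 10     # projection dimension, fixed here; mcap selects it adaptively

# Two simulated groups that differ by a mean shift in every feature
X <- rbind(matrix(rnorm(n * p), n, p),
           matrix(rnorm(n * p, mean = 1), n, p))

# PCA projection: scores on the first q principal components
Z_pca <- prcomp(X, center = TRUE, scale. = FALSE, rank. = q)$x

# Gaussian random projection: i.i.d. N(0, 1/q) entries
R_gauss <- matrix(rnorm(p * q, sd = 1 / sqrt(q)), p, q)
Z_gauss <- X %*% R_gauss

# Very sparse random projection (assumed scheme): entries are
# sqrt(s / q) * {-1, 0, +1} with probabilities 1/(2s), 1 - 1/s, 1/(2s)
s <- sqrt(p)
R_sparse <- sqrt(s / q) * matrix(
  sample(c(-1, 0, 1), p * q, replace = TRUE,
         prob = c(1 / (2 * s), 1 - 1 / s, 1 / (2 * s))),
  p, q)
Z_sparse <- X %*% R_sparse

dim(Z_pca)   # each projection has 2 * n rows and q columns
```

Each projected matrix has q columns instead of p, which is what makes fitting a mixture model with unconstrained covariance matrices feasible when p is much larger than n.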

Resources

See our paper: currently under review

preprint: …

Getting Started

Clone or download the code from GitHub.

Alternatively, you can install mcap directly from GitHub with:

# install.packages("devtools")
devtools::install_github("btaschler/mcap")

Prerequisites

Dependencies on other packages:

  • for parallelisation: foreach, doParallel, parallel

  • for clustering: pcaMethods, nethet, mclust, RandPro, kernlab

  • misc: iterators, magrittr, stats, dplyr, tidyverse, utils, methods, data.table, RevoUtilsMath
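Before installing, it can help to check which of the packages listed above are already available. The following is a small sketch using base R only; the variable names are ours, and it does not attempt to install anything, since not all of these packages are necessarily available from the same repository.

```r
# Packages named in the dependency list above
deps <- c("foreach", "doParallel", "parallel",
          "pcaMethods", "nethet", "mclust", "RandPro", "kernlab",
          "iterators", "magrittr", "stats", "dplyr", "tidyverse",
          "utils", "methods", "data.table", "RevoUtilsMath")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- deps[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All listed dependencies are installed.")
}
```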

Quick demo

This is a basic example showing how to use mcap to cluster two (known) groups:

library(mcap)

### basic example code
set.seed(1)  # fix the RNG seed so the simulated data are reproducible
K <- 2       # number of clusters (groups)
n_k <- 200   # number of samples per group
p <- 1000    # number of features (dimension)
A <- matrix(rnorm(n_k * p), n_k, p)            # data for group 1 (mean 0)
B <- matrix(rnorm(n_k * p, mean = 1), n_k, p)  # data for group 2 (mean 1)
X <- rbind(A, B)                   # input matrix (2 * n_k rows, p columns)
Y <- c(rep(0, n_k), rep(1, n_k))   # known labels

## using PCA projections
model_fit <- MCAPfit(X, k = K, projection = 'PCA', centering_per_group = FALSE,
                     true_labels = Y, parallel = TRUE)

## sparse random projection (this overwrites the PCA fit stored in model_fit)
model_fit <- MCAPfit(X, k = K, projection = 'li', centering_per_group = FALSE,
                     true_labels = Y, parallel = TRUE)

## adjusted Rand index
print(model_fit$fit_gmm$aRI)

## display assigned cluster labels for each sample
print(model_fit$fit_gmm$model_fit$comp)

## show optimised projection dimension
print(model_fit$fit_q_opt$q_opt)
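The adjusted Rand index reported in the demo measures agreement between the assigned and the known labels, corrected for chance: 1 means identical partitions (up to relabelling) and values near 0 mean agreement no better than random. For readers who want to see the definition spelled out, here is a minimal base R sketch of the standard formula. The helper `adjusted_rand_index` is a hypothetical name of ours and is not part of mcap.

```r
# Adjusted Rand index from the contingency table of two labelings.
# Note: returns NaN in the degenerate case where both labelings
# consist of a single cluster.
adjusted_rand_index <- function(truth, pred) {
  tab <- table(truth, pred)
  sum_cells <- sum(choose(tab, 2))           # pairs together in both
  sum_rows  <- sum(choose(rowSums(tab), 2))  # pairs together in truth
  sum_cols  <- sum(choose(colSums(tab), 2))  # pairs together in pred
  expected  <- sum_rows * sum_cols / choose(length(truth), 2)
  (sum_cells - expected) / ((sum_rows + sum_cols) / 2 - expected)
}

truth <- c(0, 0, 1, 1)
adjusted_rand_index(truth, c(2, 2, 1, 1))  # same partition, labels swapped
adjusted_rand_index(truth, c(0, 1, 0, 1))  # partition that splits every group
```

The index is invariant to how clusters are numbered, which is why cluster labels such as those in `model_fit$fit_gmm$model_fit$comp` need not match the coding used in `Y`.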

Versioning

For all available versions, see releases. We use Semantic Versioning.

Authors

List of contributors.

License

This project is licensed under the GNU General Public License; see GPL-3.0 for details.

Acknowledgments
