ChromSpec

Multivariate logistic regression for multi-label classification of analytes of interest, using gas chromatography–mass spectrometry (GCMS) data and an in-house collected compound library.

Context of the collected data:

Three-dimensional data (x, y, z axes) spanning retention time (RT), mass-to-charge ratio (m/z) and ion intensity
The specific mass spectrum (m/z) and total ion current (TIC) profile at a given retention time (RT) allow a compound to be identified
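The sketch below shows one GCMS run held in memory as NumPy arrays, to make the three dimensions concrete. The array names and the tiny made-up values are illustrative assumptions and do not reflect the repository's data format. It shows that the TIC is the per-scan sum over all m/z channels, and how the mass spectrum at a chosen RT is read off.

```python
import numpy as np

# Hypothetical in-memory layout for one GCMS run (not the repository's format):
#   rt        -> retention time of each scan, in minutes
#   mz        -> m/z value of each intensity column
#   intensity -> ion abundance, one row per scan, one column per m/z
rt = np.array([1.00, 1.05, 1.10, 1.15])
mz = np.arange(1, 6)  # m/z 1..5 only, to keep the example small
intensity = np.array([
    [0, 10, 0, 5, 0],
    [2, 40, 1, 20, 0],
    [1, 15, 0, 8, 3],
    [0, 5, 0, 2, 0],
], dtype=float)

# Total ion current: summed intensity over all m/z channels in each scan.
tic = intensity.sum(axis=1)


def spectrum_at(target_rt):
    """Return the mass spectrum of the scan whose RT is closest to target_rt."""
    idx = int(np.argmin(np.abs(rt - target_rt)))
    return intensity[idx]


apex = int(np.argmax(tic))  # scan index where the TIC peaks
base_peak = int(mz[np.argmax(spectrum_at(rt[apex]))])  # most intense m/z at the apex
print("TIC per scan:", tic)
print("apex RT:", rt[apex], "base peak m/z:", base_peak)
```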

A graphical illustration of the three dimensions of the collected data is shown below:

Main parts of the code include:

Preprocessing of data (data extraction from raw data files, normalisation within the same scan number, one-hot encoding of the in-house library for classification, etc.)
Feature engineering of data (time-bin encoding: 35-minute run duration / 0.25-minute bins = 140 time bins; 140 time bins × 300 m/z values = 42,000 features per sample, stored as one row per sample)
Modelling using logistic regression
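The preprocessing and time-bin encoding steps above can be sketched as follows. The sketch assumes that each run has already been extracted into a retention-time vector and a scans × 300 intensity matrix. Only the 35 min run, the 0.25 min bin width and the 300 m/z figures come from this README. Everything else is a hypothetical choice for illustration: the helper names (`normalise_scans`, `time_bin_encode`, `multi_hot`), the per-scan maximum normalisation, and the element-wise maximum used to merge scans within a bin.

```python
import numpy as np

RUN_MINUTES = 35.0  # run duration (from the README)
BIN_MINUTES = 0.25  # time-bin width in minutes (from the README)
N_MZ = 300          # number of m/z channels (from the README)
N_BINS = int(round(RUN_MINUTES / BIN_MINUTES))  # 140 time bins


def normalise_scans(intensity):
    """Scale each scan (row) by its own maximum intensity.

    Assumption: "normalisation within the same scan number" is read here as
    per-scan base-peak normalisation; the repository may normalise differently.
    """
    intensity = np.asarray(intensity, dtype=float)
    peak = intensity.max(axis=1, keepdims=True)
    # Empty scans (peak == 0) stay all-zero instead of dividing by zero.
    return np.divide(intensity, peak, out=np.zeros_like(intensity), where=peak > 0)


def time_bin_encode(rt, intensity):
    """Collapse scans into fixed-width RT bins and flatten to one feature row.

    Assumption: scans falling in the same bin are merged with an element-wise
    maximum; RTs outside [0, 35) min are clipped into the first/last bin.
    """
    rt = np.asarray(rt, dtype=float)
    grid = np.zeros((N_BINS, N_MZ))
    bin_idx = np.clip((rt / BIN_MINUTES).astype(int), 0, N_BINS - 1)
    for b, scan in zip(bin_idx, intensity):
        grid[b] = np.maximum(grid[b], scan)
    return grid.ravel()  # 140 bins * 300 m/z = 42000 features


def multi_hot(sample_compounds, library):
    """Label vector with a 1 for every library compound present in the sample."""
    return np.array([1 if name in sample_compounds else 0 for name in library])


# Usage with synthetic scans (random values, not real GCMS data).
rt = np.array([0.10, 0.20, 0.30, 34.99])
intensity = np.random.default_rng(0).random((4, N_MZ))
x = time_bin_encode(rt, normalise_scans(intensity))
y = multi_hot({"Compound A"}, ["Compound A", "Compound B", "Compound C"])
print(x.shape, y)
```

Because every sample is flattened to the same 42,000-length row, the rows of many samples can be stacked into a single feature matrix for modelling.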

Features of code include:

Creating a new model from data
Using an existing trained model
Adding more data to existing models
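The sketch below shows one way these three workflows could fit together. It trains one binary logistic regression per compound (a multi-label setup), pickles the models so they can be reused, and refits them from their current weights when more data arrives. This is a hypothetical, dependency-light stand-in written with plain NumPy gradient descent. The repository may instead use a library implementation such as scikit-learn's `LogisticRegression`. The class and function names (`CompoundModel`, `train_models`, `add_data`) are invented for illustration.

```python
import pickle

import numpy as np


class CompoundModel:
    """Binary logistic regression for a single compound (hypothetical stand-in).

    Fitted by full-batch gradient descent on the mean log-loss plus an L2
    penalty on the weights (the bias is not penalised).
    """

    def __init__(self, n_features, lr=0.1, l2=1e-3):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.lr = lr
        self.l2 = l2

    def predict_proba(self, X):
        """Probability that the compound is present, for each row of X."""
        z = np.clip(X @ self.w + self.b, -30, 30)  # avoid overflow in exp
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y, epochs=500):
        # Starts from the current weights, so calling fit again continues training.
        n = len(y)
        for _ in range(epochs):
            err = self.predict_proba(X) - y  # gradient of log-loss w.r.t. the logit
            self.w -= self.lr * (X.T @ err / n + self.l2 * self.w)
            self.b -= self.lr * err.mean()
        return self


def train_models(X, Y, library, epochs=500):
    """Create a new model per compound; column j of Y labels library[j]."""
    return {name: CompoundModel(X.shape[1]).fit(X, Y[:, j], epochs)
            for j, name in enumerate(library)}


def add_data(models, X_old, Y_old, X_new, Y_new, library, epochs=200):
    """Refit existing models on old + new samples, warm-starting from current weights."""
    X = np.vstack([X_old, X_new])
    Y = np.vstack([Y_old, Y_new])
    for j, name in enumerate(library):
        models[name].fit(X, Y[:, j], epochs)
    return models


# Toy usage: two features, two hypothetical compounds.
library = ["Compound A", "Compound B"]
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # A tracks feature 0, B feature 1
models = train_models(X, Y, library)

# "Using an existing trained model": serialise, then load and predict.
reloaded = pickle.loads(pickle.dumps(models))
sample = np.array([[1.0, 0.0]])
print({name: float(m.predict_proba(sample)[0]) for name, m in reloaded.items()})
```

Refitting on the combined old and new samples, rather than on the new samples alone, keeps the earlier data in the training set. Warm-starting from the existing weights mainly shortens the additional training.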

An example of the generated report is shown below. Each compound's model gives the probability that its compound is present in the sample.

Disclaimer: In-house data was removed entirely, and compound names were replaced with arbitrary names to protect the privacy of the contents.
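A report of that kind could be rendered from the per-compound probabilities with something like the following. The `format_report` name, the table layout and the 0.5 detection threshold are assumptions for illustration. The compound names and probabilities are placeholders, not results.

```python
def format_report(sample_id, probabilities, threshold=0.5):
    """Render per-compound probabilities as a plain-text table.

    probabilities: dict mapping compound name -> probability from that
    compound's model. Rows are sorted from most to least likely.
    """
    lines = [f"Sample: {sample_id}", f"{'Compound':<20}{'Probability':>12}  Detected"]
    for name, p in sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True):
        flag = "yes" if p >= threshold else "no"
        lines.append(f"{name:<20}{p:>12.3f}  {flag}")
    return "\n".join(lines)


# Placeholder values only.
report = format_report("Sample 001", {"Compound A": 0.91, "Compound B": 0.07, "Compound C": 0.55})
print(report)
```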

nigelmaxwee/ChromSpec

Folders and files

Latest commit

History

Repository files navigation

ChromSpec

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages