@@ -7,23 +7,36 @@ Metagenomic Sequencing", Zaramela et al., mSystems 2022)
 ## Overview
 
 The package provides an API for generating absolute cell count estimates from
-metagenomic sequencing data that includes synthetic DNA (synDNA) spike-ins.
-The general approach, as outlined in Zaramela et al., is to use counts of
-specific synDNAs present in known masses to fit a linear regression model for
-each sample that predicts the gDNA mass of a sequence based on its counts in
-that sample. These linear models are subsequently applied to the counts of
-microbial genomes in each sample to calculate the mass of each such genome,
-and those masses are then transformed into the number of instances of each
-genome (that is, the approximate cell count of that microbe) in the input gDNA.
-pySynDNA extends this approach by also calculating the approximate cell count
-of each microbe in the input sample material (before gDNA extraction).
+metagenomic shotgun sequencing data that includes synthetic DNA (synDNA) spike-ins by
+implementing and extending the technique of [Zaramela et al.](https://pubmed.ncbi.nlm.nih.gov/36317886/).
+The calculation is performed in two parts. First, the known
+spiked-in mass of pooled synDNAs and the known concentration of each synDNA
+within the pool are used to calculate the mass of each synDNA sequenced from a given sample.
+These mass values are paired with the sequenced counts of each synDNA in the sample, and
+a regression model is fitted to predict mass from counts within that sample,
+as shown in the figure below.
+
+![pySynDNA regression fit workflow](https://raw.githubusercontent.com/biocore/pysyndna/main/docs/absolute_quant_fit_models_workflow.png?raw=true)
+
+Second, the counts for each microbial genome in the sample are translated into
+masses via the regression model, and the masses are converted to genome counts
+using genome lengths and Avogadro's number
+(see Zaramela et al. [equation 2](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9765022/#FD2)).
+Assuming approximately one genome per cell, the calculated genome counts are
+treated as approximate cell counts. pySynDNA extends this approach further by
+using the known starting mass of each sample to calculate the approximate cell
+counts per gram of each microbe in the input sample material (before gDNA extraction).
+These calculations are outlined in the figure below.
+
+![pySynDNA OGU cell counts workflow](https://raw.githubusercontent.com/biocore/pysyndna/main/docs/absolute_quant_calc_cell_counts_workflow.png?raw=true)
+
 
 ## Installation
 
 To install this package, first clone the repository from GitHub:
 
 ```
-git clone https://github.com/AmandaBirmingham/pysyndna.git
+git clone https://github.com/biocore/pysyndna.git
 ```
 
 Change directory into the new `pysyndna` folder and create a
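
As a reading aid for the Overview text in this diff, the two-part calculation can be sketched in plain Python. This is a minimal sketch under stated assumptions, not pySynDNA's API: every function name and input value is hypothetical, the regression is assumed to be an ordinary least-squares fit on log10-transformed counts and masses, 650 g/mol is the conventional average mass of one double-stranded base pair, and the per-gram step assumes the genome copies describe all of the weighed sample material.

```python
import math

AVOGADRO = 6.02214076e23      # molecules per mole
BP_MASS_G_PER_MOL = 650.0     # assumed average mass of one base pair


def syndna_masses_ng(pool_mass_ng, pool_fractions):
    """Mass of each synDNA: spiked-in pool mass times its fraction of the pool."""
    return [pool_mass_ng * fraction for fraction in pool_fractions]


def fit_log_log(counts, masses_ng):
    """Least-squares fit of log10(mass) = slope * log10(count) + intercept.

    Assumes every count and mass is positive (log10 is undefined otherwise).
    """
    xs = [math.log10(c) for c in counts]
    ys = [math.log10(m) for m in masses_ng]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict_mass_ng(count, slope, intercept):
    """Translate a genome's read count into a mass using the fitted model."""
    return 10 ** (slope * math.log10(count) + intercept)


def genome_copies(mass_ng, genome_length_bp):
    """Convert a DNA mass to genome copies via genome length and Avogadro's number."""
    mass_g = mass_ng * 1e-9
    return mass_g * AVOGADRO / (genome_length_bp * BP_MASS_G_PER_MOL)


def cells_per_gram(copies, sample_mass_g):
    """Approximate cells per gram of input material, assuming one genome per cell."""
    return copies / sample_mass_g


# Hypothetical inputs for one sample (illustrative values only).
masses = syndna_masses_ng(0.25, [0.5, 0.3, 0.15, 0.05])
syndna_counts = [125000, 75000, 37500, 12500]
slope, intercept = fit_log_log(syndna_counts, masses)

genome_mass = predict_mass_ng(50000, slope, intercept)
copies = genome_copies(genome_mass, genome_length_bp=5_000_000)
print(cells_per_gram(copies, sample_mass_g=0.1))
```

One model is fitted per sample, so in practice `fit_log_log` would be called once for each sample's synDNA counts before any genome counts from that sample are converted.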