Commit 901cf3f

update vignettes.
1 parent 8f0c567 commit 901cf3f

44 files changed (+532, -631 lines changed)

README.Rmd

Lines changed: 11 additions & 9 deletions
@@ -1,5 +1,5 @@
 ---
-title: scglmmr *S*ample-level Single-*C*ell *G*eneralized *L*inear *M*ultilevel *M*odels in *R*
+title: Sample-level Single Cell GLMMs in R
 output: github_document
 ---
 
@@ -13,7 +13,9 @@ knitr::opts_chunk$set(
 )
 ```
 
-An R package for implementing mixed effects models on single cell data with complex experiment designs. The package is flexible and can accomodate many experiment designs. It was developed for analysis of multimodal single cell data from many individuals assayed pre and post perturbation such as drug treatment, where each individual is nested within one or more response groups. The methods herein allow one to compare the difference in perturbation response effects between groups while modeling variation in donor expression. It also has many wrappers for downstream enrichment testing and visualization.
+**This package is under active development**
+
+An R package for implementing mixed effects modeling methods on single cell data that can accommodate many different complex experiment designs. The package is built around [lme4](https://www.jstatsoft.org/article/view/v067i01) and was originally made for analysis of single cell data collected from many individuals who are assayed pre- and post-perturbation (such as drug treatment) and nested within one or more response groups. The methods herein allow one to compare the difference in perturbation response effects between groups while modeling variation in donor expression. It also has many wrappers for downstream enrichment testing and visualization.
 
 Please see vignettes.
 
@@ -26,13 +28,11 @@ library(scglmmr)
 <img src="man/figures/scglmmr.overview.png" />
 
 
-**With this type of experiment design, we can't just color umap plots and try to find the effects.** We need statistical models.
-
 ## Single cell within cluster perturbation response differential expression
 
-The purpose of this software is to analyze single cell genomics data with pre and post perturbation measurements from the same individuals, including complex designs wehre individuals with repeated measurements are nested within in multiple response groups. The focus is on implementing flexible generalized linear multilevel models to derive group (i.e. good or poor clinical outcome, high or low rug response) and treatment associated effects *within cell types* defined either by protein (e.g. with CITE-seq data) or transcriptome based clustering followed by downstream enrichment testing and visualization.
+The purpose of this software is to analyze single cell genomics data with pre and post perturbation measurements from the same individuals, including complex designs with many subjects, each subject having repeated measurements pre and post perturbation and each subject nested within different groups, such as different end point response correlates, e.g. high and low responders. The focus is on implementing flexible generalized linear multilevel models to derive group (e.g. good or poor clinical outcome, high or low drug response) and treatment associated effects *within cell types* defined either by protein (e.g. with CITE-seq data) or transcriptome-based clustering followed by downstream enrichment testing and visualization.
 
-By default, the effect of treatment/perturbation across all subjects, the baseline differences between outcome groups, and the difference in the treatment effect between the outcome groups are tested. Any number of model covariates can be specified and by default the package uses a random intercept model to accomodate the non-independence of expression within each subject.
+Any number of model covariates can be specified. The vignettes provide methods where a random intercept term for the donor ID of each cell or aggregated library is included in the model. These methods thus model the variation around the baseline expression across individuals, accommodating non-independence of expression for repeated timepoints from each subject.
 
 An overview of methods provided:
 
@@ -46,13 +46,15 @@ Test perturbation effect using a gene level Poisson mixed model.
 Test perturbation effects and differences in perturbation responses between groups at the gene module level.
 
 ### 4. Downstream enrichment testing and visualization
-Wrappers around methods from the [fast set gene enrichment (fgsea)](https://www.biorxiv.org/content/10.1101/060012v2#disqus_thread) and [clusterProfiler](https://www.bioconductor.org/packages/release/bioc/html/clusterProfiler.html) R packages.
+There are wrapper functions around multiple gene set enrichment methods, with emphasis on the [fast gene set enrichment analysis (fgsea) package](https://www.biorxiv.org/content/10.1101/060012v2#disqus_thread). The results from fgsea can then be further interrogated by methods for contrasting information content in genes driving enrichments within and between cell types. Multiple visualization wrappers are also provided.
 
 ### Philosophy
-The scglmmr package considers each cluster/ cell type as a separate 'experiment' similar to fitting separate models to different FACS sorted leukocyte subsets followed by RNAseq. Fitting models separately to each subset provides maximum flexibility and avoids issues with e.g. modeling mean variance trends or count distributions for cell type specific genes in subsets that do not express the gene while still enabling comparison of, for example, coherent perturbation effects for the same gene across individuals between different cell clusters. This approach is particularly well suited for CITE-seq data with cells clustered based on normalized protein expression levels. Typically our workflow consists of denoising ADT data using our method [dsb](https://github.com/niaid/dsb) followed by modeling the group level perturbation response effects using scglmmr.
+This package models expression within each cluster/cell type independently in order to capture perturbation effects of cell type-specific genes as well as genes that are expressed by multiple cell types. Using a normal distribution on count data requires first modeling the mean-variance trend [(see Law et al.)](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-2-r29); this requires filtering out features (genes) that are not expressed by a given cell type. These cell type-specific transcripts are therefore tested for perturbation effects only in the cell types that express them, instead of across all cell types. Genes that are shared across cell types can be compared for coherent perturbation effects across all subjects or between different groups of subjects using contrast coding.
+
+This approach is particularly well suited for multimodal single cell data where cells are clustered based on information independent of the perturbation effects. For example, we have used methods in this package for CITE-seq data where we first denoise ADT data using our method [dsb](https://github.com/niaid/dsb) and then model the transcriptome, not for differences between cell types, but for the group-level perturbation response effects.
 
 **Experiment designs (within each cluster / celltype) supported by scglmmr**
-Below is a 2 group repeated measures experiment. This data can be accomodated by scglmmr. More simple experiment designs are also supported, for example data with 2 groups but not repeated pre/post treatment measurements.
+Below is a 2-group repeated measures experiment. These data can be accommodated by scglmmr. Simpler experiment designs are also supported, for example 2 groups with one timepoint, as are more complex experiments, for example 3 timepoints.
 
 | sample | sampleid | timepoint | Group | sex
 | :------------- | :----------: | -----------: | :------------- | :----------: |

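The README text above describes including a random intercept for the donor ID so that repeated timepoints from the same subject are not treated as independent observations. A minimal sketch of that idea is below. It assumes the lme4 package is installed and uses simulated values for a single gene rather than real single cell data. The column names follow the README design table; the `d0`/`d1` timepoint labels, the `low`/`high` group labels and the simulated effect sizes are hypothetical, and the model is fit with `lme4::lmer` directly rather than through any scglmmr function.

```r
# Sketch: donor random intercept in a 2-group pre/post design (one gene).
# Assumes lme4 is installed; all data below are simulated and hypothetical.
library(lme4)

set.seed(1)
n_subj <- 20
subj <- sprintf("s%02d", seq_len(n_subj))

# Each subject (sampleid) is measured at two timepoints and belongs to one group.
d <- data.frame(
  sampleid  = factor(rep(subj, each = 2)),
  timepoint = factor(rep(c("d0", "d1"), times = n_subj), levels = c("d0", "d1")),
  Group     = factor(rep(c("low", "high"), each = n_subj), levels = c("low", "high")),
  sex       = factor(rep(sample(c("F", "M"), n_subj, replace = TRUE), each = 2))
)

# Donor-specific baseline shifts: the source of non-independence
# between the two timepoints from the same subject.
u <- setNames(rnorm(n_subj, sd = 1), subj)

# Simulated expression: perturbation effect of 1 in every subject,
# plus an extra 1.5 in the "high" group.
d$y <- 5 + u[as.character(d$sampleid)] +
  1.0 * (d$timepoint == "d1") +
  1.5 * (d$timepoint == "d1" & d$Group == "high") +
  rnorm(nrow(d), sd = 0.3)

# (1 | sampleid) adds one random intercept per donor; sex is a fixed covariate.
fit <- lmer(y ~ timepoint * Group + sex + (1 | sampleid), data = d)
summary(fit)
```

With the default treatment coding, `timepointd1` is the perturbation effect in the reference (`low`) group, `Grouphigh` is the baseline difference between groups, and `timepointd1:Grouphigh` is the difference in perturbation response between groups.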
README.md

Lines changed: 58 additions & 48 deletions
@@ -1,18 +1,20 @@
-scglmmr *S*ample-level Single-*C*ell *G*eneralized *L*inear *M*ultilevel
-*M*odels in *R*
+Sample-level Single Cell GLMMs in R
 ================
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
 
-An R package for implementing mixed effects models on single cell data
-with complex experiment designs. The package is flexible and can
-accomodate many experiment designs. It was developed for analysis of
-multimodal single cell data from many individuals assayed pre and post
-perturbation such as drug treatment, where each individual is nested
-within one or more response groups. The methods herein allow one to
-compare the difference in perturbation response effects between groups
-while modeling variation in donor expression. It also has many wrappers
-for downstream enrichment testing and visualization.
+**This package is under active development**
+
+An R package for implementing mixed effects modeling methods on single
+cell data that can accommodate many different complex experiment designs.
+The package is built around
+[lme4](https://www.jstatsoft.org/article/view/v067i01) and was
+originally made for analysis of single cell data collected from many
+individuals who are assayed pre- and post-perturbation (such as drug
+treatment) and nested within one or more response groups. The methods herein
+allow one to compare the difference in perturbation response effects
+between groups while modeling variation in donor expression. It also has
+many wrappers for downstream enrichment testing and visualization.
 
 Please see vignettes.
 
@@ -25,27 +27,26 @@ library(scglmmr)
 
 <img src="man/figures/scglmmr.overview.png" />
 
-**With this type of experiment design, we can’t just color umap plots
-and try to find the effects.** We need statistical models.
-
 ## Single cell within cluster perturbation response differential expression
 
 The purpose of this software is to analyze single cell genomics data
 with pre and post perturbation measurements from the same individuals,
-including complex designs wehre individuals with repeated measurements
-are nested within in multiple response groups. The focus is on
-implementing flexible generalized linear multilevel models to derive
-group (i.e. good or poor clinical outcome, high or low rug response) and
-treatment associated effects *within cell types* defined either by
-protein (e.g. with CITE-seq data) or transcriptome based clustering
-followed by downstream enrichment testing and visualization.
-
-By default, the effect of treatment/perturbation across all subjects,
-the baseline differences between outcome groups, and the difference in
-the treatment effect between the outcome groups are tested. Any number
-of model covariates can be specified and by default the package uses a
-random intercept model to accomodate the non-independence of expression
-within each subject.
+including complex designs with many subjects, each subject having
+repeated measurements pre and post perturbation and each subject nested
+within different groups, such as different end point response
+correlates, e.g. high and low responders. The focus is on implementing
+flexible generalized linear multilevel models to derive group (e.g. good
+or poor clinical outcome, high or low drug response) and treatment
+associated effects *within cell types* defined either by protein
+(e.g. with CITE-seq data) or transcriptome-based clustering followed by
+downstream enrichment testing and visualization.
+
+Any number of model covariates can be specified. The vignettes provide
+methods where a random intercept term for the donor ID of each cell
+or aggregated library is included in the model. These methods thus model
+the variation around the baseline expression across individuals,
+accommodating non-independence of expression for repeated timepoints from
+each subject.
 
 An overview of methods provided:
 
@@ -72,33 +73,42 @@ between groups at the gene module level.
 
 ### 4. Downstream enrichment testing and visualization
 
-Wrappers around methods from the [fast set gene enrichment
-(fgsea)](https://www.biorxiv.org/content/10.1101/060012v2#disqus_thread)
-and
-[clusterProfiler](https://www.bioconductor.org/packages/release/bioc/html/clusterProfiler.html)
-R packages.
+There are wrapper functions around multiple gene set enrichment methods,
+with emphasis on the [fast gene set enrichment analysis (fgsea)
+package](https://www.biorxiv.org/content/10.1101/060012v2#disqus_thread).
+The results from fgsea can then be further interrogated by methods for
+contrasting information content in genes driving enrichments within and
+between cell types. Multiple visualization wrappers are also provided.
 
 ### Philosophy
 
-The scglmmr package considers each cluster/ cell type as a separate
-‘experiment’ similar to fitting separate models to different FACS sorted
-leukocyte subsets followed by RNAseq. Fitting models separately to each
-subset provides maximum flexibility and avoids issues with e.g. modeling
-mean variance trends or count distributions for cell type specific genes
-in subsets that do not express the gene while still enabling comparison
-of, for example, coherent perturbation effects for the same gene across
-individuals between different cell clusters. This approach is
-particularly well suited for CITE-seq data with cells clustered based on
-normalized protein expression levels. Typically our workflow consists of
-denoising ADT data using our method [dsb](https://github.com/niaid/dsb)
-followed by modeling the group level perturbation response effects using
-scglmmr.
+This package models expression within each cluster/cell type
+independently in order to capture perturbation effects of cell
+type-specific genes as well as genes that are expressed by multiple cell
+types. Using a normal distribution on count data requires first modeling
+the mean-variance trend [(see Law et
+al.)](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-2-r29);
+this requires filtering out features (genes) that are not expressed by a
+given cell type. These cell type-specific transcripts are therefore
+tested for perturbation effects only in the cell types that
+express them, instead of across all cell types. Genes that are
+shared across cell types can be compared for coherent perturbation
+effects across all subjects or between different groups of subjects
+using contrast coding.
+
+This approach is particularly well suited for multimodal single cell
+data where cells are clustered based on information independent of the
+perturbation effects. For example, we have used methods in this
+package for CITE-seq data where we first denoise ADT data using our
+method [dsb](https://github.com/niaid/dsb) and then model the
+transcriptome, not for differences between cell types, but for the
+group-level perturbation response effects.
 
 **Experiment designs (within each cluster / celltype) supported by
 scglmmr** Below is a 2-group repeated measures experiment. These data can
 be accommodated by scglmmr. Simpler experiment designs are also
-supported, for example data with 2 groups but not repeated pre/post
-treatment measurements.
+supported, for example 2 groups with one timepoint, as are more complex
+experiments, for example 3 timepoints.
 
 | sample | sampleid | timepoint | Group | sex |
 |:---------------|:--------------:|---------------:|:---------------|:--------------:|

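The Philosophy section mentions comparing perturbation effects between groups of subjects using contrast coding. A minimal base R sketch of that idea uses a cell-means parameterization, in which each Group x timepoint combination gets its own coefficient and each contrast is a weighted sum of those coefficients. It is an illustration only: the four subjects and expression values are made up, the `group_time` column and the contrast names are hypothetical, and the ordinary `lm` fit deliberately ignores the donor random intercept described in the README.

```r
# Sketch: contrast coding over Group x timepoint cell means (one gene).
# Base R only; hypothetical data, no donor random effect.
d <- data.frame(
  sampleid  = rep(c("s1", "s2", "s3", "s4"), each = 2),
  timepoint = rep(c("d0", "d1"), times = 4),
  Group     = rep(c("low", "high"), each = 4)
)

# One factor level (and one coefficient) per Group x timepoint cell.
d$group_time <- factor(paste(d$Group, d$timepoint, sep = "_"),
                       levels = c("low_d0", "low_d1", "high_d0", "high_d1"))
X <- model.matrix(~ 0 + group_time, data = d)
colnames(X) <- levels(d$group_time)

# Each column is one contrast; weights follow the order of colnames(X).
contr <- cbind(
  time_low   = c(-1,  1,  0, 0),  # low_d1 - low_d0
  time_high  = c( 0,  0, -1, 1),  # high_d1 - high_d0
  delta_time = c( 1, -1, -1, 1),  # (high_d1 - high_d0) - (low_d1 - low_d0)
  baseline   = c(-1,  0,  1, 0)   # high_d0 - low_d0
)

y <- c(5.0, 6.0, 5.2, 6.1, 4.8, 7.9, 5.1, 8.2)  # made-up expression values
fit <- lm(y ~ 0 + X)                             # coefficients are cell means
est <- drop(t(contr) %*% coef(fit))              # named contrast estimates
round(est, 3)
```

With these made-up values the cell means are 5.1, 6.05, 4.95 and 8.05, so the contrasts evaluate to 0.95, 3.1, 2.15 and -0.15; `delta_time` is the difference in perturbation response between the two groups.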
_pkgdown.yml

Lines changed: 1 addition & 2 deletions
@@ -2,8 +2,7 @@ destination: docs
 template:
   bootstrap: 5
   bootswatch: sandstone
-  theme: kate
-  css: style.css
+  theme: haddock
   bslib:
     pkgdown-nav-height: 100px
 navbar:

docs/404.html

Lines changed: 2 additions & 3 deletions

docs/articles/index.html

Lines changed: 4 additions & 7 deletions
