# SummarizedExperiment overview
Instructor: Leo
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r vignetteSetup_SEreview, echo=FALSE, message=FALSE, warning = FALSE}
## For links
library(BiocStyle)
## Bib setup
library(RefManageR)
## Write bibliography information
bib <- c(
smokingMouse = citation("smokingMouse")[1],
SummarizedExperiment = citation("SummarizedExperiment")[1]
)
options(max.print = 50)
```
<iframe width="560" height="315" src="https://www.youtube.com/embed/lqxtgpD-heM" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
[_LIBD rstats club notes_](https://docs.google.com/document/d/1umDODmdQldf5w2lNDoFe-unmezHPonpCiKD270VwkrQ/edit?usp=sharing)
## Overview
The `SummarizedExperiment` class is used to store experimental results in the form of matrices. Objects of this class include observations (features) of the samples, as well as additional metadata. Usually, this type of object is generated automatically as the output of other software (e.g., [`SPEAQeasy`](https://doi.org/10.1186/s12859-021-04142-3)), but you can also build one yourself.
One of the main characteristics of `SummarizedExperiment` is that it allows you to handle your data in a "coordinated" way. For example, if you want to subset your data, with `SummarizedExperiment` you can do so without worrying about keeping your assays and metadata in sync.
<figure>
<img src="Figures/se_structure.png" width="700px" align=center />
</figure>
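As a minimal sketch of that coordinated behavior, the chunk below builds a small `SummarizedExperiment` by hand and subsets it. The object `toy_se`, its gene and sample names, and the `condition` column are made up for illustration and are not part of the course data.

```{r}
suppressPackageStartupMessages(library("SummarizedExperiment"))

## Hypothetical toy data: 4 genes (rows) x 3 samples (columns)
toy_counts <- matrix(
    1:12,
    nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("sample", 1:3))
)
## One row of sample metadata per column of the assay
toy_coldata <- DataFrame(
    condition = c("ctrl", "trt", "trt"),
    row.names = colnames(toy_counts)
)
toy_se <- SummarizedExperiment(
    assays = list(counts = toy_counts),
    colData = toy_coldata
)

## Subsetting the columns keeps the assay and the sample metadata in sync
toy_trt <- toy_se[, toy_se$condition == "trt"]
dim(toy_trt)
colnames(assay(toy_trt))
rownames(colData(toy_trt))
```

A single `[` call is enough here: the samples dropped from the assay are also dropped from `colData()`, so the two cannot drift apart.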
## Exercises
We are going to use the example data set from the `airway` package.
```{r, echo=FALSE}
suppressPackageStartupMessages(library("SummarizedExperiment"))
suppressPackageStartupMessages(data(airway, package = "airway"))
```
```{r}
library("SummarizedExperiment")
library("airway")
data(airway, package = "airway")
se <- airway
```
<style>
p.exercise {
background-color: #E4EDE2;
padding: 9px;
border: 1px solid black;
border-radius: 10px;
font-family: sans-serif;
}
</style>
<p class="exercise">
**Exercise 1**:
**a)** How many genes do we have in this object? And samples?
**b)** How many samples come from donors treated (`trt`) with dexamethasone (`dex`)?
</p>
```{r}
## For a), you could just print the summary of the object, but since the idea
## is to understand how to explore the object, find another function that
## gives you the answer.
se
## Same thing for b): you could just print the colData and count the samples,
## but this is not efficient when our data consists of hundreds of samples.
## Find the answer using other tools.
colData(se)
```
<p class="exercise">
**Exercise 2**:
Add another assay that contains the log10 of your original counts.
</p>
```{r}
## In our object, if you look at the part that says assays, you can see that
## at the moment we only have one, named "counts".
se
## To see the data stored in that assay you can use either of the
## next two commands.
assay(se)
assays(se)$counts
## Note that assay() does not support the $ operator
# assay(se)$counts
## We would have to do:
assay(se, 1)
assay(se, "counts")
## If you use assays() without specifying the element you want to see, it
## shows you the length of the list and the name of each element.
assays(se)
## To obtain the assay names as a vector you can use:
assayNames(se)
## Which can also be used to change the names of the assays
assayNames(se)[1] <- "foo"
assayNames(se)
assayNames(se)[1] <- "counts"
```
<p class="exercise">
**Exercise 3**:
Explore the metadata and add a new column that has the library size of each sample.
</p>
```{r}
## To calculate the library size (total counts per sample) use
apply(assay(se), 2, sum)
```
## Solutions
<style>
p.solution {
background-color: #C093D6;
padding: 9px;
border: 1px solid black;
border-radius: 10px;
font-family: sans-serif;
}
</style>
<p class="solution">
**Solution 1**:
</p>
```{r}
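## For a), dim() gives the number of genes (rows) and samples (columns)
dim(se)
## For b), subset the sample metadata to the treated samples
colData(se)[colData(se)$dex == "trt", ]
colData(se)[se$dex == "trt", ]
## and count them instead of reading the rows by eye
sum(se$dex == "trt")
table(se$dex)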
```
<p class="solution">
**Solution 2**:
</p>
```{r}
## There are multiple ways to do it
assay(se, "logcounts") <- log10(assay(se, "counts"))
assays(se)$logcounts_v2 <- log10(assays(se)$counts)
```
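One caveat worth keeping in mind: count matrices usually contain zeros, and `log10(0)` is `-Inf`. A common workaround, which is not part of the solution above, is to add a pseudocount before taking the log. The sketch below uses a made-up toy object (`zero_se`); the pseudocount of 1 and the assay name `logcounts_pseudo` are assumptions chosen for illustration.

```{r}
suppressPackageStartupMessages(library("SummarizedExperiment"))

## Hypothetical toy counts that include a zero
zero_counts <- matrix(
    c(0, 9, 99, 999),
    nrow = 2,
    dimnames = list(c("gene1", "gene2"), c("sample1", "sample2"))
)
zero_se <- SummarizedExperiment(assays = list(counts = zero_counts))

## Plain log10() turns the zero count into -Inf
assay(zero_se, "logcounts") <- log10(assay(zero_se, "counts"))
## Adding a pseudocount of 1 (an assumption, pick what suits your analysis)
## keeps every value finite
assay(zero_se, "logcounts_pseudo") <- log10(assay(zero_se, "counts") + 1)

assay(zero_se, "logcounts")
assay(zero_se, "logcounts_pseudo")
assayNames(zero_se)
```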
<p class="solution">
**Solution 3**:
</p>
```{r}
## To add the library size we can use:
colData(se)$library_size <- apply(assay(se), 2, sum)
names(colData(se))
```
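Base R's `colSums()` computes the same per-sample totals as `apply(..., 2, sum)` and is usually faster on large matrices. The chunk below is a small sketch on a made-up object (`lib_se`, with hypothetical gene and sample names) that also uses the `$` shortcut to write the new `colData` column.

```{r}
suppressPackageStartupMessages(library("SummarizedExperiment"))

## Hypothetical toy counts: 3 genes x 2 samples
lib_counts <- matrix(
    1:6,
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), c("sample1", "sample2"))
)
lib_se <- SummarizedExperiment(assays = list(counts = lib_counts))

## se$column is a shortcut for colData(se)$column
lib_se$library_size <- colSums(assay(lib_se, "counts"))

## Same totals as the apply() version
all.equal(
    lib_se$library_size,
    apply(assay(lib_se, "counts"), 2, sum),
    check.attributes = FALSE
)
colData(lib_se)
```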