Skip to content

Commit 11e2fc9

Browse files
author
Jonas Ohlsson
committed
fix typos and add fix for the failing sapply command.
1 parent 77a77c4 commit 11e2fc9

File tree

1 file changed

+26
-23
lines changed

1 file changed

+26
-23
lines changed

rna_seq.md

Lines changed: 26 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -3,10 +3,10 @@
33
## Load salmon
44

55
```
6-
module load salmon
6+
module load Salmon
77
```
88

9-
## Downloading the data.
9+
## Downloading the data
1010

1111
For this tutorial we will use the test data from [this](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004393) paper:
1212

@@ -27,7 +27,7 @@ So to summarize we have:
2727
* HBR + ERCC Spike-In Mix2, Replicate 2
2828
* HBR + ERCC Spike-In Mix2, Replicate 3
2929

30-
You can download the data from [here](http://139.162.178.46/files/tutorials/toy_rna.tar.gz)
30+
You can download the data from [here](http://139.162.178.46/files/tutorials/toy_rna.tar.gz).
3131

3232
Unpack the data and go into the toy_rna directory
3333

@@ -36,13 +36,13 @@ tar xzf toy_rna.tar.gz
3636
cd toy_rna
3737
```
3838

39-
## indexing transcriptome
39+
## Indexing transcriptome
4040

4141
```
4242
salmon index -t chr22_transcripts.fa -i chr22_index
4343
```
4444

45-
## quantify reads using salmon
45+
## Quantify reads using salmon
4646

4747
```bash
4848
for i in *_R1.fastq.gz
@@ -64,9 +64,9 @@ Salmon exposes many different options to the user that enable extra features or
6464

6565
After the salmon commands finish running, you should have a directory named `quant`, which will have a sub-directory for each sample. These sub-directories contain the quantification results of salmon, as well as a lot of other information salmon records about the sample and the run. The main output file (called quant.sf) is rather self-explanatory. For example, take a peek at the quantification file for sample `HBR_Rep1` in `quant/HBR_Rep1/quant.sf` and you’ll see a simple TSV format file listing the name (Name) of each transcript, its length (Length), effective length (EffectiveLength) (more details on this in the documentation), and its abundance in terms of Transcripts Per Million (TPM) and estimated number of reads (NumReads) originating from this transcript.
6666

67-
## import read counts using tximport
67+
## Import read counts using tximport
6868

69-
Using the tximport R package, you can import salmon’s transcript-level quantifications and optionally aggregate them to the gene level for gene-level differential expression analysis
69+
Using the tximport R package, you can import salmon’s transcript-level quantifications and optionally aggregate them to the gene level for gene-level differential expression analysis.
7070

7171
First, open up your favourite R IDE and install the necessary packages:
7272

@@ -86,7 +86,7 @@ library(GenomicFeatures)
8686
library(readr)
8787
```
8888

89-
Salmon did the quantifiation of the transcript level. We want to see which genes are differentially expressed, so we need to link the transcripts name to the gene names. We can use our .gtf annotation for that, and the GenomicFeatures package:
89+
Salmon did the quantifiation of the transcript level. We want to see which genes are differentially expressed, so we need to link the transcript names to the gene names. We can use our .gtf annotation for that, and the GenomicFeatures package:
9090

9191
```R
9292
txdb <- makeTxDbFromGFF("chr22_genes.gtf")
@@ -96,49 +96,48 @@ tx2gene <- df[, 2:1]
9696
head(tx2gene)
9797
```
9898

99-
now we can import the salmon quantification:
99+
Now we can import the salmon quantification. First, download the file with sample descriptions from [here](https://raw.githubusercontent.com/HadrienG/tutorials/master/data/samples.txt) and put it in the toy_rna directory. Then, use that file to load the corresponding quantification data.
100100

101101
```R
102102
samples <- read.table("samples.txt", header = TRUE)
103-
files <- file.path("quant", samples$quant, "quant.sf")
103+
files <- file.path("quant", samples$sample, "quant.sf")
104104
names(files) <- paste0(samples$sample)
105105
txi.salmon <- tximport(files, type = "salmon", tx2gene = tx2gene, reader = read_tsv)
106106
```
107107

108-
take a look at the data:
108+
Take a look at the data:
109109

110110
```R
111111
head(txi.salmon$counts)
112112
```
113113

114-
## differential expression using DeSeq2
114+
## Differential expression using DESeq2
115115

116-
install the necessary package
116+
Install the necessary package:
117117

118118
```R
119119
biocLite('DESeq2')
120120
```
121121

122-
then load it:
122+
Then load it:
123123

124124
```R
125125
library(DESeq2)
126126
```
127127

128-
Instantiate the DESeqDataSet and generate result table. See ?DESeqDataSetFromTximport and ?DESeq for more information about the steps performed by the program.
129-
128+
Instantiate the DESeqDataSet and generate result table. See `?DESeqDataSetFromTximport` and `?DESeq` for more information about the steps performed by the program.
130129

131130
```R
132131
dds <- DESeqDataSetFromTximport(txi.salmon, samples, ~condition)
133132
dds <- DESeq(dds)
134133
res <- results(dds)
135134
```
136135

137-
run the `summary` command to have an idea of how many genes are up and down-regulated between the two conditions
136+
Run the `summary` command to get an idea of how many genes are up- and downregulated between the two conditions:
138137

139138
`summary(res)`
140139

141-
DESeq uses a negative binomial distribution. Such distribution has two parameters: mean and dispersion. The dispersion is a parameter describing how much the variance deviates from the mean.
140+
DESeq uses a negative binomial distribution. Such distributions have two parameters: mean and dispersion. The dispersion is a parameter describing how much the variance deviates from the mean.
142141

143142
You can read more about the methods used by DESeq2 in the [paper](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8) or the [vignette](https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq/inst/doc/DESeq.pdf)
144143

@@ -237,8 +236,7 @@ res$name = mapIds(org.Hs.eg.db,
237236
head(res)
238237
```
239238

240-
We’re going to use the [gage](http://bioconductor.org/packages/release/bioc/html/gage.html) package for pathway analysis, and the [pathview](http://bioconductor.org/packages/release/bioc/html/pathview.html) package to draw a pathway diagram.
241-
239+
We’re going to use the [gage](https://bioconductor.org/packages/release/bioc/html/gage.html) package for pathway analysis, and the [pathview](https://bioconductor.org/packages/release/bioc/html/pathview.html) package to draw a pathway diagram.
242240

243241
The gageData package has pre-compiled databases mapping genes to KEGG pathways and GO terms for common organisms:
244242

@@ -249,7 +247,7 @@ kegg.sets.hs = kegg.sets.hs[sigmet.idx.hs]
249247
head(kegg.sets.hs, 3)
250248
```
251249

252-
Run the pathway analysis. See help on the gage function with ?gage. Specifically, you might want to try changing the value of same.dir
250+
Run the pathway analysis. See help on the gage function with `?gage`. Specifically, you might want to try changing the value of same.dir.
253251

254252
```R
255253
foldchanges = res$log2FoldChange
@@ -258,9 +256,11 @@ keggres = gage(foldchanges, gsets=kegg.sets.hs, same.dir=TRUE)
258256
lapply(keggres, head)
259257
```
260258

261-
pull out the top 5 upregulated pathways, then further process that just to get the IDs. We’ll use these KEGG pathway IDs downstream for plotting.
259+
Pull out the top 5 upregulated pathways, then further process that just to get the IDs. We’ll use these KEGG pathway IDs downstream for plotting. The `dplyr` package is required to use the pipe (`%>%`) construct.
262260

263261
```R
262+
library(dplyr)
263+
264264
# Get the pathways
265265
keggrespathways = data.frame(id=rownames(keggres$greater), keggres$greater) %>%
266266
tbl_df() %>%
@@ -274,12 +274,15 @@ keggresids = substr(keggrespathways, start=1, stop=8)
274274
keggresids
275275
```
276276

277-
Finally, the pathview() function in the pathview package makes the plots. Let’s write a function so we can loop through and draw plots for the top 5 pathways we created above.
277+
Finally, the `pathview()` function in the pathview package makes the plots. Let’s write a function so we can loop through and draw plots for the top 5 pathways we created above.
278278

279279
```R
280280
# Define plotting function for applying later
281281
plot_pathway = function(pid) pathview(gene.data=foldchanges, pathway.id=pid, species="hsa", new.signature=FALSE)
282282

283+
# Unload dplyr since it conflicts with the next line
284+
detach("package:dplyr", unload=T)
285+
283286
# plot multiple pathways (plots saved to disk and returns a throwaway list object)
284287
tmp = sapply(keggresids, function(pid) pathview(gene.data=foldchanges, pathway.id=pid, species="hsa"))
285288
```

0 commit comments

Comments
 (0)