rna_seq.md
26 additions & 23 deletions
@@ -3,10 +3,10 @@
## Load salmon
```
-module load salmon
+module load Salmon
```
-## Downloading the data.
+## Downloading the data
For this tutorial we will use the test data from [this](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004393) paper:
@@ -27,7 +27,7 @@ So to summarize we have:
* HBR + ERCC Spike-In Mix2, Replicate 2
* HBR + ERCC Spike-In Mix2, Replicate 3
-You can download the data from [here](http://139.162.178.46/files/tutorials/toy_rna.tar.gz)
+You can download the data from [here](http://139.162.178.46/files/tutorials/toy_rna.tar.gz).
Unpack the data and go into the toy_rna directory
@@ -36,13 +36,13 @@ tar xzf toy_rna.tar.gz
cd toy_rna
```
-## indexing transcriptome
+## Indexing transcriptome
```
salmon index -t chr22_transcripts.fa -i chr22_index
```
-## quantify reads using salmon
+## Quantify reads using salmon
```bash
for i in *_R1.fastq.gz
@@ -64,9 +64,9 @@ Salmon exposes many different options to the user that enable extra features or
After the salmon commands finish running, you should have a directory named `quant`, which will have a sub-directory for each sample. These sub-directories contain the quantification results of salmon, as well as a lot of other information salmon records about the sample and the run. The main output file (called quant.sf) is rather self-explanatory. For example, take a peek at the quantification file for sample `HBR_Rep1` in `quant/HBR_Rep1/quant.sf` and you’ll see a simple TSV format file listing the name (Name) of each transcript, its length (Length), effective length (EffectiveLength) (more details on this in the documentation), and its abundance in terms of Transcripts Per Million (TPM) and estimated number of reads (NumReads) originating from this transcript.
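As a quick check of that layout, a minimal sketch like the following lists the most abundant transcripts in one sample. It assumes the column order described above (Name, Length, EffectiveLength, TPM, NumReads), tab-separated, with a single header line. The `top_tpm` helper and the toy file are made up for illustration and are not part of salmon.

```shell
# Hypothetical helper (not part of salmon): print the N most abundant
# transcripts by TPM from a quant.sf file.
# Assumes tab-separated columns with one header line and TPM in column 4.
top_tpm() {
    file=$1
    n=${2:-5}
    tab=$(printf '\t')
    # Drop the header, sort on column 4 numerically in descending order,
    # keep the first N rows, and print only Name and TPM.
    tail -n +2 "$file" | sort -t "$tab" -k4,4nr | head -n "$n" | cut -f1,4
}

# Toy stand-in for quant/HBR_Rep1/quant.sf; all values are invented.
toy=$(mktemp)
printf 'Name\tLength\tEffectiveLength\tTPM\tNumReads\n' > "$toy"
printf 'tx_a\t1000\t850.0\t12.5\t40\n' >> "$toy"
printf 'tx_b\t2000\t1850.0\t250.0\t1500\n' >> "$toy"
printf 'tx_c\t500\t350.0\t0.0\t0\n' >> "$toy"

top_tpm "$toy" 2
```

On the tutorial data, the equivalent call would be `top_tpm quant/HBR_Rep1/quant.sf`.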
-## import read counts using tximport
+## Import read counts using tximport

-Using the tximport R package, you can import salmon’s transcript-level quantifications and optionally aggregate them to the gene level for gene-level differential expression analysis
+Using the tximport R package, you can import salmon’s transcript-level quantifications and optionally aggregate them to the gene level for gene-level differential expression analysis.
First, open up your favourite R IDE and install the necessary packages:
@@ -86,7 +86,7 @@ library(GenomicFeatures)
library(readr)
```
-Salmon did the quantification at the transcript level. We want to see which genes are differentially expressed, so we need to link the transcripts name to the gene names. We can use our .gtf annotation for that, and the GenomicFeatures package:
+Salmon did the quantification at the transcript level. We want to see which genes are differentially expressed, so we need to link the transcript names to the gene names. We can use our .gtf annotation for that, and the GenomicFeatures package:
```R
txdb <- makeTxDbFromGFF("chr22_genes.gtf")
@@ -96,49 +96,48 @@ tx2gene <- df[, 2:1]
head(tx2gene)
```
-now we can import the salmon quantification:
+Now we can import the salmon quantification. First, download the file with sample descriptions from [here](https://raw.githubusercontent.com/HadrienG/tutorials/master/data/samples.txt) and put it in the toy_rna directory. Then, use that file to load the corresponding quantification data.
-Instantiate the DESeqDataSet and generate the results table. See ?DESeqDataSetFromTximport and ?DESeq for more information about the steps performed by the program.
+Instantiate the DESeqDataSet and generate the results table. See `?DESeqDataSetFromTximport` and `?DESeq` for more information about the steps performed by the program.
-run the `summary` command to have an idea of how many genes are up and down-regulated between the two conditions
+Run the `summary` command to get an idea of how many genes are up- and downregulated between the two conditions:
`summary(res)`
-DESeq uses a negative binomial distribution. Such distribution has two parameters: mean and dispersion. The dispersion is a parameter describing how much the variance deviates from the mean.
+DESeq uses a negative binomial distribution. Such distributions have two parameters: mean and dispersion. The dispersion is a parameter describing how much the variance deviates from the mean.
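To make the last sentence concrete, here is the usual mean-variance relation for a negative binomial count $K$, written in the notation of the DESeq2 paper (the symbols $\mu$ for the mean and $\alpha$ for the dispersion are not used elsewhere in this tutorial):

$$
\operatorname{Var}(K) = \mu + \alpha\,\mu^{2}
$$

With $\alpha = 0$ the variance equals the mean, as in a Poisson distribution. As an illustrative calculation with made-up numbers, $\mu = 100$ and $\alpha = 0.1$ give $\operatorname{Var}(K) = 100 + 0.1 \times 100^{2} = 1100$.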
You can read more about the methods used by DESeq2 in the [paper](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0550-8) or the [vignette](https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq/inst/doc/DESeq.pdf)
-We’re going to use the [gage](http://bioconductor.org/packages/release/bioc/html/gage.html) package for pathway analysis, and the [pathview](http://bioconductor.org/packages/release/bioc/html/pathview.html) package to draw a pathway diagram.
+We’re going to use the [gage](https://bioconductor.org/packages/release/bioc/html/gage.html) package for pathway analysis, and the [pathview](https://bioconductor.org/packages/release/bioc/html/pathview.html) package to draw a pathway diagram.
The gageData package has pre-compiled databases mapping genes to KEGG pathways and GO terms for common organisms:
-pull out the top 5 upregulated pathways, then further process that just to get the IDs. We’ll use these KEGG pathway IDs downstream for plotting.
+Pull out the top 5 upregulated pathways, then further process that just to get the IDs. We’ll use these KEGG pathway IDs downstream for plotting. The `dplyr` package is required to use the pipe (`%>%`) construct.
-Finally, the pathview() function in the pathview package makes the plots. Let’s write a function so we can loop through and draw plots for the top 5 pathways we created above.
+Finally, the `pathview()` function in the pathview package makes the plots. Let’s write a function so we can loop through and draw plots for the top 5 pathways we created above.