README.md
Lines changed: 9 additions & 7 deletions
@@ -8,23 +8,25 @@ This pipeline is designed to **accurately quantify gene and transcript abundance
8
8
9
9
As illustrated above, the pipeline consists of three stages:
10
10
11
-
####1. Preprocessing ####
11
+
### 1. Preprocessing
12
12
13
-
The pipeline accepts raw input files in variable formats (e.g., FASTQ, BAM/SAM) and processes them to generate **standard-in-format**, **clean-in-sequence** FASTQ files. These cleaned files are optimized for downstream quantification analysis.
13
+
The pipeline accepts raw input files in various formats (e.g., FASTQ, BAM/SAM) and processes them to generate **standard-in-format**, **clean-in-sequence** FASTQ files. These preprocessed files are optimized for downstream quantification analysis.
14
14
15
-
####2. Quantification
15
+
### 2. Quantification
16
16
17
17
In this stage, the pipeline quantifies the abundance of both genes and transcripts. It supports three well-established and widely used quantifiers:
18
18
19
19
- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): An **alignment-free quantifier** known for its **wicked-fast speed** and **comparable accuracy**.
20
20
21
-
-[**RSEM**](https://github.com/bli25/RSEM_tutorial): An **alignment-based quantifier** with **exceptional accuracy**. It has been used as **gold standard** in many benchmarking studies.
22
21
23
-
-[**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): An **alignment-based quantifier**featured by **splice-aware alignment**. This is the tool used by [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
22
+
- [**RSEM**](https://github.com/bli25/RSEM_tutorial): An **alignment-based quantifier** with **exceptional accuracy**. It has been used as the **gold standard** in many benchmarking studies.
24
23
25
-
#### 3. Summarization
26
24
27
-
The pipeline generates a comprehensive **HTML report** for each sample, detailing quantification results, alignment statistics, correlation analyses, gene body coverage visualizations, and more. For multiple samples, it produces **a unified summary report** and a master gene expression matrix, which can be directly utilized for downstream analyses such as [**NetBID**](https://github.com/jyyulab/NetBID).
25
+
- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): An **alignment-based quantifier** featuring **splice-aware alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
26
+
27
+
### 3. Summarization
28
+
29
+
The pipeline generates a comprehensive **HTML report** for each sample, detailing quantification results, alignment statistics, correlation analyses, gene body coverage visualizations, and more. For multiple samples, it produces **a unified summary report** and **a master gene expression matrix** including all samples, which can be directly utilized for downstream analyses such as [**NetBID**](https://github.com/jyyulab/NetBID).
index.md
Lines changed: 38 additions & 23 deletions
@@ -15,62 +15,77 @@ This pipeline is designed to **accurately quantify gene and transcript abundance
15
15
16
16
As illustrated above, the pipeline consists of three stages:
17
17
18
-
1. Preprocessing
18
+
### 1. Preprocessing
19
19
20
-
The pipeline accepts raw input files in variable formats (e.g., FASTQ, BAM/SAM) and processes them to generate **standard-in-format**, **clean-in-sequence** FASTQ files. These cleaned files are optimized for downstream quantification analysis.
20
+
The pipeline accepts raw input files in various formats (e.g., FASTQ, BAM/SAM) and processes them to generate **standard-in-format**, **clean-in-sequence** FASTQ files. These preprocessed files are optimized for downstream quantification analysis.
21
21
22
-
2. Quantification
22
+
### 2. Quantification
23
23
24
-
In this stage, the pipeline quantifies the abundance of both genes and transcripts. It supports three well-established and widely-used quantifiers:
24
+
In this stage, the pipeline quantifies the abundance of both genes and transcripts. It supports three well-established and widely used quantifiers:
25
25
26
-
-[**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): An **alignment-free quantifier** known for its **wicked-fast speed** and **comarable accuracy**.
26
+
- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): An **alignment-free quantifier** known for its **wicked-fast speed** and **comparable accuracy**.
27
27
28
28
29
29
- [**RSEM**](https://github.com/bli25/RSEM_tutorial): An **alignment-based quantifier** with **exceptional accuracy**. It has been used as the **gold standard** in many benchmarking studies.
30
30
31
31
32
32
- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): An **alignment-based quantifier** featuring **splice-aware alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
33
33
34
+
### 3. Summarization
34
35
35
-
3. Summarization
36
-
37
-
The pipeline generates a comprehensive **HTML report** for each sample, detailing quantification results, alignment statistics, correlation analyses, gene body coverage visualizations, and more. For multiple samples, it produces **a unified summary report** and a master gene expression matrix, which can be directly utilized for downstream analyses such as [**NetBID**](https://github.com/jyyulab/NetBID).
36
+
The pipeline generates a comprehensive **HTML report** for each sample, detailing quantification results, alignment statistics, correlation analyses, gene body coverage visualizations, and more. For multiple samples, it produces **a unified summary report** and **a master gene expression matrix** including all samples, which can be directly utilized for downstream analyses such as [**NetBID**](https://github.com/jyyulab/NetBID).
38
37
39
38
40
39
41
40
## Key features
42
41
43
-
**Accuracy ensured by cross-validation**: This pipeline quantifies the transcriptome using both alignment-free method (Salmon) and alignment-based method (RSEM_STAR). It then performs a correlation analysis on the quantification results by these two approaches. A strong correlation (coefficient > 0.9) typically indicates high quantification accuracy.
42
+
### **1. Accuracy ensured by cross-validation**
43
+
44
+
This pipeline quantifies the transcriptome using both an **alignment-free method** (Salmon) and an **alignment-based method** (RSEM_STAR). It then performs correlation analysis between the quantification results from these two approaches. A strong correlation (coefficient > 0.9) typically indicates high quantification accuracy, while samples with low correlation coefficients may require troubleshooting.
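
The cross-validation idea can be sketched in a few lines of Python. This is only an illustration, not the pipeline's own code: the function names, the use of Pearson correlation, and the `log2(TPM + 1)` transform are all assumptions, since the text above states only that a coefficient above 0.9 is treated as a good sign.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def cross_validate(salmon_tpm, rsem_tpm, threshold=0.9):
    """Compare two {gene_id: TPM} dicts on the genes they share.

    Hypothetical helper: returns (coefficient, passed), where `passed`
    means the coefficient exceeds `threshold` (0.9 in the text above).
    """
    shared = sorted(set(salmon_tpm) & set(rsem_tpm))
    # Assumption: log-transform so a few highly expressed genes do not
    # dominate the coefficient.
    xs = [math.log2(salmon_tpm[g] + 1) for g in shared]
    ys = [math.log2(rsem_tpm[g] + 1) for g in shared]
    r = pearson(xs, ys)
    return r, r > threshold
```

A sample whose two quantifications yield `passed == False` would be a candidate for the troubleshooting mentioned above.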
45
+
46
+
### **2. Comprehensive quality control report**
47
+
48
+
For each sample, this pipeline generates [**a comprehensive quality control report**](https://github.com/jyyulab/bulkRNAseq_quantification_pipeline/blob/main/testdata/summarization_individual.html), summarizing alignment statistics, quantification correlations, gene type distributions, gene body coverage metrics, and more (see example below). These metrics are invaluable for **assessing quantification accuracy** and **troubleshooting potential issues**.
**Comprehensive quality control report**: For each sample, this pipeline generates a comprehensive quantlity control report, summarizing alignment statistics, quantification correlations, gene type distributions, and gene body converage metrics, and more. These metrics are invaluable for asseesing quantification accuracy and troubleshooting potential issues.
52
+
### **3. Flatter, Simpler, Faster**
46
53
47
-
**Flater, Simpler, Faste**r: Every step of the pipeline has been optimized for ease of use, maintenance and speed:
54
+
Every step of the pipeline has been optimized for ease of use, maintenance and speed:
48
55
49
-
- All required toolsnow can be installed within one single conda environment.
56
+
- **All required tools, databases and scripts can now be set up in a single conda environment.**
50
57
51
-
- Time-consuming steps, such as gene body coverage analysis, has been optimized. Now a typical run completes in about 2.5 hours.
58
+

52
59
53
-
-There are only two arguments that the users need to specify manually. For all the rest, including the adapter sequences and strandness types, the pipeline can infer them automatically.
60
+
- Time-consuming steps, such as gene body coverage analysis, have been optimized. **A typical run now completes in about 2.5 hours**.
54
61
55
-
- All required from the user is a sample table (see example below). This make it effortless to process hundreds or thousands of samples using this pipeline.
62
+

63
+
64
+
- **Only two parameters (Library Type and Phred Score Encoding Method) need to be specified manually**; all other settings, including adapter sequences and strandedness, are automatically inferred.
65
+
66
+
- **The pipeline is now highly user-friendly for large-scale analyses**. Instead of relying on loops or other workarounds to process multiple samples, the pipeline now accepts **a sample table** (see example below) as standard input, parses it automatically, and extracts all required information. This makes it **effortless to process hundreds or thousands of samples**.
- If you are unable to access the conda environment below, or if you need a reference genome assembly other than the pre-built ones (**hg38**, **hg19**, **mm39**, **mm10**), you will need to set up your own pipeline first. For detailed instructions, please refer to this tutorial: [Pipeline Setup](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/1_pipeline_setup/index).
72
+
We have set up a conda environment for this pipeline, with all **tools**, **databases** (for hg38, hg19, mm39 and mm10) and **scripts** ready to use. You can activate it using the following commands:
- If you are unable to access the conda environment above, or if you need a reference genome assembly other than the pre-built ones (**hg38**, **hg19**, **mm39**, **mm10**), you will need to set up your own pipeline first. For detailed instructions, please refer to this tutorial: [Pipeline Setup](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/1_pipeline_setup/index).
- If you are new to bulk RNA-seq quantification analysis and would like to learn more about the pipeline in detail, please refer to this tutorial: [Full Tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/3_full_tutorial/index).
83
+
- **If you are new to bulk RNA-seq quantification analysis**, or would like to explore the pipeline in detail, please refer to this tutorial: [Full Tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/3_full_tutorial/index).
69
84
70
-
- If you want to run this pipeline directly with your ownsamples, please refer to this tutorial: [Quick Tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/2_quick_tutorial/quick_tutorial).
85
+
- **If you are already familiar with this pipeline**, you can quickly run it with your own samples by following this tutorial: [Quick Tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/2_quick_tutorial/quick_tutorial).
71
86
72
87
73
88
74
89
## Contact
75
90
76
-
If you need support or have any questions about using this pipeline, please visit the [FAQ](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/4_FAQ/FAQ) or contact us directly at Qingfei.Pan@stjude.org.
91
+
If you need support or have any questions about using this pipeline, please visit the **[FAQ](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/4_FAQ/FAQ)** or contact us directly at **Qingfei.Pan@stjude.org**.