Commit 144c060

committed
finalize index.md
1 parent 56cdaaf commit 144c060

4 files changed

Lines changed: 47 additions & 30 deletions

File tree

README.md

Lines changed: 9 additions & 7 deletions
@@ -8,23 +8,25 @@ This pipeline is designed to **accurately quantify gene and transcript abundance
88

99
As illustrated above, the pipeline consists of three stages:
1010

11-
#### 1. Preprocessing ####
11+
### 1. Preprocessing
1212

13-
The pipeline accepts raw input files in variable formats (e.g., FASTQ, BAM/SAM) and processes them to generate **standard-in-format**, **clean-in-sequence** FASTQ files. These cleaned files are optimized for downstream quantification analysis.
13+
The pipeline accepts raw input files in various formats (e.g., FASTQ, BAM/SAM) and processes them to generate **standard-in-format**, **clean-in-sequence** FASTQ files. These preprocessed files are optimized for downstream quantification analysis.
1414

15-
#### 2. Quantification
15+
### 2. Quantification
1616

1717
In this stage, the pipeline quantifies the abundance of both genes and transcripts. It supports three well-established and widely used quantifiers:
1818

1919
- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): An **alignment-free quantifier** known for its **wicked-fast speed** and **comparable accuracy**.
2020

21-
- [**RSEM**](https://github.com/bli25/RSEM_tutorial): An **alignment-based quantifier** with **exceptional accuracy**. It has been used as **gold standard** in many benchmarking studies.
2221

23-
- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): An **alignment-based quantifier** featured by **splice-aware alignment**. This is the tool used by [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
22+
- [**RSEM**](https://github.com/bli25/RSEM_tutorial): An **alignment-based quantifier** with **exceptional accuracy**. It has been used as a **gold standard** in many benchmarking studies.
2423

25-
#### 3. Summarization
2624

27-
The pipeline generates a comprehensive **HTML report** for each sample, detailing quantification results, alignment statistics, correlation analyses, gene body coverage visualizations, and more. For multiple samples, it produces **a unified summary report** and a master gene expression matrix, which can be directly utilized for downstream analyses such as [**NetBID**](https://github.com/jyyulab/NetBID).
25+
- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): An **alignment-based quantifier** featuring **splice-aware alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
26+
27+
### 3. Summarization
28+
29+
The pipeline generates a comprehensive **HTML report** for each sample, detailing quantification results, alignment statistics, correlation analyses, gene body coverage visualizations, and more. For multiple samples, it produces **a unified summary report** and **a master gene expression matrix** including all samples, which can be directly utilized for downstream analyses such as [**NetBID**](https://github.com/jyyulab/NetBID).
2830

2931
## Tutorial
3032

docs/figures/file_structure.png

923 KB

docs/figures/task_duration.png

798 KB

index.md

Lines changed: 38 additions & 23 deletions
@@ -15,62 +15,77 @@ This pipeline is designed to **accurately quantify gene and transcript abundance
1515

1616
As illustrated above, the pipeline consists of three stages:
1717

18-
1. Preprocessing
18+
### 1. Preprocessing
1919

20-
The pipeline accepts raw input files in variable formats (e.g., FASTQ, BAM/SAM) and processes them to generate **standard-in-format**, **clean-in-sequence** FASTQ files. These cleaned files are optimized for downstream quantification analysis.
20+
The pipeline accepts raw input files in various formats (e.g., FASTQ, BAM/SAM) and processes them to generate **standard-in-format**, **clean-in-sequence** FASTQ files. These preprocessed files are optimized for downstream quantification analysis.
2121

22-
2. Quantification
22+
### 2. Quantification
2323

24-
In this stage, the pipeline quantifies the abundance of both genes and transcripts. It supports three well-established and widely-used quantifiers:
24+
In this stage, the pipeline quantifies the abundance of both genes and transcripts. It supports three well-established and widely used quantifiers:
2525

26-
- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): An **alignment-free quantifier** known for its **wicked-fast speed** and **comarable accuracy**.
26+
- [**Salmon**](https://salmon.readthedocs.io/en/latest/salmon.html): An **alignment-free quantifier** known for its **wicked-fast speed** and **comparable accuracy**.
2727

2828

2929
- [**RSEM**](https://github.com/bli25/RSEM_tutorial): An **alignment-based quantifier** with **exceptional accuracy**. It has been used as a **gold standard** in many benchmarking studies.
3030

3131

3232
- [**STAR**](https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf): An **alignment-based quantifier** featuring **splice-aware alignment**. This is the tool used by the [GDC mRNA quantification analysis pipeline](https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/Expression_mRNA_Pipeline).
3333

34+
### 3. Summarization
3435

35-
3. Summarization
36-
37-
The pipeline generates a comprehensive **HTML report** for each sample, detailing quantification results, alignment statistics, correlation analyses, gene body coverage visualizations, and more. For multiple samples, it produces **a unified summary report** and a master gene expression matrix, which can be directly utilized for downstream analyses such as [**NetBID**](https://github.com/jyyulab/NetBID).
36+
The pipeline generates a comprehensive **HTML report** for each sample, detailing quantification results, alignment statistics, correlation analyses, gene body coverage visualizations, and more. For multiple samples, it produces **a unified summary report** and **a master gene expression matrix** including all samples, which can be directly utilized for downstream analyses such as [**NetBID**](https://github.com/jyyulab/NetBID).
3837

3938

4039

4140
## Key features
4241

43-
**Accuracy ensured by cross-validation**: This pipeline quantifies the transcriptome using both alignment-free method (Salmon) and alignment-based method (RSEM_STAR). It then performs a correlation analysis on the quantification results by these two approaches. A strong correlation (coefficient > 0.9) typically indicates high quantification accuracy.
42+
### **1. Accuracy ensured by cross-validation**
43+
44+
This pipeline quantifies the transcriptome using both an **alignment-free method** (Salmon) and an **alignment-based method** (RSEM_STAR). It then performs a correlation analysis between the quantification results from the two approaches. A strong correlation (coefficient > 0.9) typically indicates high quantification accuracy, while samples with low correlation coefficients may require troubleshooting.
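As a rough illustration of this cross-validation check, the sketch below computes a Pearson correlation between two abundance columns using `awk`. The three-column TSV layout, the log2(TPM + 1) transform, and the function name `pearson_r` are assumptions made for this example; the pipeline's own report may compute the coefficient differently.

```bash
# Pearson correlation between two abundance columns of a tab-separated file.
# Assumed (hypothetical) layout: one header line, then
#   gene_id <TAB> salmon_tpm <TAB> rsem_tpm
pearson_r() {
  awk -F'\t' '
    NR > 1 {
      x = log($2 + 1) / log(2)    # log2(TPM + 1); the transform is an assumption
      y = log($3 + 1) / log(2)
      n++
      sx += x; sy += y
      sxx += x * x; syy += y * y; sxy += x * y
    }
    END {
      vx = n * sxx - sx * sx      # n^2 times the variance of x
      vy = n * syy - sy * sy      # n^2 times the variance of y
      if (n < 2 || vx <= 0 || vy <= 0) { print "NA"; exit }
      printf "%.4f\n", (n * sxy - sx * sy) / sqrt(vx * vy)
    }
  ' "$1"
}
```

For example, `pearson_r sample1_pair.tsv` (a hypothetical file name) prints a single coefficient that can be compared against the 0.9 threshold mentioned above.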
45+
46+
### **2. Comprehensive quality control report**
47+
48+
For each sample, this pipeline generates [**a comprehensive quality control report**](https://github.com/jyyulab/bulkRNAseq_quantification_pipeline/blob/main/testdata/summarization_individual.html), summarizing alignment statistics, quantification correlations, gene type distributions, gene body coverage metrics, and more (see example below). These metrics are invaluable for **assessing quantification accuracy** and **troubleshooting potential issues**.
49+
50+
![image-20230901163554962](docs/figures/qc_report_individual.png)
4451

45-
**Comprehensive quality control report**: For each sample, this pipeline generates a comprehensive quantlity control report, summarizing alignment statistics, quantification correlations, gene type distributions, and gene body converage metrics, and more. These metrics are invaluable for asseesing quantification accuracy and troubleshooting potential issues.
52+
### **3. Flatter, Simpler, Faster**
4653

47-
**Flater, Simpler, Faste**r: Every step of the pipeline has been optimized for ease of use, maintenance and speed:
54+
Every step of the pipeline has been optimized for ease of use, maintenance and speed:
4855

49-
- All required tools now can be installed within one single conda environment.
56+
- **All required tools, databases and scripts can now be set up in a single conda environment.**
5057

51-
- Time-consuming steps, such as gene body coverage analysis, has been optimized. Now a typical run completes in about 2.5 hours.
58+
![Picture](./docs/figures/file_structure.png)
5259

53-
- There are only two arguments that the users need to specify manually. For all the rest, including the adapter sequences and strandness types, the pipeline can infer them automatically.
60+
- Time-consuming steps, such as gene body coverage analysis, have been optimized. **A typical run now completes in about 2.5 hours**.
5461

55-
- All required from the user is a sample table (see example below). This make it effortless to process hundreds or thousands of samples using this pipeline.
62+
![Picture](./docs/figures/task_duration.png)
63+
64+
- **Only two parameters (Library Type and Phred Score Encoding Method) need to be specified manually**; all other settings, including adapter sequences and strandedness, are automatically inferred.
65+
66+
- **The pipeline is now highly user-friendly for large-scale analyses**. Instead of relying on loops or other workarounds to process multiple samples, the pipeline accepts **a sample table** (see example below) as standard input, then automatically parses it and extracts all required information. This makes it **effortless to process hundreds or thousands of samples**.
5667

5768
![Picture](./docs/figures/sampleTable_template.png)
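When a sample table covers hundreds of samples, a quick structural check before launching can catch simple mistakes early. The sketch below is not part of the pipeline. It assumes a tab-separated table whose first line is a header and whose first column holds a unique sample ID, and the function name `check_sample_table` is hypothetical.

```bash
# Sanity-check a tab-separated sample table before starting a large batch.
# Assumptions (not taken from the pipeline docs): the first line is a header
# and the first column holds a unique sample ID.
check_sample_table() {
  awk -F'\t' '
    NR == 1 { ncol = NF; next }   # remember how many columns the header has
    NF == 0 { next }              # ignore blank lines
    NF != ncol {
      printf "line %d: expected %d fields, found %d\n", NR, ncol, NF
      bad = 1
    }
    seen[$1]++ {
      printf "line %d: duplicate sample ID %s\n", NR, $1
      bad = 1
    }
    END { exit bad + 0 }          # non-zero exit status if any problem was found
  ' "$1"
}
```

Running `check_sample_table sampleTable.txt` (a hypothetical file name) prints one line per problem and returns a non-zero status if anything looks wrong, so it can gate a job submission script.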
5869

5970
## To Get Started
6071

61-
- If you are unable to access the conda environment below, or if you need a reference genome assembly other than the pre-built ones (**hg38**, **hg19**, **mm39**, **mm10**), you will need to set up your own pipeline first. For detailed instructions, please refer to this tutorial: [Pipeline Setup](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/1_pipeline_setup/index).
72+
We have set up a conda environment for this pipeline, with all **tools**, **databases** (for hg38, hg19, mm39 and mm10) and **scripts** ready to use. You can activate it using the following commands:
73+
74+
```bash
75+
module load conda3/202402 # conda version 24.1.2
76+
conda activate /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025
77+
```
78+
79+
- If you are unable to access the conda environment above, or if you need a reference genome assembly other than the pre-built ones (**hg38**, **hg19**, **mm39**, **mm10**), you will need to set up your own pipeline first. For detailed instructions, please refer to this tutorial: [Pipeline Setup](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/1_pipeline_setup/index).
6280

63-
```bash
64-
module load conda3/202402
65-
conda activate /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025
66-
```
81+
To run this pipeline:
6782

68-
- If you are new to bulk RNA-seq quantification analysis and would like to learn more about the pipeline in detail, please refer to this tutorial: [Full Tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/3_full_tutorial/index).
83+
- **If you are new to bulk RNA-seq quantification analysis**, or would like to explore the pipeline in detail, please refer to this tutorial: [Full Tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/3_full_tutorial/index).
6984

70-
- If you want to run this pipeline directly with your ownsamples, please refer to this tutorial: [Quick Tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/2_quick_tutorial/quick_tutorial).
85+
- **If you are already familiar with this pipeline**, you can quickly run it with your own samples by following this tutorial: [Quick Tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/2_quick_tutorial/quick_tutorial).
7186

7287

7388

7489
## Contact
7590

76-
If you need support or have any questions about using this pipeline, please visit the [FAQ](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/4_FAQ/FAQ) or contact us directly at Qingfei.Pan@stjude.org.
91+
If you need support or have any questions about using this pipeline, please visit the **[FAQ](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/4_FAQ/FAQ)** or contact us directly at **Qingfei.Pan@stjude.org**.
