Commit 008821b

update quick tutorial
1 parent 6b81272 commit 008821b

5 files changed

Lines changed: 56 additions & 31 deletions

File tree

.DS_Store

0 Bytes
Binary file not shown.

docs/1_pipeline_setup/2_database.md

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -24,7 +24,7 @@ cd /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env
2424

2525
There are **FOUR** files required for database preparation. Three of them can be directly downloaded from online resources:
2626

27-
- **<u>*annotation.gtf*</u>**: Gene Annotation file in [GTF](https://biocorecrg.github.io/PhD_course_genomics_format_2021/gtf_format.html) (Gene Transfer Format) format.
27+
- ***<u>annotation.gtf</u>***: Gene Annotation file in [GTF](https://biocorecrg.github.io/PhD_course_genomics_format_2021/gtf_format.html) (Gene Transfer Format) format.
2828

2929
- ***<u>transcriptome.fa</u>***: Transcriptome sequence file in [FASTA](https://www.ncbi.nlm.nih.gov/genbank/fastaformat/) format
3030

@@ -164,7 +164,7 @@ cd /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env
164164
cat $dir_database/transcripts.fa $dir_database/genome.fa > $dir_database/bulkRNAseq/Salmon/gentrome.fa
165165
166166
# Salmon indexing
167-
salmon index -t $dir_database/bulkRNAseq/Salmon/gentrome.fa -d $dir_database/bulkRNAseq/Salmon/decoys.txt -p 8 -i index_decoy --gencode -k 31
167+
$dir_bin/salmon index -t $dir_database/bulkRNAseq/Salmon/gentrome.fa -d $dir_database/bulkRNAseq/Salmon/decoys.txt -p 8 -i index_decoy --gencode -k 31
168168
```
169169
170170
6. **Create genome index files for STAR**

docs/2_quick_tutorial/quick_tutorial.md

Lines changed: 47 additions & 29 deletions
@@ -5,36 +5,36 @@ nav_order: 3
55
permalink: /docs/2_quick_tutorial/quick_tutorial
66
---
77

8-
### A quick tutorial to run this pipeline
8+
# Quick Tutorial for Running the Pipeline
99

1010
---
1111

12-
Welcome to the **quick tutorial** of bulk RNA-Seq quantification pipeline! This tutorial aims to provide **immediate and practical guidance** on running this pipeline with your own data. As a quick tutorial, this is mainly for the users who are familar with the basic concepts and tools of RNA-Seq quantification analysis. If you are new to this field, we highly recommend you to start from the **Full Tutorial** that provides more detailed documentation of this pipeline.
12+
Welcome to the **Quick Tutorial** for the bulk RNA-seq quantification pipeline! This tutorial aims to provide **immediate and practical guidance** for running this pipeline with your own data. As a quick tutorial, it is intended for users who are already familiar with the basic concepts and tools of RNA-seq quantification analysis. If you are new to this field, we highly recommend starting with the **Full Tutorial**, which provides comprehensive documentation and step-by-step guidance.
1313

14-
To get started, please run the command below to activate the conda environment for this pipeline:
14+
To get started, activate the conda environment for this pipeline using the following commands:
1515

1616
``` bash
17-
module load conda3/202402
17+
module load conda3/202402 # conda version: 24.1.2
1818
conda activate /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025
1919
```
2020

21-
If you have your own conda environment, change the commands accordingly. To set up a conda environment for this pipeline, pelase refer to [Pipeline Setup](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/pipeline_setup/index) tutorial.
21+
If you are using a different conda environment, please change the path accordingly. To set up a conda environment for this pipeline, please refer to the [Pipeline Setup](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/pipeline_setup/index) tutorial.
2222

23-
#### I. Prepare the sample table
23+
## I. Prepare the sample table
2424

2525
Below is an example of the sample table for this pipeline:
2626

27-
![image](../_figures/sampleTable_template.png)
27+
![image](../figures/sampleTable_template.png)
2828

2929
It is a **tab-delimited text file with 6 columns**:
3030

3131
1) **<u>sampleID</u>**: name of samples. Some rules apply:
32-
- Should contain **letters**, **numbers** or **underscores** only
33-
- Should NOT **start with numbers**
32+
- Should contain **letters**, **numbers** or **underscores** only;
33+
- Should NOT **start with numbers**.
3434

35-
2. **<u>libraryType</u>**: type of libraries, `[PE | SE]`.
35+
2. **<u>libraryType</u>**: type of libraries, paired-end or single-end, **`[PE | SE]`**.
3636

37-
- My input files are in BAM/SAM format, and I'm not sure about the library type of them? The command below can tell that:
37+
- For BAM/SAM file inputs, if you are not sure about the library type, you can use the command below to determine it:
3838

3939
``` bash
4040
## To tell whether the BAM/SAM files are single- or paired-end
@@ -43,11 +43,21 @@ Below is an example of the sample table for this pipeline:
4343
# It returns 0 for single-end sequencing. Otherwise, the input BAM/SAM file is paired-end.
4444
```
4545

46-
3. **<u>phredMethod</u>**: Phred quality score encoding method, `[Phred33 | Phred64]`. Not sure about the answer? These two rules can help tell that:
46+
- For FASTQ file inputs, you typically have two paired files for **`PE`** data or one single file for **`SE`** data. An exception, though exceedingly rare, is an **interleaved** FASTQ file, where both **mate1** and **mate2** reads are combined in a single file. ***This pipeline does not support interleaved FASTQ files as standard input.*** If your data is in this format, you will need to split it into two separate FASTQ files before including them in your sample table.
4747

48-
- Phred64 was retired in late 2011. Data genrated after that should be in Phred33.
48+
```bash
49+
## To split an interleaved FASTQ file
50+
fastp --interleaved_in --in1 interleaved.fq --out1 fqRaw_R1.fq.gz --out2 fqRaw_R2.fq.gz
51+
# Then, use 'PE' as the library type and use 'fqRaw_R1.fq.gz' and 'fqRaw_R2.fq.gz' as the input in your sample table
52+
```
53+
54+
3. **<u>phredMethod</u>**: Phred quality score encoding method, **`[Phred33 | Phred64]`**.
55+
56+
Not sure about the answer? These two guidelines can help you determine the correct Phred encoding method:
4957

50-
- Use the FastQC to tell that:
58+
- **Phred64** was retired in late 2011. Data generated after that time should use **Phred33**.
59+
60+
- You can use ***FastQC*** to identify the Phred encoding of your input files:
5161

5262
``` bash
5363
## To tell the Phred quality score encoding method in FASTQ/BAM/SAM files
@@ -57,35 +67,45 @@ Below is an example of the sample table for this pipeline:
5767
# "Sanger / Illumina 1.9" indicates Phred33, while "Illumina 1.5 or lower" indicates Phred64.
5868
```
5969

60-
4. **<u>reference</u>**: reference genome assembly, `[hg38 | hg19 | mm39 | mm10]`.
70+
4. **<u>reference</u>**: path to the reference genome database folder. There are pre-built databases for four reference genome assemblies:
71+
72+
| Genome Assembly | Path |
73+
| --------------- | ------------------------------------------------------------ |
74+
| hg38/GRCh38.p14 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/hg38/gencode.release48 |
75+
| hg19/GRCh37.p13 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/hg19/gencode.release48lift37 |
76+
| mm39/GRCm39 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/mm39/gencode.releaseM37 |
77+
| mm10/GRCm38.p6 | /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/mm10/gencode.releaseM25 |
6178

62-
- We recommend to use `hg38` for human samples, and `mm39` for mouse samples. The assembly, `hg19` and `mm10`, are mainly used to match some legacy data.
63-
- For other species or genome assembly, you will need to manually create the required reference files following [this tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/pipeline_setup/reference.html).
79+
If you are using a different conda environment, please change the paths accordingly. To set up a reference genome database, please refer to the [Database Preparation](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/1_pipeline_setup/2_database.html) tutorial.
6480

65-
5. **<u>input</u>**: input files for quantification. This pipeline accepts:
81+
- We recommend using **`hg38`** for human samples and **`mm39`** for mouse samples. The other assemblies, **`hg19`** and **`mm10`**, are mainly used to match legacy data.
82+
- If you cannot access the paths listed above, or if you require other genome assemblies, you will need to manually create the reference files by following the [Database Preparation](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/1_pipeline_setup/2_database.html) tutorial.
6683
67-
- **<u>Standard FASTQ files</u>**: both paired-end (sample1) and single-end (sample2). Filenames should be ended with **`.fq`**, **`.fastq`**, **`.fq.gz`** or **`.fastq.gz`**.
84+
5. **<u>input</u>**: input files for quantification. This pipeline accepts the following formats:
6885
69-
- **<u>FASTQ files of multple lanes</u>**: both paired-end (sample3) and single-end (sample4). Filenames should be ended with **`.bam`** or **`.sam`**.
86+
- **<u>Standard FASTQ files</u>**: both paired-end (e.g., sample1) and single-end (e.g., sample2). Filenames must end with **`.fq`**, **`.fastq`**, **`.fq.gz`** or **`.fastq.gz`**.
7087
71-
- **<u>BAM/SAM files</u>**: single file only (sample 5 and sample6). For samples with multiple BAM/SAM files (usually splited ones), please merge them first:
88+
- **<u>FASTQ files of multiple lanes</u>**: both paired-end (e.g., sample3) and single-end (e.g., sample4). Filenames must end with **`.fq`**, **`.fastq`**, **`.fq.gz`** or **`.fastq.gz`**.
89+
90+
- **<u>BAM/SAM files</u>**: alignment files with filenames ending with **`.bam`** or **`.sam`** (e.g., sample5 and sample6). Only a single file per sample is accepted. If your sample consists of multiple BAM/SAM files (such as split files), please merge them before proceeding:
7291
7392
``` bash
74-
samtools merge -o output.bam input_1.bam input_2.bam ...
93+
samtools merge -o merged.bam input_1.bam input_2.bam ...
7594
```
7695
77-
6. **<u>output</u>**: path to save the output files. This pipeline will create a folder named by the `sampleID` under this directory.
96+
6. **<u>output</u>**: the directory where output files will be saved. This pipeline will create a subfolder named after the **`sampleID`** within this directory.
7897
7998
8099
81100
Below are the two ways we recommend to generate the sample table:
82101
83102
* Any coding language you prefer, e.g., Bash, R, Python, or Perl.
84-
* Excel or VIM. And for you convince, we have a templete avaible on HPC: `/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/git_repo/testdata/sampleTable.testdata.txt`. You can simply copy it to your folder and edit it in VIM. (To insert TAB in VIM: on INSERT mode press control + v + TAB)
85103
104+
* **Excel** or **VIM**. For your convenience, a template is available here: `/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sampleTable.testdata.txt`. You can simply copy it to your own folder and edit it using VIM. For those who cannot access this path, download the template file here.
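
As an illustration of the scripted route, here is a minimal Bash sketch that writes a six-column, tab-delimited sample table with `printf`. The helper names `make_row` and `write_sample_table` and every `/path/to/...` value are hypothetical placeholders, not part of the pipeline. The `;` separator between mate1 and mate2 files follows the test data table in this commit.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and all /path/to/... values are hypothetical.

# Print one sample-table row; printf turns each '\t' into a real TAB.
make_row() {
    printf '%s\t%s\t%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Write a header row plus one paired-end and one single-end sample.
write_sample_table() {
    local out_file=$1
    local ref=/path/to/databases/hg38/gencode.release48   # hypothetical path
    local data=/path/to/fastq                             # hypothetical path
    local out_dir=/path/to/Quantification                 # hypothetical path
    {
        make_row sampleID libraryType phredMethod reference input output
        # Paired-end: mate1 and mate2 files are separated by ';'
        make_row sample1 PE Phred33 "$ref" \
            "$data/sample1_R1.fq.gz;$data/sample1_R2.fq.gz" "$out_dir"
        # Single-end: a single FASTQ file
        make_row sample2 SE Phred33 "$ref" "$data/sample2.fq.gz" "$out_dir"
    } > "$out_file"
}

write_sample_table sampleTable.txt
```

Generating the table from a script avoids the most common editing mistake, which is typing spaces where TAB characters are required.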
86105

106+
87107

88-
### II. Data preprocessing
108+
## II. Data preprocessing
89109

90110
The purpose of data preprocessing is to prepare the FASTQ files that can be directly used for down-stream quantification analysis. It contains two steps:
91111

@@ -138,9 +158,7 @@ Typically, this step takes **~5 mins** to complete (for 150M PE-100 reads). The
138158

139159
After these two steps in data preprocessing, you should have the standard FASTQ files with clean sequences. They will be directly used in subsequent quantification analysis.
140160

141-
142-
143-
### III: Quantification
161+
## III: Quantification
144162

145163
In this pipeline, we provide five quantification methods:
146164

@@ -276,7 +294,7 @@ Typically, this step takes **~1 hrs** to complete (for 150M PE-100 reads). The s
276294
- Reads alignment results: **`sampleID/quantSTAR_HTSeq/Aligned.out.bam`** for genome alignments, and **`sampleID/quantSTAR/Aligned.toTranscriptome.out.bam`** for transcriptome alignments.
277295
- some other files/folders
278296

279-
### IV: Summarization
297+
## IV: Summarization
280298

281299
There are two purposes in the Summarization analysis:
282300

scripts/.DS_Store

0 Bytes
Binary file not shown.

testdata/sampleTable.testdata.txt

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
sampleID libraryType phredMethod reference input output
2+
sample1 PE Phred33 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/hg38/gencode.release48 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample1/fqRaw_R1.fq.gz;/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample1/fqRaw_R2.fq.gz /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/Quantification
3+
sample2 SE Phred33 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/hg38/gencode.release48 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample2/fqRaw.fq.gz /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/Quantification
4+
sample3 PE Phred33 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/hg19/gencode.release48lift37 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample3/fqRaw_R1_L001.fq.gz,/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample3/fqRaw_R1_L002.fq.gz,/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample3/fqRaw_R1_L003.fq.gz,/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample3/fqRaw_R1_L004.fq.gz;/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample3/fqRaw_R2_L001.fq.gz,/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample3/fqRaw_R2_L002.fq.gz,/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample3/fqRaw_R2_L003.fq.gz,/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample3/fqRaw_R2_L004.fq.gz /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/Quantification
5+
sample4 SE Phred33 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/hg19/gencode.release48lift37 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample4/fqRaw_L001.fq.gz,/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample4/fqRaw_L002.fq.gz,/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample4/fqRaw_L003.fq.gz,/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample4/fqRaw_L004.fq.gz /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/Quantification
6+
sample5 PE Phred33 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/hg38/gencode.release48 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample5/rawBAM.toGenome.bam /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/Quantification
7+
sample6 SE Phred33 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/databases/mm10/gencode.releaseM25 /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sample6/rawBAM.toTranscriptome.bam /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/Quantification
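
A table like the one above can be sanity-checked before it is handed to the pipeline. The sketch below assumes the file is tab-delimited with a header row, and checks only the rules stated in the tutorial: six columns, a `sampleID` made of letters, numbers or underscores that does not start with a number, a `libraryType` of `PE` or `SE`, and a `phredMethod` of `Phred33` or `Phred64`. The function `check_sample_table` is a hypothetical helper, not a script shipped with the pipeline.

```bash
#!/usr/bin/env bash
# Sketch only: check_sample_table is a hypothetical helper.
# It prints one message per problem and returns non-zero if any row fails.
check_sample_table() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        NR == 1 { next }    # assume the first row is the header
        NF != 6 {
            printf "line %d: expected 6 columns, found %d\n", NR, NF
            bad = 1
            next
        }
        # sampleID: letters, numbers or underscores only, not starting with a number
        $1 !~ /^[A-Za-z_][A-Za-z0-9_]*$/ {
            printf "line %d: invalid sampleID: %s\n", NR, $1
            bad = 1
        }
        $2 != "PE" && $2 != "SE" {
            printf "line %d: libraryType must be PE or SE, found: %s\n", NR, $2
            bad = 1
        }
        $3 != "Phred33" && $3 != "Phred64" {
            printf "line %d: phredMethod must be Phred33 or Phred64, found: %s\n", NR, $3
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage (file name is an example):
#   check_sample_table sampleTable.testdata.txt && echo "sample table looks OK"
```

The check does not verify that the reference, input, or output paths exist, since those depend on the local file system.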
