docs/1_pipeline_setup/2_database.md
2 additions & 2 deletions
Original file line number
Diff line number
Diff line change
@@ -24,7 +24,7 @@ cd /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env
24
24
25
25
There are **FOUR** files required for database preparation. Three of them can be directly downloaded from online resources:
26
26
27
-
-**<u>*annotation.gtf*</u>**: Gene Annotation file in [GTF](https://biocorecrg.github.io/PhD_course_genomics_format_2021/gtf_format.html) (Gene Transfer Format) format.
27
+
- ***<u>annotation.gtf</u>***: Gene Annotation file in [GTF](https://biocorecrg.github.io/PhD_course_genomics_format_2021/gtf_format.html) (Gene Transfer Format) format.
28
28
29
29
- ***<u>transcriptome.fa</u>***: Transcriptome sequence file in [FASTA](https://www.ncbi.nlm.nih.gov/genbank/fastaformat/) format
30
30
@@ -164,7 +164,7 @@ cd /research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env
docs/2_quick_tutorial/quick_tutorial.md
47 additions & 29 deletions
Original file line number
Diff line number
Diff line change
@@ -5,36 +5,36 @@ nav_order: 3
5
5
permalink: /docs/2_quick_tutorial/quick_tutorial
6
6
---
7
7
8
-
### A quick tutorial to run this pipeline
8
+
# Quick Tutorial for Running the Pipeline
9
9
10
10
---
11
11
12
-
Welcome to the **quick tutorial**of bulk RNA-Seq quantification pipeline! This tutorial aims to provide **immediate and practical guidance**on running this pipeline with your own data. As a quick tutorial, this is mainly for the users who are familar with the basic concepts and tools of RNA-Seq quantification analysis. If you are new to this field, we highly recommend you to start from the **Full Tutorial** that provides more detailed documentation of this pipeline.
12
+
Welcome to the **Quick Tutorial** for the bulk RNA-seq quantification pipeline! This tutorial aims to provide **immediate and practical guidance** for running this pipeline with your own data. As a quick tutorial, it is intended for users who are already familiar with the basic concepts and tools of RNA-seq quantification analysis. If you are new to this field, we highly recommend starting with the **Full Tutorial**, which provides comprehensive documentation and step-by-step guidance.
13
13
14
-
To get started, please run the command below to activate the conda environment for this pipeline:
14
+
To get started, activate the conda environment for this pipeline using the following commands:
If you have your own conda environment, change the commands accordingly. To set up a conda environment for this pipeline, pelase refer to [Pipeline Setup](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/pipeline_setup/index) tutorial.
21
+
If you are using a different conda environment, please change the path accordingly. To set up a conda environment for this pipeline, please refer to the [Pipeline Setup](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/pipeline_setup/index) tutorial.
22
22
23
-
####I. Prepare the sample table
23
+
## I. Prepare the sample table
24
24
25
25
Below is an example of the sample table for this pipeline:
26
26
27
-

27
+

28
28
29
29
It is a **tab-delimited text file with 6 columns**:
30
30
31
31
1. **<u>sampleID</u>**: name of the sample. Some rules apply:
32
-
- Should contain **letters**, **numbers** or **underscores** only
33
-
- Should NOT **start with numbers**
32
+
- Should contain **letters**, **numbers** or **underscores** only;
33
+
- Should NOT **start with numbers**.
34
34
35
-
2.**<u>libraryType</u>**: type of libraries, `[PE | SE]`.
35
+
2. **<u>libraryType</u>**: type of libraries, paired-end or single-end, **`[PE | SE]`**.
36
36
37
-
-My input files are in BAM/SAM format, and I'm not sure about the library type of them? The command below can tell that:
37
+
- For BAM/SAM file inputs, if you are not sure about the library type, you can use the command below to determine it:
38
38
39
39
```bash
40
40
## To tell whether the BAM/SAM files are single- or paired-end
@@ -43,11 +43,21 @@ Below is an example of the sample table for this pipeline:
43
43
# It returns 0 for single-end sequencing. Otherwise, the input BAM/SAM file is paired-end.
44
44
```
45
45
46
-
3. **<u>phredMethod</u>**: Phred quality score encoding method, `[Phred33 | Phred64]`. Not sure about the answer? These two rules can help tell that:
46
+
- For FASTQ file inputs, you typically have two paired files for **`PE`** data or one single file for **`SE`** data. An exception, though exceedingly rare, is an **interleaved** FASTQ file, where both **mate1** and **mate2** reads are combined in a single file. ***This pipeline does not support interleaved FASTQ files as standard input.*** If your data is in this format, you will need to split it into two separate FASTQ files before including them in your sample table.
47
47
48
-
- Phred64 was retired in late 2011. Data genrated after that should be in Phred33.
- We recommend to use `hg38`for human samples, and `mm39`for mouse samples. The assembly, `hg19` and `mm10`, are mainly used to match some legacy data.
63
-
- For other species or genome assembly, you will need to manually create the required reference files following [this tutorial](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/pipeline_setup/reference.html).
79
+
If you are using a different conda environment, please change the paths accordingly. To set up a reference genome database, please refer to the [Database Preparation](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/1_pipeline_setup/2_database.html) tutorial.
64
80
65
-
5. **<u>input</u>**: input files for quantification. This pipeline accepts:
81
+
- We recommend using **`hg38`** for human samples and **`mm39`** for mouse samples. The other assemblies, **`hg19`** and **`mm10`**, are mainly used to match some legacy data.
82
+
- If you can't access the paths listed above, or if you require other genome assemblies, you will need to manually create the reference files by following the [Database Preparation](https://jyyulab.github.io/bulkRNAseq_quantification_pipeline/docs/1_pipeline_setup/2_database.html) tutorial.
66
83
67
-
- **<u>Standard FASTQ files</u>**: both paired-end (sample1) and single-end (sample2). Filenames should be ended with **`.fq`**, **`.fastq`**, **`.fq.gz`** or **`.fastq.gz`**.
84
+
5. **<u>input</u>**: input files for quantification. This pipeline accepts the following formats:
68
85
69
-
- **<u>FASTQ files of multple lanes</u>**: both paired-end (sample3) and single-end (sample4). Filenames should be ended with **`.bam`**or **`.sam`**.
86
+
- **<u>Standard FASTQ files</u>**: both paired-end (e.g., sample1) and single-end (e.g., sample2). Filenames must end with **`.fq`**, **`.fastq`**, **`.fq.gz`** or **`.fastq.gz`**.
70
87
71
-
- **<u>BAM/SAM files</u>**: single file only (sample 5 and sample6). For samples with multiple BAM/SAM files (usually splited ones), please merge them first:
88
+
- **<u>FASTQ files of multiple lanes</u>**: both paired-end (e.g., sample3) and single-end (e.g., sample4). Filenames must end with **`.bam`** or **`.sam`**.
89
+
90
+
- **<u>BAM/SAM files</u>**: alignment files with filenames ending with **`.bam`** or **`.sam`** (e.g., sample5, sample6). Only a single file per sample is accepted. If your sample consists of multiple BAM/SAM files (such as split files), please merge them before proceeding:
6. **<u>output</u>**: path to save the output files. This pipeline will create a folder named by the `sampleID` under this directory.
96
+
6. **<u>output</u>**: the directory where output files will be saved. This pipeline will create a subfolder named after the **`sampleID`** within this directory.
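For the interleaved FASTQ case mentioned under **libraryType**, the split into two mate files could be sketched as below. This is a minimal illustration rather than part of the pipeline: the function name `split_interleaved_fastq` and all file names are hypothetical, and it assumes an uncompressed file in which every record is exactly four lines and mate1/mate2 records strictly alternate.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): split an interleaved FASTQ
# file into separate mate1 and mate2 files.
# Assumptions: uncompressed input, exactly 4 lines per record, and records
# strictly alternating mate1, mate2, mate1, mate2, ...
split_interleaved_fastq() {
    local in_fq="$1" out_r1="$2" out_r2="$3"
    awk -v r1="$out_r1" -v r2="$out_r2" '
        # Within every block of 8 lines, lines 1-4 are mate1 and lines 5-8 are mate2.
        { if ((NR - 1) % 8 < 4) print > r1; else print > r2 }
    ' "$in_fq"
}

# Example call (placeholder file names):
# split_interleaved_fastq sample.interleaved.fq sample_R1.fq sample_R2.fq
```

The two resulting files can then be listed as a regular paired-end input. It is worth checking that both files contain the same number of lines before using them.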
78
97
79
98
80
99
81
100
Below are the two ways we recommend to generate the sample table:
82
101
83
102
* Any coding language you prefer, e.g., BASH, R, Python, Perl, etc.
84
-
* Excel or VIM. And foryou convince, we have a templete avaible on HPC: `/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/git_repo/testdata/sampleTable.testdata.txt`. You can simply copy it to your folder and edit itin VIM. (To insert TAB in VIM: on INSERT mode press control + v + TAB)
85
103
104
+
* **Excel** or **VIM**. For your convenience, we have a template available here: `/research_jude/rgs01_jude/groups/yu3grp/projects/software_JY/yu3grp/conda_env/bulkRNAseq_2025/pipeline/testdata/sampleTable.testdata.txt`. You can simply copy it to your own folder and edit it using VIM. If you cannot access this path, download the template file here.
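For the scripting route, a minimal BASH sketch of writing and sanity-checking a sample table might look like the following. The helper names `add_sample_row` and `check_sample_table` are hypothetical, as are all paths. The sketch assumes the table has no header line and that the fourth column holds the reference genome (its name is not shown in this excerpt). It checks only the rules stated above: six tab-delimited columns, the **sampleID** naming rules, and the allowed values of **libraryType** and **phredMethod**.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the pipeline) for building a sample table.

# Append one tab-delimited row to the table.
# Usage: add_sample_row TABLE sampleID libraryType phredMethod reference input output
# ("reference" is an assumed name for the fourth column.)
add_sample_row() {
    local table="$1"; shift
    [ "$#" -eq 6 ] || { echo "expected 6 fields, got $#" >&2; return 1; }
    local IFS=$'\t'            # "$*" joins the fields with the first character of IFS
    printf '%s\n' "$*" >> "$table"
}

# Check each row against the rules described for the sample table.
# Assumes the table has NO header line. Returns non-zero if any row fails.
check_sample_table() {
    awk -F'\t' '
        NF != 6 { printf "line %d: expected 6 columns, found %d\n", NR, NF; bad = 1; next }
        # sampleID: letters, numbers or underscores only, and must not start with a number
        $1 !~ /^[A-Za-z_][A-Za-z0-9_]*$/ { printf "line %d: invalid sampleID \"%s\"\n", NR, $1; bad = 1 }
        $2 != "PE" && $2 != "SE" { printf "line %d: libraryType must be PE or SE\n", NR; bad = 1 }
        $3 != "Phred33" && $3 != "Phred64" { printf "line %d: phredMethod must be Phred33 or Phred64\n", NR; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Example (placeholder paths):
# add_sample_row sampleTable.txt sample2 SE Phred33 hg38 /path/to/sample2.fq.gz /path/to/output
# check_sample_table sampleTable.txt && echo "sample table looks OK"
```

Writing rows with `printf` rather than typing them by hand avoids the most common problem with this file, which is spaces being used where tabs are required.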
86
105
106
+
87
107
88
-
### II. Data preprocessing
108
+
## II. Data preprocessing
89
109
90
110
The purpose of data preprocessing is to prepare the FASTQ files that can be directly used for down-stream quantification analysis. It contains two steps:
91
111
@@ -138,9 +158,7 @@ Typically, this step takes **~5 mins** to complete (for 150M PE-100 reads). The
138
158
139
159
After these two steps in data preprocessing, you should have the standard FASTQ files with clean sequences. They will be directly used in subsequent quantification analysis.
140
160
141
-
142
-
143
-
### III: Quantification
161
+
## III. Quantification
144
162
145
163
In this pipeline, we provide five quantification methods:
146
164
@@ -276,7 +294,7 @@ Typically, this step takes **~1 hrs** to complete (for 150M PE-100 reads). The s