nf-core-wgcnamodules is a bioinformatics pipeline, complementary to nf-core/rnaseq and TDTHub, that can be used to infer relevant transcription factor (TF) regulators in 60 different plant species from RNA-seq data. It takes a samplesheet and the salmon folder from an RNA-seq quantification analysis run with nf-core/rnaseq, performs a differential expression analysis and WGCNA, and generates clusters of co-expressed genes that are searched for enriched TF binding sites (TFBS) using TDTHub.
- RNA-seq quantification using the nf-core pipeline (nf-core/rnaseq).
- Differentially expressed genes filter (DESeq2).
- WGCNA (WGCNA).
- TDTHub (TDTHub).
NOTE Example files to get familiar with and test the pipeline are available at wgcnamodules_testdata. We recommend testing the pipeline with these files the first time you run it.
NOTE Parameter configuration and extensive details are available in the documentation and in the book chapter associated with this pipeline.
First, run nf-core/rnaseq and prepare a samplesheet with your input data that looks as follows:
samplesheet_wgcna.csv:
sample,condition,replicate
CONTROL1_REP1,CONTROL1,1
CONTROL1_REP2,CONTROL1,2
TREATMENT1_REP1,TREATMENT1,1
TREATMENT1_REP2,TREATMENT1,2
TREATMENT2_REP1,TREATMENT2,1
TREATMENT2_REP2,TREATMENT2,2
Where the columns correspond to:
- 'sample': the same name used in the sample column of 'samplesheet_rnaseq.csv'.
- 'condition': name of the treatment, genotype or group that defines an experimental condition with one or multiple replicates.
- 'replicate': number of the biological replicate.
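If the sample names follow the `<CONDITION>_REP<N>` pattern seen in the example rows, the samplesheet can be derived from the names alone. The following is a minimal sketch under that assumption; the pattern and the helper names `build_wgcna_rows` and `write_samplesheet` are illustrative and not part of the pipeline.

```python
import csv
import io
import re

# Assumed naming convention, inferred from the example rows: <CONDITION>_REP<N>
SAMPLE_PATTERN = re.compile(r"^(?P<condition>.+)_REP(?P<replicate>\d+)$")


def build_wgcna_rows(sample_names):
    """Return (sample, condition, replicate) tuples, one per unique sample name."""
    rows = []
    seen = set()
    for name in sample_names:
        if name in seen:
            continue  # skip repeated names so each sample yields a single row
        seen.add(name)
        match = SAMPLE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"sample name does not follow <CONDITION>_REP<N>: {name}")
        rows.append((name, match["condition"], int(match["replicate"])))
    return rows


def write_samplesheet(rows, handle):
    """Write the header and rows in the samplesheet_wgcna.csv layout."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["sample", "condition", "replicate"])
    writer.writerows(rows)


if __name__ == "__main__":
    names = ["CONTROL1_REP1", "CONTROL1_REP2", "TREATMENT1_REP1"]
    buffer = io.StringIO()
    write_samplesheet(build_wgcna_rows(names), buffer)
    print(buffer.getvalue(), end="")
```

Names that do not match the assumed pattern raise an error rather than being guessed, so such samples need their condition and replicate filled in by hand.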
Prepare a metadata file with the following format:
contrasts_wgcna.csv:
contrast,variable,control,target
TREATMENT1_vs_CONTROL1,condition,CONTROL1,TREATMENT1
TREATMENT2_vs_CONTROL1,condition,CONTROL1,TREATMENT2
Where the columns correspond to:
- 'contrast': a custom name used to identify the contrast.
- 'variable': the name of the column in the 'samplesheet_wgcna.csv' file that contains the condition ids.
- 'control': the base/reference level for the contrast.
- 'target': the target/non-reference level for the comparison.
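Because the contrast file refers back to the samplesheet (the 'variable' column names a samplesheet column, and 'control' and 'target' name levels within it), the two files can be cross-checked before launching a run. The sketch below illustrates one way to do that; `check_contrasts` is a hypothetical helper and not part of the pipeline.

```python
import csv
import io


def check_contrasts(samplesheet_text, contrasts_text):
    """Return a list of human-readable problems; the list is empty when none are found."""
    samples = list(csv.DictReader(io.StringIO(samplesheet_text)))
    contrasts = list(csv.DictReader(io.StringIO(contrasts_text)))
    columns = list(samples[0].keys()) if samples else []
    problems = []
    for row in contrasts:
        variable = row["variable"]
        if variable not in columns:
            problems.append(f"{row['contrast']}: no column '{variable}' in samplesheet")
            continue
        # Levels actually present in the samplesheet for this variable
        levels = {sample[variable] for sample in samples}
        for role in ("control", "target"):
            if row[role] not in levels:
                problems.append(
                    f"{row['contrast']}: {role} level '{row[role]}' not found in '{variable}'"
                )
    return problems


if __name__ == "__main__":
    samplesheet = (
        "sample,condition,replicate\n"
        "CONTROL1_REP1,CONTROL1,1\n"
        "TREATMENT1_REP1,TREATMENT1,1\n"
    )
    contrasts = (
        "contrast,variable,control,target\n"
        "TREATMENT1_vs_CONTROL1,condition,CONTROL1,TREATMENT1\n"
        "TREATMENT2_vs_CONTROL1,condition,CONTROL1,TREATMENT2\n"
    )
    for problem in check_contrasts(samplesheet, contrasts):
        print(problem)
```

In this usage example the second contrast is reported, because TREATMENT2 does not appear in the 'condition' column of the shortened samplesheet.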
Now, you can run the pipeline using:
nextflow run nf-core-wgcnamodules \
-profile conda \
--input samplesheet_wgcna.csv \
--contrast contrasts_wgcna.csv \
--salmon_dir <PATH_TO_NF-CORE/RNASEQ_SALMON_FOLDER>/salmon \
--diff_exp_genes true \
--outdir <OUTDIR>
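Before launching, it can help to confirm that the salmon folder contains results for every sample in the samplesheet. The sketch below assumes the usual nf-core/rnaseq layout of one sub-folder per sample holding a `quant.sf` file. Whether nf-core-wgcnamodules reads exactly these files is an assumption, and `missing_quant_files` is a hypothetical helper.

```python
import csv
from pathlib import Path


def missing_quant_files(samplesheet_path, salmon_dir):
    """Return the sample names that have no <salmon_dir>/<sample>/quant.sf file."""
    salmon_dir = Path(salmon_dir)
    with open(samplesheet_path, newline="") as handle:
        samples = [row["sample"] for row in csv.DictReader(handle)]
    # Assumed layout: one folder per sample, each containing quant.sf
    return [name for name in samples if not (salmon_dir / name / "quant.sf").is_file()]
```

An empty list means every sample in the samplesheet has a matching quantification file under the assumed layout.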
For more details and further functionality, please refer to the usage documentation and the book chapter.
For more details about the output files and reports, please refer to the output documentation.
nf-core-wgcnamodules was originally written by roldanjg.
If you use nf-core-wgcnamodules for your analysis, please cite it using the following DOI:
XXXXXXXXXXXXX
Grau J. & Franco-Zorrilla JM.
XXXXX 2024 X X. doi: XXXX.
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
If you would like to contribute to this pipeline, please see the contributing guidelines.
NOTE This pipeline was created to run complementary to nf-core/rnaseq and is not an official release from the nf-core team, but it is intended to follow the standards, practices and procedures established by the nf-core community.