This repository contains my first analysis workflow using Qiita (https://qiita.ucsd.edu/) and Qiime 2 (https://qiime2.org/). It processes and analyses 16s rRNA data from my raw reads for diversity analysis and taxonomy
Notes π§ββοΈ : -Taxonomy was assigned using the SILVA 138 classifier -Alpha/beta diversity used filtered OTU tables -Sample metadata is stored in metadata.tsv
Qiita workflow summary
βββ1. Raw Data Upload
βββ2.Split libraries
βββ3. Demultiplexed
βββ4.Pick closed-reference OTUs (QIIMEq2 1.9.1)
βββ5.OTU Table Summary (Feature Table)
βββ6. Output Files
1. Raw Data Upload
- Platform: Illumina
2. Split libraries
- Tool: Split libraries FASTQ (QIIME 1.9.1) - To demultiplex raw FASTQ reads and apply initial quality filtering.
- Key settings:
Removed reads with ambiguous bases (Ns)
Kept reads with at least 75% of the original length
Allowed up to 3 consecutive low-quality bases
Quality threshold: Q3
Barcode type: Not-barcoded (reads were already separated per sample)
3. Demultiplexed
- Total: 1455219
- Max: 301
- Mean: 301
- Standard deviation: 301
4. Pick closed-reference OTUs (QIIMEq2 1.9.1)
- Classifier: silva_119_taxonomy_97_7
- Similarity threshold: 97% (only sequences matching reference OTUs at β₯97% similarity were retained)
- Other settings:
sortmerna_e_value: 1
sortmerna_max_pos: 10,000
sortmerna_coverage: 0.97
Threads used: 5
5.OTU Table Summary (Feature Table)
6. Output Files
- Feature table (BIOM)
- Feature table (QZA)
- Sortmerna picked Otus
- support files
- seqs.fasta.gz
Qiime 2 pipeline\
βββ scripts/ # Ubuntu WSL
βββ metadata.tsv # Sample metadata
βββ README.md
Steps in Pipeline π§
- Import raw data
- OTU picking
- Filtering chloroplast/mitochondria
- Assign taxonomy (SILVA)
- Collapse feature table
- Alpha diversity (Shannon)
- Beta diversity (UniFrac, Bray-Curtis, pcoa)
- Visualisations (barplots, diversity plots, pcoa)
activation of the environmentconda activate qiime2-amplicon-2024.5
importing the biom and creating a qiime 2 artifact - qza
qiime tools import \
--input-path your_depository/216124_otu_table.biom \
--type 'FeatureTable[Frequency]' \
--input-format BIOMV210Format \
--output-path your_depository/216124_otu-feature-table.qzaExtracts the contents of my SortMeRNA OTU results
Check what is in sortmerna_picked_otus.tgz
tar -xvzf your_depository/216124_sortmerna_picked_otus.tgz -C your_depository_test/`unzip seq.fasta.gz
gunzip -c your_depository/216111_seqs.fasta.gz > your_depository/216111_seqs.fastaGenerating a representative sequence Fasta file
A full set of all sequences (in FASTA format)
An OTU clustering file (text file from SortMeRNA)
seq.fasta has to match with seqs_otus
This is a code using a Python package to create a rep_set.fna, which contains one representative sequence per OTU.
pip install biopython biopython package (needed to read/write FASTA files using SeqIO) Now, create a script for Python
nano generate_rep_set.py
insert into nano
from Bio import SeqIOFile paths
seqs_fasta = "your_depository/216111_seqs.fasta"
otus_txt = "your_depository/sortmerna_picked_otus/seqs_otus.txt"
output_fasta = "your_depository/rep_set.fna"Load sequence records
seq_records = SeqIO.to_dict(SeqIO.parse(seqs_fasta, "fasta"))Parse the OTU file and write the representative sequences
with open(otus_txt, "r") as otus_file, open(output_fasta, "w") as out_fasta:
for line in otus_file:
columns = line.strip().split()
otu_id = columns[0] # OTU ID
sequence_ids = columns[1:] # Sequences in this OTU
for seq_id in sequence_ids:
if seq_id in seq_records:
# Use the first sequence as the representative
rep_seq = seq_records[seq_id]
rep_seq.id = otu_id # Rename header to OTU ID
rep_seq.description = "" # Clear description
SeqIO.write(rep_seq, out_fasta, "fasta")
breakRun the Python script
python3 generate_rep_set.pyconverting rep_set.fna into an artifact so I can work with it in qiime
qiime tools import \
--input-path your_depository/rep_set.fna \
--output-path your_depository/otu-rep-seqs.qza \
--type 'FeatureData[Sequence]'now assigning taxonomy using SILVA -> silva-138-99-nb-classifier.qza
qiime feature-classifier classify-sklearn \
--i-classifier your_path/silva-138-99-nb-classifier.qza \ # representative sequences for each OTUs!
--i-reads your_depository/otu-rep-seqs.qza \
--o-classification your_depository/otu-taxonomy.qza # Output used in RStudio analysisfiltering Mitochondria, Chloroplast, Samples with less than 5 reads (The reads depend on your choice and the data)
qiime taxa filter-table \
--i-table your_depository/216124_otu-feature-table.qza \
--i-taxonomy your_depository/otu-taxonomy.qza \
--p-exclude mitochondria,chloroplast \
--o-filtered-table your_depository/otu-table-no-mitochondria-chloroplast.qzaless than 5 reads
qiime feature-table filter-features \
--i-table your_depository/otu-table-no-mitochondria-chloroplast.qza \
--p-min-frequency 5 \
--o-filtered-table your_depository/otu-table-filtered.qzaVisualisation of the taxonomy
qiime metadata tabulate \
--m-input-file your_depository/otu-table-filtered.qza \
--o-visualization your_depository/otu-taxonomy.qzvsummary
qiime feature-table summarize \
--i-table your_depository/otu-table-filtered.qza \
--o-visualization your_depository/otu-table-summary.qzvstill checks after the filtering and converts to tsv, it is just for checking! less than 5 reads
qiime tools export \
--input-path your_depository/otu-table-filtered.qza \
--output-path your_depository/otu-table-filtered-exported
biom convert \
-i your_depository_test/otu-table-filtered-exported/feature-table.biom \
-o your_depository_test/otu-table-filtered.tsv \
--to-tsvExtract OTU table (from qza to Bioam)
qiime tools export \
--input-path your_depository/otu-table-filtered.qza \
--output-path your_depository/exported_taxonomyNow I need to assign taxonomy
qiime tools export \
--input-path your_depository/otu-taxonomy.qza \
--output-path your_depository/exported_taxonomyAdding taxonomy to each OTU in the biom table
biom add-metadata \
-i your_depository_test/exported_otu_genus_table/feature-table.biom \
-o your_depository_test/feature-table-with-taxonomy.biom \
--observation-metadata-fp your_depository_test/exported_taxonomy/taxonomy.tsv \
--sc-separated taxonomyThe header must follow the rules
sed -i '1s/.*/#OTUID\ttaxonomy\tconfidence/' your_depository_test/exported_taxonomy/taxonomy.tsvConverting the Biom table into TSV
biom convert \
-i your_depository_test/feature-table-with-taxonomy.biom \
-o your_depository_test/feature-table-with-taxonomy.tsv \
--to-tsv \
--header-key taxonomyCollapsing - Even though Iβve added taxonomy into the feature table file, itβs still better to let QIIME 2 handle the taxonomy-based grouping using Qiime taxa collapse, because it understands the taxonomic hierarchy and does the grouping properly; Genus-level insights are more meaningful than raw OTUs.
qiime taxa collapse \
--i-table your_depository_test/otu-table-filtered.qza \
--i-taxonomy your_depository_test/otu-taxonomy.qza \
--p-level 6 \
--o-collapsed-table your_depository_test/otu-genus-feature-table.qza # Output used in RStudio analysisExport to table
qiime tools export \
--input-path your_depository_test/otu-genus-feature-table.qza \
--output-path your_depository_test/exported_otu_genus_tableAnd to tsv
biom convert \
-i your_depository_test/exported_otu_genus_table/feature-table.biom \
-o your_depository_test/otu-genus-feature-table.tsv \
--to-tsv \
--table-type="OTU table"Blast and long formatting of the table is done in R Studio
Get rid of the prefix
sed -i 's/15931\.//g' your_depository_test/otu-table-filtered.tsvNow the qza needs to be fixed by the biom
biom convert \
-i your_depository_test/otu-table-filtered.tsv \
-o your_depository_test/otu-table-filtered.biom \
--to-hdf5 \
--table-type="OTU table"and back to qza
qiime tools import \
--input-path your_depository_test/otu-table-filtered.biom \
--type 'FeatureTable[Frequency]' \
--output-path your_depository/otu-table-filtered.qza # Output used in RStudio analysisqiime diversity alpha \
--i-table your_depository_test/otu-table-filtered.qza \
--p-metric shannon \
--o-alpha-diversity your_depository_test/shannon_alpha_diversity.qza # Output used in RStudio analysisVisualisation
qiime metadata tabulate \
--m-input-file your_depository_test/shannon_alpha_diversity.qza \
--o-visualization your_depository_test/shannon_alpha_diversity.qzvAlpha diversity with metadata containing duration_exposure (or whatever you need) My IDs have a prefix from Qiita It is easier to add a prefix to the metadata
Adding the prefix to metadata
awk -F'\t' 'BEGIN {OFS="\t"} NR==1 {$1=$1} NR>1 {$1="15931."$1} 1' \Creating TSV
your_depository/metadata_withcombined.tsv \> your_depository/metadata_prefixed.tsv # We will use this metadata in the R studio as wellBIOM to QZA
qiime tools import \
--input-path your_depository_test/otu-table-filtred.biom \
--type 'FeatureTable[Frequency]' \
--output-path your_depository_test/otu-table-filtered-fixed.qzaqiime diversity alpha \
--i-table your_depository_test/otu-table-filtered.qza \
--p-metric shannon \
--o-alpha-diversity your_depository_test/shannon_alpha_diversity.qzaVisualisation
qiime diversity alpha-group-significance \
--i-alpha-diversity your_depository/shannon_alpha_diversity.qza \
--m-metadata-file your_depository/metadata_prefixed.tsv \
--o-visualization your_depository/shannon-group-significance.qzvAligns OTU sequences, removes noisy alignment regions, builds a tree, and roots it β use it in QIIME to calculate phylogenetic diversity metrics (UniFrac)
qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences your_depository_test/otu-rep-seqs.qza \
--o-alignment your_depository_test/aligned-rep-seqs.qza \
--o-masked-alignment your_depository_test/masked-alignment.qza \
--o-tree your_depository_test/unrooted-tree.qza \
--o-rooted-tree your_depository_test/rooted-tree.qzaUnweighted Unifrac -> I want presence and absence information
qiime diversity beta-phylogenetic \
--i-table your_depository/otu-table-filtered.qza \
--i-phylogeny your_depository/rooted-tree.qza \
--p-metric unweighted_unifrac \
--o-distance-matrix your_depository/unweighted_unifrac_distance_matrix.qzaMeasuring how different microbial communities are between samples
Based on the presence or absence of taxa, and how evolutionarily distinct those taxa are
Extract sample IDs from the OTU table
head -n 1 your_depository_test/otu-table-filtered.tsv | cut -f2- > your_depository_test/sample-ids.txtReformat the sample ID
grep to filter another file to keep only lines that match those sample names
(head -n 1 your_depository/metadata.tsv && \
grep -F -w -f your_depository_test/sample-ids-fixed.txt \
your_depository/metadata.tsv) \
> your_depository_test/metadata-matched.tsvNeeds to extract the real headers
sed -n '2p' your_depository/otu-table-filtered.tsv | cut -f2- \
| tr '\t' '\n' > your_depository_test/sample-ids-fixed.txtand check it
cat your_depository_test/sample-ids-fixed.txtNow convert to biom
biom convert \
-i your_depository_test/otu-table-filtered.tsv \
-o your_depository_test/otu-table-fixed.biom \
--table-type="OTU table" \
--to-hdf5and convert biom to qza so I can work with it in qiime
qiime tools import \
--input-path your_depository_test/otu-table-fixed.biom \
--type 'FeatureTable[Frequency]' \
--input-format BIOMV210Format \
--output-path your_depository_test/otu-table-fixed.qzaqiime taxa barplot \
--i-table your_depository/otu-table-fixed.qza \
--i-taxonomy your_depository/otu-taxonomy.qza \
--m-metadata-file your_depository/metadata.tsv \
--o-visualization your_depository/taxa-barplot.qzvqiime diversity beta \
--i-table your_depository/otu-table-fixed.qza \
--p-metric braycurtis \
--o-distance-matrix your_depository/braycurtis-distance.qza qiime diversity pcoa \
--i-distance-matrix your_depository_test/braycurtis-distance.qza \
--o-pcoa your_depository_test/braycurtis-pcoa.qza # Output used in RStudio analysisqiime emperor plot \
--i-pcoa your_depository/braycurtis-pcoa.qza \
--m-metadata-file your_depository/metadata.tsv \
--o-visualization your_depository/braycurtis-pcoa.qzvπ»ππΌ
Metadata sample_name description sample_type condition sex day feeding exposure condition_duration 15931.GD7A1ChlControl7 mussel experimental Chl Control f day7 chlorella no Chl Control 7 15931.GD7A2ChlControl7 mussel experimental Chl Control m day7 chlorella no Chl Control 7 15931.GD7A3ChlControl7 mussel experimental Chl Control m day7 chlorella no Chl Control 7
Ω©(^α^)Ω Happy coding