07jikim/Densovirus_Novel-virus

# 🦠 Densovirus_Novel-virus

This GitHub repository provides a bioinformatics pipeline for virome analysis and genome assembly of a novel densovirus identified in the two-spotted cricket (Gryllus bimaculatus).


## 📂 Overview

Virome analysis was run separately on each sample's Illumina paired-end read files, with initial data processing based on the Hecatomb pipeline (v1.3.2; Roach et al., 2022).
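
Because every sample is processed from its own pair of files, a small helper for deriving the mate file can keep per-sample loops tidy. The sketch below assumes the `_R1.fastq.gz` / `_R2.fastq.gz` naming used in the example commands of this README; `sample_name`, `mate_file`, and the `reads/` directory are hypothetical and not part of Hecatomb.

```bash
# Hypothetical helpers; assume files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.

# Print the sample name for a forward-read file (directory and suffix stripped).
sample_name() {
  basename "$1" _R1.fastq.gz
}

# Print the path of the reverse-read file that pairs with a forward-read file.
mate_file() {
  printf '%s\n' "${1%_R1.fastq.gz}_R2.fastq.gz"
}

# Example loop over an assumed reads/ directory (prints nothing if it is empty or missing).
for r1 in reads/*_R1.fastq.gz; do
  [ -e "$r1" ] || continue
  echo "sample=$(sample_name "$r1") R1=$r1 R2=$(mate_file "$r1")"
done
```

Each later step can then be called inside the loop with `"$r1"` and `"$(mate_file "$r1")"` in place of the fixed `sample_R1.fastq.gz` and `sample_R2.fastq.gz` names.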

### 🔹 Step 1: Quality Trimming and Host Filtering

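- BBTools (v38.90) was used to clean the reads:
  - Bases with a quality score below Q20 were trimmed.
  - Reads shorter than 50 bp after trimming were discarded.
- This step also included contaminant and duplicate removal.
- Host reads were removed by mapping the trimmed reads to the host genome with minimap2 and keeping only pairs in which both mates were unmapped.
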
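```bash
# Adapter trimming (ktrim=r), quality trimming to Q20, and removal of reads shorter than 50 bp
bbduk.sh in1=sample_R1.fastq.gz in2=sample_R2.fastq.gz \
  out1=trimmed_R1.fastq.gz out2=trimmed_R2.fastq.gz \
  qtrim=r trimq=20 minlength=50 ref=adapters.fa ktrim=r k=23 mink=11 hdist=1

# Map trimmed reads to the host genome; keep pairs with both mates unmapped (-f 12)
# and exclude secondary alignments (-F 256)
minimap2 -ax sr host_genome.fa trimmed_R1.fastq.gz trimmed_R2.fastq.gz | \
  samtools view -b -f 12 -F 256 - > host_filtered.bam
```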
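
The `-f 12 -F 256` filter above operates on the bitwise SAM FLAG field: bit 4 means the read is unmapped, bit 8 means its mate is unmapped, and bit 256 marks a secondary alignment. As an illustration only, the same test can be written with plain shell arithmetic; `passes_host_filter` is a hypothetical name, not a samtools feature.

```bash
# Hypothetical helper mirroring `samtools view -f 12 -F 256`:
# succeed only if bits 4 and 8 are both set and bit 256 is clear.
passes_host_filter() {
  [ $(( $1 & 12 )) -eq 12 ] && [ $(( $1 & 256 )) -eq 0 ]
}

# Example FLAG values:
#   77  = paired + read unmapped + mate unmapped + first in pair
#   141 = paired + read unmapped + mate unmapped + second in pair
#   99  = paired + properly aligned + mate on reverse strand + first in pair
#   73  = paired + mate unmapped + first in pair (the read itself is mapped)
for flag in 77 141 99 73; do
  if passes_host_filter "$flag"; then
    echo "$flag kept (neither mate mapped to the host)"
  else
    echo "$flag dropped"
  fi
done
```

Flags 77 and 141 pass; 99 and 73 are dropped because at least one mate aligned to the host genome.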


### 🔹 Step 2: Clustering and Taxonomic Annotation

- Clustered reads were annotated using **MMseqs2** against the following databases:
  - NCBI viral genome database
  - Multi-kingdom nucleotide and protein databases derived from NCBI RefSeq (bacterial, archaeal, viral)
- This step identified the dominant virus, *Blattodean blattambidensovirus 1*, among the read clusters.


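```bash
# Build a query database from the clustered reads
mmseqs createdb clustered_reads.fasta clustered_reads_DB
# Search a prebuilt viral RefSeq database (viral_refseq_DB must already exist)
mmseqs search clustered_reads_DB viral_refseq_DB result tmp --threads 8
# Export the alignments as a BLAST-style tab-separated table
mmseqs convertalis clustered_reads_DB viral_refseq_DB result result.m8
```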
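
One way to see which reference dominates the hits is to keep the best hit per query and tally the targets. The sketch below assumes `result.m8` uses the default 12-column BLAST-tab layout written by `mmseqs convertalis` (query in column 1, target in column 2, bit score in column 12). `count_best_hits` is a hypothetical helper, not part of MMseqs2 or Hecatomb, and target accessions still need to be mapped to taxon names separately.

```bash
# Hypothetical helper: for each query keep its highest-bit-score hit (column 12),
# then count how many queries were assigned to each target (column 2).
# Prints "<count><TAB><target>", most frequent target first.
count_best_hits() {
  tab=$(printf '\t')
  sort -t "$tab" -k1,1 -k12,12nr "$1" \
    | awk -F '\t' '!seen[$1]++ { n[$2]++ } END { for (t in n) print n[t] "\t" t }' \
    | sort -t "$tab" -k1,1nr -k2,2
}

# Run only if the table from the previous step is present.
if [ -f result.m8 ]; then
  count_best_hits result.m8 | head -n 10
fi
```

Taking only the top hit per query is a simplification; it ignores ties and does not apply any e-value or identity cutoff.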

### 🔹 Step 3: Viral Assembly

Reads assigned to *Blattodean blattambidensovirus 1* (genus *Blattambidensovirus*) were assembled using MEGAHIT (v1.2.9).

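```bash
# Assemble the virus-assigned read pairs; contigs are written to megahit_output/final.contigs.fa
megahit -1 virus_reads_R1.fastq.gz -2 virus_reads_R2.fastq.gz -o megahit_output
```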
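
A quick way to inspect the assembly is to find the longest contig in MEGAHIT's `final.contigs.fa`. The helper below is a minimal sketch; `longest_contig` is a hypothetical name, and picking the longest contig as the candidate genome is an assumption for illustration, not a step stated in this pipeline.

```bash
# Hypothetical helper: print "<length><TAB><contig name>" for the longest
# sequence in a FASTA file. Sequences may be wrapped over several lines.
longest_contig() {
  awk '
    /^>/ {
      if (len > max) { max = len; best = name }   # close the previous record
      name = substr($1, 2); len = 0; next          # start a new record
    }
    { len += length($0) }                          # accumulate sequence length
    END {
      if (len > max) { max = len; best = name }   # close the last record
      if (best != "") printf "%d\t%s\n", max, best
    }
  ' "$1"
}

# Run only if the assembly from the previous step is present.
if [ -f megahit_output/final.contigs.fa ]; then
  longest_contig megahit_output/final.contigs.fa
fi
```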

### 🔹 Step 4: Reference-based Mapping

The presence of the GbDV genome in each sample was confirmed by mapping that sample's reads to the GbDV reference with Bowtie2 (v2.4.1).

```bash
# Index the GbDV reference, then map each sample's read pairs to it
bowtie2-build GbDV_reference.fasta GbDV_index
bowtie2 -x GbDV_index -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -S mapped.sam
```
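
Bowtie2 prints an alignment summary to standard error, but the number of reads that aligned to the reference can also be counted directly from `mapped.sam`. The sketch below uses only awk and the SAM FLAG field (bit 4 set means the read is unmapped). `mapped_counts` is a hypothetical helper, and no threshold for calling a sample positive is implied.

```bash
# Hypothetical helper: print "<mapped><TAB><total>" read records in a SAM file,
# ignoring header lines and secondary or supplementary alignments.
mapped_counts() {
  awk -F '\t' '
    /^@/ { next }                                     # header lines
    int($2 / 256) % 2 || int($2 / 2048) % 2 { next }  # secondary or supplementary
    { total++; if (int($2 / 4) % 2 == 0) mapped++ }   # bit 4 clear: read aligned
    END { printf "%d\t%d\n", mapped, total }
  ' "$1"
}

# Run only if the alignment from the previous step is present.
if [ -f mapped.sam ]; then
  mapped_counts mapped.sam
fi
```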
