SQL Schema

SQL Schema for Summary Reports

The following image shows the format of the summary reports that are generated upon each serratus run:

Summary report example

The SQL Schema for accessing each summary report is made up of four tables: Runs, FamilySections, AccessionSections, and FastaSections.

Runs

'Runs' corresponds to the first line of the summary file, where the data for the SRA, reference genome, and date is present. This table has a one to many relationship with the three following tables, all linked by the SRA and auto-generated PK RunId. The columns present on this table are as follows:

RunId: PK for the table, auto-generated when entered into the database
Sra: Sra for the analysis represented in the summary file
FileName: File name of the summary file in the S3 Bucket ex: "summary/DRR000231.summary"
Genome: Name of the reference genome used for analysis
Date: Date the summary file was generated

FamilySections

'FamilySections' corresponds to the next section of the summary report, where the data for the pan-genome is sorted by the number of hits. The columns present on this table are as follows:

FamilySectionId: PK for the table, auto-generated when entered into the database
FamilySectionLineId: Number indicating the position of the family line in the family section of the summary file
RunId: FK linking back to the Runs table
Sra: FK linking back to the Runs table, added here for easier query construction
Family: Name of the family of the pan-genome that is being analyzed
Score: Score given for the quality of the alignment
PctId: Percent identity of the sequences aligned (wrt the reference genome)
Aln: Number of aligned reads
Glb: Number of global aligned reads
PanLen: Pangenome length
Cvg: Coverage cartoon generated, giving a picture of the quality of alignment throughout the specific sequence
Top: Top accession
TopAln: Top accession aligned reads
TopLen: Length of the top accession
TopName: Study name linking to the top accession

AccessionSections

'AccessionSections' corresponds to the next section of the summary file, where each line is a reference sequence sorted by hits. The columns present on this table are as follows:

AccessionSectionId: PK for the table, auto-generated when entered into the database
AccessionSectionLineId: Number indicating the position of the accession line in the accession section of the summary file
RunId: FK linking back to the Runs table
Sra: FK linking back to the Runs table, added here for easier query construction
Fam: Name of the family of the pan-genome that is being analyzed
Acc: Name of the accession
PctId: Percent identity of the sequences aligned (wrt the reference genome)
Aln: Number of aligned reads
Glb: Number of global aligned reads
Len: Length of accession
CvgPct: Coverage percentage of mapping
Depth: Depth of the sequence
Cvg: Coverage cartoon generated, giving a picture of the quality of alignment throughout the specific sequence
Name: Study name linking to the accession

FastaSections

'FastaSections' corresponds to the final section of the summary report, where sequences with good scores assigned to cov and non-cov(Other) and are listed. The columns on this table are as follows:

FastaSectionId: PK for the table, auto-generated when entered into the database
FastaSectionLineId: Number indicating the position of the fasta line in the fasta section of the summary file
RunId: FK linking back to the Runs table
Sra: FK linking back to the Runs table, added here for easier query construction
SequenceId: Identifier of the shorthand for the fasta sequence in the fasta section (>Cov.2, >Other.12, etc.)
Sequence: Nucleotide sequence of the read

Overview

Architecture and Pipeline

Raw Data

Serratus Explorer (serratus.io)

Usage

Running Serratus
- Serratus-Lite, local
Finding Novel Viruses (tutorials)
Papers using Serratus
Containers
Summarizer usage
Cloud Budgeting
Serratus SQL Database Management
Data Policy

SQL Schema

SQL Schema for Summary Reports

Runs

FamilySections

AccessionSections

FastaSections

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!