SQL Schema

Overview
Connection Details
SQL Schema - Nucleotide

Overview

Serratus SQL Database is an ordered and accessible way to access the entire collection of Viral Summary Reports) in the project along with additional metadata.

There are two primary ways to access the SQL server:

Tantalus, the R-package companion to Serratus
pgadmin4 or your favorite PostgreSQL program

Connection Details

Endpoint: serratus-aurora-20210406.cluster-ro-ccz9y6yshbls.us-east-1.rds.amazonaws.com
Database: summary
Username: public_reader
Password: serratus

SQL Schema - Nucleotide

`nsra`

nsra is the first line of the summary file, where the data for the SRA, reference genome, and date is present. This table has a one-to-many relationship with the other tables, all linked by the SRA accession run_id.

Columns:

run_id: SRA run accession for the analysis represented in the summary file
read_length: ?
genome: Name of the reference genome used for analysis
date: Date the summary file was generated

`nfamily`

nfamily corresponds to per-viral family pangenome. Reads from all virus accessions (nsequence) are pooled to increase sensitivity (especially for divergent viruses). Detecting a virus of interest is best done initially on the family level, followed by a specific search within matching entries.

Columns:

run_id: SRA run accession for the analysis represented in the summary file
family_name: Name of the family of the pan-genome that is being analyzed
coverage_bins: Coverage cartoon generated, giving a picture of the quality of alignment throughout the specific sequence
score: Score given for the quality of the alignment
percent_identity: Percent identity of the sequences aligned (wrt the reference genome)
depth: Depth of the sequence
n_reads: Number of aligned reads
n_global_reads: Number of global aligned reads
length: Pangenome length
top_genbank_id: Top accession
top_score: Top accession score
top_length: Length of the top accession
top_name: Study name linking to the top accession

`nsequence`

nsequence corresponds to lowest level of sequence-matches. Each reference sequence with mapped reads is an entry.

Columns:

run_id: SRA run accession for the analysis represented in the summary file
family_name: Name of the family of the pan-genome that is being analyzed
sequence_accession: Name of the accession
coverage_bins: Coverage cartoon generated, giving a picture of the quality of alignment throughout the specific sequence
score: Score given for the quality of the alignment
percent_identity: Percent identity of the sequences aligned (wrt the reference genome)
depth: Depth of the sequence
n_reads: Number of aligned reads
n_global_reads: Number of global aligned reads (excludes soft-clipped reads)
length: Length of accession
virus_name: Study name linking to the accession

Overview

Architecture and Pipeline

Raw Data

Serratus Explorer (serratus.io)

Usage

Running Serratus
- Serratus-Lite, local
Finding Novel Viruses (tutorials)
Papers using Serratus
Containers
Summarizer usage
Cloud Budgeting
Serratus SQL Database Management
Data Policy

SQL Schema

Overview

Connection Details

SQL Schema - Nucleotide

nsra

nfamily

nsequence

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

`nsra`

`nfamily`

`nsequence`