Serpent is an exploration into DNA and RNA sequences, nucleotide bases, codons, amino acids and genome data.
I started this project because, for about two decades, I have wanted to explore DNA data in order to learn about, and maybe invent, compression algorithms for it.
Install serpent with pip install serpent, or develop with pdm.
- serpent cat: concatenate and print FASTA files
- serpent find: find FASTA files in directories
- serpent find -s: find and print FASTA sequences in files and directories

- serpent encode: convert data into different encoded representations
- serpent decode: map codons into numbers 0...64

- serpent ac: print and plot autocorrelation on DNA and RNA sequences
- serpent fft: plot FFTs on DNA and RNA sequences
- serpent hist: plot histogram statistics
- serpent image: visualise DNA and RNA data as images
- serpent seq: plot sequence count statistics

- serpent codons: print codon statistics
- serpent pep: print peptide statistics
See serpent -h for all subcommands and serpent <subcommand> -h for options!
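
To make the idea of mapping codons to numbers concrete, here is a minimal Python sketch. It is not serpent's implementation: the base ordering ACGT, the lexicographic codon numbering and the function name encode_codons are all assumptions made for illustration, and serpent's own tables may differ. With 64 possible codons, this sketch uses the indices 0 to 63.

```python
from itertools import product

# Assumed base ordering for this sketch; serpent's own ordering may differ.
BASES = "ACGT"

# All 64 codons in lexicographic order, each mapped to its index 0..63.
CODON_TO_INDEX = {
    "".join(codon): index
    for index, codon in enumerate(product(BASES, repeat=3))
}


def encode_codons(seq: str) -> list[int]:
    """Map a DNA or RNA string to codon indices (hypothetical helper).

    RNA input is handled by treating U as T. A trailing partial codon
    is dropped. Ambiguity codes such as N are not handled and raise
    KeyError.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    return [CODON_TO_INDEX[seq[i:i + 3]] for i in range(0, usable, 3)]


if __name__ == "__main__":
    print(encode_codons("AAAACGTTT"))  # → [0, 6, 63]
```

A fixed-width numbering like this needs 6 bits per codon, which makes it a convenient starting point for experimenting with statistics or compression.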
Get some sample data from NCBI datasets. I recommend starting with virus, bacteria or archaea genomic data, as these genomes are smaller than those of plants or animals.
- National Center for Biotechnology Information
- Datasets - NCBI - NLM
- RefSeq: NCBI Reference Sequence Database
- Home - Nucleotide - NCBI
- Home - Protein - NCBI
- Genome - NCBI - NLM
For example, a SARS-CoV-2 genome is only about 29 kb!
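
To check the size and base composition of a downloaded genome, a FASTA file can be read with a few lines of plain Python. The sketch below is independent of serpent: the names read_fasta and base_counts are hypothetical, the sample record is made up, and it assumes uncompressed text where each record starts with a ">" header line.

```python
from collections import Counter
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def base_counts(sequence: str) -> Counter:
    """Count how often each symbol occurs in a sequence."""
    return Counter(sequence.upper())


if __name__ == "__main__":
    # Made-up sample record; replace with open("your_file.fasta").
    sample = [">demo sequence", "ACGT", "AC"]
    for name, seq in read_fasta(sample):
        print(name, len(seq), dict(base_counts(seq)))
```

Because read_fasta accepts any iterable of lines, an open file object can be passed directly, so even larger genomes are read line by line.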