sequence-counter

Heavily commented, quick script for processing sequence files for length and basepair composition

This is a pure bash script that takes a sequence file and reports its base content and its total length, both gapped and ungapped. It is meant as a portable educational tool for people just learning bash scripting (such as myself).
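As a rough illustration of the idea (this is not the code in seq_counter.sh), the sketch below counts each base and reports gapped and ungapped lengths with a few standard tools. The function name `count_bases` is made up for this example, and it assumes that header lines start with `>` and that gaps are written as `-` or `N`; the actual script may define gaps differently.

```shell
#!/usr/bin/env bash
# Sketch only: count_bases is a hypothetical helper, not the code in seq_counter.sh.
# Assumptions: header lines start with '>', and gaps are written as '-' or 'N'.
count_bases() {
    local seq base only ungapped
    # Drop FASTA headers, strip whitespace and line breaks, upper-case the bases
    seq=$(grep -v '^>' "$1" | tr -d ' \t\r\n' | tr 'acgtn' 'ACGTN')
    for base in A T C G; do
        # Delete everything except this base, then measure what is left
        only=$(printf '%s' "$seq" | tr -cd "$base")
        printf '%7d %s\n' "${#only}" "$base"
    done
    # Remove the assumed gap characters to get the ungapped sequence
    ungapped=$(printf '%s' "$seq" | tr -d 'N-')
    echo "Total gapped sequence length is: ${#seq}"
    echo "Total ungapped sequence length is: ${#ungapped}"
}
```

After sourcing the function, `count_bases some_file.fasta` prints four count lines followed by the two lengths. The sketch holds the whole sequence in one shell variable, which keeps it short but is likely a poor fit for chromosome-sized files.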

It's not the best-optimized script by any stretch of the imagination, but it's simple enough that all its components should be useful to any amateur researcher looking for simple, practical code examples.

For reference, on a netbook with a Celeron N3060 processor and 4 GB of RAM running Lubuntu 20.04, human chromosome 1 (GRCh38.p13, NC_000001.11, a file of about 240.8 MB) takes the following time from start to finish:

real 3m46.775s user 3m23.530s sys 0m16.992s

Agrobacterium tumefaciens strain GCF_900045375.1 takes the following time from start to finish:

real 0m5.423s user 0m4.901s sys 0m0.467s
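The real/user/sys figures above are in the format printed by bash's `time` keyword. A minimal sketch of taking such a measurement follows; the wrapper name `timing_of` is hypothetical and only serves to show how the report, which bash writes to stderr, can be captured separately from the command's own output.

```shell
#!/usr/bin/env bash
# Sketch only: timing_of is a hypothetical wrapper, not part of this repo.
# `time` is a bash keyword. It runs the command, then reports:
#   real = wall-clock time, user = CPU time in user mode, sys = CPU time in the kernel
timing_of() {
    # Discard the command's own output, then redirect the timing report
    # (written to the shell's stderr) to stdout so it can be captured.
    { time "$@" >/dev/null 2>&1; } 2>&1
}
```

For example, `timing_of ./seq_counter.sh control_100bp.fasta` would print only the three timing lines. For a one-off measurement, typing `time` in front of the command is enough.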

The repo contains a 100 bp positive control FASTA file generated by a DNA synthesis script from https://github.com/naturepoker/dna-synth. Running the command below should produce the following output.

./seq_counter.sh control_100bp.fasta


##################################################
Processing control_100bp.fasta
##################################################
                                                  
                                                  
                                                  
##################################################
     Total sequence composition is as follows     
--------------------------------------------------
     18 A
     27 T
     29 C
     26 G
--------------------------------------------------
Total gapped sequence length is: 100
--------------------------------------------------
Total ungapped sequence length is: 100
--------------------------------------------------
GC content in control_100bp.fasta is 55.00 %                 
##################################################
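The GC content line in the report above is consistent with the four base counts: (29 + 26) / (18 + 27 + 29 + 26) = 55.00 %. Shell arithmetic is integer-only, so getting two decimal places takes a small trick. The sketch below shows one way to do it; `gc_percent` is a hypothetical helper, and dividing by A + T + C + G is an assumption about how the script defines GC content.

```shell
#!/usr/bin/env bash
# Sketch only: gc_percent is a hypothetical helper, not taken from seq_counter.sh.
# Arguments: counts of A, T, C and G, in that order.
gc_percent() {
    local a=$1 t=$2 c=$3 g=$4
    local total=$((a + t + c + g))
    if [ "$total" -eq 0 ]; then
        echo "no bases counted" >&2
        return 1
    fi
    # Integer-only arithmetic: work in hundredths of a percent,
    # adding total/2 before dividing so the result is rounded to nearest.
    local scaled=$(( ((c + g) * 10000 + total / 2) / total ))
    printf '%d.%02d\n' $((scaled / 100)) $((scaled % 100))
}

gc_percent 18 27 29 26   # counts from the control report; prints 55.00
```

Tools such as `bc` or `awk` can do the same with floating-point division; scaling by 10000 just keeps everything inside the shell.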
