Description
Submitting Author: Rajkanwar Singh (@Rajkanwars15)
Package Name: bioquik
One-Line Description of Package: bioquik quickly finds and counts CG-anchored DNA motifs (short sequence patterns built around a CG site) in FASTA genome files.
Repository Link (if existing): https://github.com/Rajkanwars15/bioquik
EiC: TBD
Code of Conduct & Commitment to Maintain Package
- I agree to abide by pyOpenSci's Code of Conduct during the review process and in maintaining my package afterward, should it be accepted.
- I have read and will commit to package maintenance after the review as per the pyOpenSci Policies Guidelines.
Description
- Include a brief paragraph describing what your package does:
bioquik is an open-source Python toolkit for fast and reproducible quantification of CG-anchored DNA motifs in FASTA sequences. It automates motif expansion, efficient searching, and structured reporting to support downstream genomic and epigenomic analyses.
It leverages a high-performance FM-index backend (via pydivsufsort) to count motifs directly from large reference genomes and multi-sample FASTA datasets with low memory usage. It is designed for integration into bioinformatics pipelines, enabling parallel processing, rich progress reporting, and multiple machine-readable output formats (per-file CSV, combined summary CSV, and optional JSON). Optional visualization capabilities provide motif distribution plots and heatmaps to support exploratory analysis and quality control.
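As a rough illustration of the motif expansion and index-based counting described above, here is a minimal sketch. It is not bioquik's implementation: the `N` wildcard syntax and the function names `expand_motif`, `build_suffix_array`, and `count_occurrences` are assumptions made for this example, and a naive pure-Python suffix array with binary search stands in for the FM-index backend provided by pydivsufsort.

```python
from itertools import product


def expand_motif(pattern, alphabet="ACGT", wildcard="N"):
    """Expand a pattern with wildcards into every concrete motif.

    The wildcard character is a hypothetical convention for this sketch.
    """
    slots = [alphabet if ch == wildcard else ch for ch in pattern]
    return ["".join(combo) for combo in product(*slots)]


def build_suffix_array(text):
    """Naive suffix array: fine for small inputs, far too slow for genomes."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, suffix_array, motif):
    """Count (overlapping) occurrences of motif using two binary searches."""
    m = len(motif)

    # Lower bound: first suffix whose m-character prefix is >= motif.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] < motif:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose m-character prefix is > motif.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        if text[start:start + m] <= motif:
            lo = mid + 1
        else:
            hi = mid
    return lo - first


sequence = "ACGTCGACGA"
index = build_suffix_array(sequence)
counts = {m: count_occurrences(sequence, index, m) for m in expand_motif("NCGN")}
nonzero = {m: c for m, c in counts.items() if c}
```

The index is built once per sequence and then reused for every expanded motif, which is the property that makes index-based counting attractive when the motif set is large.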
Community Partnerships
We partner with communities to support peer review with an additional layer of
checks that satisfy community requirements. If your package fits into an
existing community please check below:
- Astropy: My package adheres to Astropy community standards
- Pangeo: My package adheres to the Pangeo standards listed in the pyOpenSci peer review guidebook
Scope
Please indicate which category or categories this package falls under:
- Data retrieval
- Data extraction
- Data processing/munging
- Data deposition
- Data validation and testing
- Data visualization
- Workflow automation
- Citation management and bibliometrics
- Scientific software wrappers
- Database interoperability
Domain Specific
- Geospatial
- Education
Explain how and why the package falls under these categories (briefly, 1-2 sentences). For community partnerships, check also their specific guidelines as documented in the links above. Please note any areas you are unsure of:
bioquik extracts motif occurrence data directly from genomic FASTA files using an FM-index to enable scalable search on large datasets. It then aggregates and summarizes these counts into standard tabular outputs with optional visual analytics. These components support computational genomics workflows focused on DNA sequence motif frequency analysis.
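The extract-then-aggregate flow described above could look roughly like the following sketch. It uses only the standard library; the function names, the column layout (`file`, `record`, `motif`, `count`), and the in-memory FASTA strings are all assumptions for illustration, not bioquik's actual API or output schema, and a plain string scan replaces the FM-index.

```python
import csv
import io


def read_fasta(handle):
    """Yield (record_name, sequence) pairs from a FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first token of the header as the record name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def count_overlapping(sequence, motif):
    """Count occurrences of motif, allowing overlaps."""
    count, pos = 0, sequence.find(motif)
    while pos != -1:
        count += 1
        pos = sequence.find(motif, pos + 1)
    return count


def summarize(fasta_texts, motifs):
    """Build one row per (file, record, motif) from {file name: FASTA text}."""
    rows = []
    for file_name, text in fasta_texts.items():
        for record, sequence in read_fasta(io.StringIO(text)):
            for motif in motifs:
                rows.append({
                    "file": file_name,
                    "record": record,
                    "motif": motif,
                    "count": count_overlapping(sequence, motif),
                })
    return rows


def write_summary_csv(rows, handle):
    """Write the combined summary table as CSV."""
    writer = csv.DictWriter(handle, fieldnames=["file", "record", "motif", "count"])
    writer.writeheader()
    writer.writerows(rows)


samples = {"a.fa": ">chr1 demo\nACGT\nCGA\n", "b.fa": ">s1\nTTTT\n"}
rows = summarize(samples, ["CG", "TT"])
buffer = io.StringIO()
write_summary_csv(rows, buffer)
```

Keeping the rows as plain dictionaries means the same data could be written per file, combined, or dumped as JSON without changing the counting step.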
Who is the target audience and what are the scientific applications of this package?
The target audience is bioinformaticians and genomics researchers who analyze DNA sequences for motifs.
Scientific applications include gene studies and motif frequency analysis across genomes.
Are there other Python packages that accomplish similar things? If so, how does yours differ?
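Biopython (SeqUtils for basic motif search) and scikit-bio (sequence tools) offer related functionality. bioquik differs in three ways: it uses an FM-index for many times faster counting on GB-scale files and large motif sets, it is built around CG-anchored motifs, and it provides native parallelism.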
Any other questions or issues we should be aware of:
P.S. Have feedback/comments about our review process? Leave a comment here