bioinfosourabh/README.md

Sourabh Kumar

Bioinformatics Scientist | Metropolis Healthcare | AIIMS Delhi
📧 [email protected]
🌐 LinkedIn | ORCID
📸 Outside of research, I also pursue photography and filmmaking.


My tech tools 💻

Python · R · Shell Script · GNU Bash · MySQL · Linux · Anaconda · PyTorch · Keras · SciPy · GitHub Actions

WES · WGS · CNV/SV · Variant Calling

Nextflow · Docker · Singularity · AWS · HPC · XGBoost · GNN · Deep Learning · AI-Scoring

Molecular Docking · scRNA-Seq · RNA-Seq · 16S Metagenomics · ChIP-Seq


Research Focus

I specialize in high-throughput genomic analysis, precision diagnostics, and cloud-native bioinformatics. My work spans the development of scalable pipelines for whole exome/genome sequencing, machine learning models for variant prioritization, and integrative multi-omics frameworks for pediatric tumors and rare genetic disorders.


Areas of Expertise

  • Computational Genomics: WES, WGS, CNV/SV detection, somatic/germline variant calling
  • NGS Pipeline Development: Nextflow, Docker/Singularity, AWS, HPC environments
  • Machine Learning in Genomics: XGBoost, GNNs, Deep Learning, AI-based variant scoring
  • Structural Bioinformatics: Molecular dynamics (GROMACS), mutational modeling, docking
  • Multi-Omics Integration: scRNA-Seq, RNA-Seq, 16S metagenomics, ChIP-Seq

Projects

  • TrioExome-Analysis:
    Clinical pipeline for trio-based whole exome sequencing with automated annotation and prioritization.

  • Variant-Prioritizer:
    Framework for deleterious variant classification integrating CADD, SIFT, and dbNSFP.

  • WGS-AMR:
    Whole genome sequencing pipeline for identifying antimicrobial resistance markers.

  • Single-Cell-Analysis:
    Single-cell data analysis workflow using the Seurat package in R.


Publications

• Kumar, Sourabh, et al. "Exploring Familial Hypospadias: Genetic Insights from Copy Number Variants in a Quad Family." (2024). https://doi.org/10.21203/rs.3.rs-4843906/v1

• Phugat, S., Sharma, J., Kumar, S., Jain, V., Dhua, A. K., Yadav, D. K., ... & Goel, P. (2024). Genetic landscape of congenital pouch colon: systematic review and functional enrichment study. Pediatric Surgery International, 40(1), 314.

• Goel, P., Sharma, M., Kaushik, H., Kumar, S., Singh, H., Jain, V., ... & Agarwala, S. (2024). Genetic Markers of Spina Bifida in an Indian Cohort. Journal of Indian Association of Pediatric Surgeons, 29(5), 529-535.

• Sharma, J., Sharma, M., Kumar, S., Kaushik, H., Pandey, H., Lal, D., ... & Goel, P. (2025). Genetic Markers of Spina Bifida: Enrichment of Pathogenic Variants and Variants of Uncertain Significance. Journal of Indian Association of Pediatric Surgeons, 30(2), 163-169.

• Kumar, S., Sharma, J., Sardar, R., Jain, V., Dhua, A. K., Yadav, D. K., ... & Goel, P. (2025). KMT2C Polymorphism in Familial Hypospadias. Indian Journal of Pediatrics, 1-3.

• Goel, Prabudh, et al. "Chromosomal Microarray Analysis in Spina Bifida: Genetic Heterogeneity and Its Clinical Implications." Journal of Indian Association of Pediatric Surgeons.

• Kumar, Sourabh, et al. "Novel CTNNB1 Gene Mutations Reveal Critical Pathogenic Mechanisms in Pediatric Hepatoblastoma." Pediatric Surgery International.

• Book Chapter: "Deep-genomics: Deep Learning Based Analysis of Genome Sequenced Data for Identification of Gene Alternation" for a book entitled “Artificial Intelligence (AI) in Cell and Genetic Engineering” (https://www.springer.com/series/7651), published by Springer Protocols.


Contact

Feel free to connect.
📬 [email protected]

Pinned

  1. Single-Cell-Analysis (Public)

    Single Cell Data Analysis Workflow using the Seurat package in R, covering essential steps, such as quality control, clustering, cell type identification, differential gene expression analysis, and…

    R 6 1

  2. Somatic-Variant-Calling-Pipeline (Public)

    This repository provides an automated somatic variant calling pipeline using BWA-MEM, GATK Mutect2, and bcftools. The pipeline processes paired-end FASTQ files, performs alignment, duplicate markin…

    Shell

  3. Trio-Exome-Analysis-Pipeline (Public)

    A pipeline for Trio Exome Analysis to identify de novo, inherited (AR, AD, X-linked), and mosaic variants

    Shell

  4. Germline-Variant-Calling-Pipeline (Public)

    This repository provides a fully automated and modular Germline Variant Calling Pipeline using BWA, GATK, Samtools, Fastp, and bcftools. The pipeline supports end-to-end processing starting from ra…

    Shell 1

  5. Protein-Simulation-Pipeline (Public)

    A structured pipeline for performing all major steps of protein molecular dynamics simulations using GROMACS—starting from structure preparation to energy minimization, equilibration, production ru…

    1

  6. ngs-automation-nextflow (Public)

    A modular and reproducible workflow built with Nextflow for processing short-read sequencing data. The pipeline integrates standard bioinformatics tools and follows best practices for scalable and …

    Nextflow