Rust/Python software engineer building data and ML systems for computational biology.
I ship installable bioinformatics packages, reproducible pipelines, and clinical prototypes with CI, real-data validation, and published artefacts.
rustscenic (v0.4.7, PyPI, docs, Zenodo DOI): faster, lower-overhead regulatory-network analysis for single-cell and multiome data, shipped as one Python package with Rust kernels.
- 11x to 52x faster than SCENIC+ on tested real-data core E2E rows; median speedup 27x
- 100k-cell benchmark used 6.3 GB RAM; comparable legacy workflows have reported >40 GB
- One install: `pip install rustscenic`; Python 3.10 to 3.13; Linux, macOS, and Windows wheels
- Core install avoids Java, dask, CUDA, and Snakemake
- Rust + PyO3 stages: GRN, AUCell, topics, cisTarget, enhancer links, eRegulons
- Evidence: benchmarks, PyPI, docs, Zenodo DOI, branch-protected CI, committed validation artefacts
- Built with the Kuan-lin Huang Lab at Icahn Mount Sinai
- Core: Rust, PyO3, Python, pandas, numpy, scipy, scanpy, anndata
- Pipelines: Nextflow DSL2, Docker, Singularity, GitHub Actions
- ML/product: PyTorch, React, TypeScript, Supabase
| Project | Stack | Evidence |
|---|---|---|
| RustScenic airway validation case study | Python, pySCENIC comparison, CI | Real-atlas head-to-head on 31,602 airway cells and 59 regulons; mean per-cell Pearson r = 0.984; 27x AUCell timing difference; Zenodo DOI |
| External open-source contributions | scverse scientific Python ecosystem | 5 merged PRs to scanpy, 2 merged PRs to PyDESeq2, and an open algorithmic PR on the AnnData concat API |
| RNA-seq Nextflow pipeline | Nextflow DSL2, Docker, Singularity, AWS Batch | FASTQ to QC, trimming, HISAT2, featureCounts, DESeq2, and MultiQC; Seqera-ready schema; synthetic end-to-end CI |
| Bulk RNA-seq differential expression | R, DESeq2, CI, reproducible artefacts | SARS-CoV-2 nasopharyngeal RNA-seq; 1,773 DE genes in primary cohort; 99.8% concordance with larger sensitivity set; Zenodo DOI |
| Airway cell-type deconvolution | PyTorch, single-cell references, pseudo-bulk validation | Deconvolution of 484 bulk RNA-seq samples into 14 airway cell types; r = 0.954 on pseudo-bulk 5-fold CV; model metadata for reuse |
| Single-cell immune profiling | Scanpy, Scrublet, Leiden, PAGA, CI | PBMC pipeline with QC, marker annotation, trajectory inference, T-cell subclustering, full-pipeline CI smoke validation, and output checksums |
| SafetyNett | React, TypeScript, Supabase | Clinical safety-netting prototype; CI covers lint, explicit TypeScript checking, production build, and tests |



