Machine Learning Engineer | Data Scientist | GenAI & ML Systems Builder
🎓 MS Computer Science @ North Carolina State University
📍 Raleigh, NC
🔗 LinkedIn | 🌐 Portfolio | 📧 ksonawa@ncsu.edu
I’m a Machine Learning & Data Science engineer who enjoys building end-to-end ML and GenAI systems — from messy real-world data and ambiguous business problems to scalable, production-ready solutions.
I’ve worked on customer lifecycle modeling, forecasting pipelines, LLM-powered automation, and RAG systems, combining strong fundamentals in ML, statistics, and systems with hands-on experience in cloud and MLOps.
I especially enjoy problems where data, models, and real business decisions intersect.
My goal is to grow into a full-stack ML / AI engineer who can:
- Design scalable ML & GenAI systems end-to-end
- Build reliable data pipelines and forecasting models
- Translate vague business questions into measurable ML solutions
- Ship models that actually get used in production
Machine Learning & AI
- Supervised & Unsupervised Learning, Feature Engineering
- DNNs, LSTMs, Transformers, LLMs, RAG
- Statistical Modeling, Hypothesis Testing, Forecasting
- Computer Vision (Object Detection, Benchmarking)
Generative AI
- Retrieval-Augmented Generation (RAG)
- Knowledge Graph + LLM systems
- Multi-agent LLM workflows
- Prompt engineering & fine-tuning
Data & Systems
- SQL-based analytics & ETL pipelines
- Time-series forecasting & lifecycle modeling
- Data quality, metric consistency, experimentation
- End-to-end ML pipelines (data → model → deployment)
- GenAI systems using RAG, agents, and knowledge graphs
- Forecasting & lifecycle models for business decision-making
- Computer vision benchmarks and performance trade-off studies
- LLM-powered developer tooling and automation
May 2025 – Aug 2025 | USA
- Modeled customer & product lifecycle signals by engineering 20+ features, enabling stable quarterly renewal & tech-refresh risk estimates
- Reduced prediction variance by up to 9% across 9 customer segments
- Built end-to-end forecasting pipelines (AzureML, AbacusAI) on 3M+ records for renewals and revenue planning
- Designed and owned SQL + Python ETL pipelines ensuring data quality and metric consistency
- Partnered closely with Product & Finance to translate ambiguous business needs into scalable ML solutions
Jul 2023 – Aug 2023 | India
- Identified KPIs and built interactive dashboards using Python, Tableau, and Power BI
- Analyzed engagement trends across domains (Crypto, COVID-19, Spotify vs YouTube)
- Enabled data-driven reporting and performance insights
Aug 2025 – Nov 2025
- Built a 2-agent LLM-based technical debt automation system using GitHub Actions & MCP-enabled VS Code workflows
- Fine-tuned GPT-4o-mini on 140k+ real-world code smells
- Detected 20+ types of code smells with a 70–80% F1 score
- Achieved a ~70% safe-refactor rate, with end-to-end remediation taking seconds (MCP) to minutes (PR-based)
Mar 2025 – Aug 2025
- Developed a biomedical QA assistant using knowledge graphs + retrieval + LLMs
- Processed 10,000+ biomedical entities from structured data
- Implemented a robust fallback mechanism achieving >95% reliable responses
Sep 2024 – Dec 2024
- Designed benchmarking experiments on the PASCAL dataset (9,600+ images)
- Optimized models using PyTorch & CUDA
- Quantified speed–accuracy trade-offs:
  - YOLO: 135 FPS @ 75.7% mAP
  - SSD: 30 FPS @ 91.7% precision
Jul 2023 – May 2024
- Built recommendation systems using ML (RF, SVD, PMF) and DL (CNNs, LSTMs, Autoencoders)
- Analyzed the Amazon Fine Food Reviews dataset (568K reviews, 256K users)
- Performed SQL-driven feature extraction and statistical analysis for personalization
- NVIDIA — Accelerating End-to-End Data Science Workflows
- SIES GST — DevOps Student Development Program
Girija Welfare Association — Navi Mumbai, India
Jul 2022 – Jul 2024
- Led cross-functional teams for community events and fundraising
- Developed leadership, coordination, and stakeholder management skills
- 💼 LinkedIn: https://linkedin.com/iamkaustubhs
- 🌐 Portfolio: https://kaustubhas.github.io
- 📧 Email: ksonawa@ncsu.edu
⭐ Always open to ML, GenAI, and data-driven engineering opportunities.

