KaustubhAs/README.md

Hi, I’m Kaustubh Sonawane 👋

Machine Learning Engineer | Data Scientist | GenAI & ML Systems Builder
🎓 MS Computer Science @ North Carolina State University

📍 Raleigh, NC
🔗 LinkedIn | 🌐 Portfolio | 📧 ksonawa@ncsu.edu


👨‍💻 About Me

I’m a Machine Learning & Data Science engineer who enjoys building end-to-end ML and GenAI systems — from messy real-world data and ambiguous business problems to scalable, production-ready solutions.

I’ve worked on customer lifecycle modeling, forecasting pipelines, LLM-powered automation, and RAG systems, combining strong fundamentals in ML, statistics, and systems with hands-on experience in cloud and MLOps.

I especially enjoy problems where data, models, and real business decisions intersect.


🎯 What I’m Working Toward

My goal is to grow into a full-stack ML / AI engineer who can:

  • Design scalable ML & GenAI systems end-to-end
  • Build reliable data pipelines and forecasting models
  • Translate vague business questions into measurable ML solutions
  • Ship models that actually get used in production

🧠 Skill Snapshot

🧰 Tech Stack

💻 Programming Languages

Python SQL R C++ Kotlin

🧠 Machine Learning & Deep Learning

PyTorch TensorFlow Scikit-learn Keras CUDA OpenCV

🤖 Generative AI & LLMs

Hugging Face LangChain LLMs Transformers RAG

📊 Data Science & Statistics

Data Pipelines Statistical Modeling Hypothesis Testing Computer Vision LSTMs DNNs

☁️ Cloud & MLOps

AWS Azure Docker FastAPI Git GitHub

🗄️ Databases & Data Tools

PySpark PostgreSQL SQLite SQLAlchemy Jupyter

📈 Visualization & Analytics

PowerBI Tableau Excel Matplotlib Seaborn

🧪 Testing & Dev

Pytest


Machine Learning & AI

  • Supervised & Unsupervised Learning, Feature Engineering
  • DNNs, LSTMs, Transformers, LLMs, RAG
  • Statistical Modeling, Hypothesis Testing, Forecasting
  • Computer Vision (Object Detection, Benchmarking)

Generative AI

  • Retrieval-Augmented Generation (RAG)
  • Knowledge Graph + LLM systems
  • Multi-agent LLM workflows
  • Prompt engineering & fine-tuning

Data & Systems

  • SQL-based analytics & ETL pipelines
  • Time-series forecasting & lifecycle modeling
  • Data quality, metric consistency, experimentation

🛠️ What I Build

  • End-to-end ML pipelines (data → model → deployment)
  • GenAI systems using RAG, agents, and knowledge graphs
  • Forecasting & lifecycle models for business decision-making
  • Computer vision benchmarks and performance trade-off studies
  • LLM-powered developer tooling and automation

💼 Experience

📊 Data Science Intern — NetApp

May 2025 – Aug 2025 | USA

  • Modeled customer & product lifecycle signals by engineering 20+ features, enabling stable quarterly renewal & tech-refresh risk estimates
  • Reduced prediction variance by up to 9% across 9 customer segments
  • Built end-to-end forecasting pipelines (AzureML, AbacusAI) on 3M+ records for renewals and revenue planning
  • Designed and owned SQL + Python ETL pipelines ensuring data quality and metric consistency
  • Partnered closely with Product & Finance to translate ambiguous business needs into scalable ML solutions

📈 Data Visualization Intern — SIES GST

Jul 2023 – Aug 2023 | India

  • Identified KPIs and built interactive dashboards using Python, Tableau, and Power BI
  • Analyzed engagement trends across domains (Crypto, COVID-19, Spotify vs YouTube)
  • Enabled data-driven reporting and performance insights

🚀 Featured Projects

🤖 Code Smell Classification & Refactoring using LLMs

Aug 2025 – Nov 2025

  • Built a 2-agent LLM-based technical debt automation system using GitHub Actions & MCP-enabled VS Code workflows
  • Fine-tuned GPT-4o-mini on 140k+ real-world code smells
  • Detected 20+ code smells with 70–80% F1-score
  • Achieved ~70% safe refactors with end-to-end remediation in seconds (MCP) to minutes (PR-based)

🧬 Biomedical Assistant (RAG + Knowledge Graph)

Mar 2025 – Aug 2025

  • Developed a biomedical QA assistant using knowledge graphs + retrieval + LLMs
  • Processed 10,000+ biomedical entities from structured data
  • Implemented a robust fallback mechanism achieving >95% reliable responses
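
For illustration only, a tiered fallback of the kind described above can be sketched as follows. This is not the project's actual code: `answer_with_fallback`, `kg_lookup`, `retrieve_passages`, and the toy data are hypothetical stand-ins, and the ordering (knowledge graph first, then retrieval, then a default reply) is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: a source takes a question and returns an answer or None.
AnswerSource = Callable[[str], Optional[str]]


def answer_with_fallback(
    question: str,
    sources: List[Tuple[str, AnswerSource]],
    default: str = "I could not find a reliable answer.",
) -> Tuple[str, str]:
    """Try each source in order; return (source_name, answer) for the first hit."""
    for name, source in sources:
        try:
            answer = source(question)
        except Exception:
            # A failing source should not break the request; try the next one.
            continue
        if answer:
            return name, answer
    # Every source failed or returned nothing, so reply with the default.
    return "default", default


def kg_lookup(question: str) -> Optional[str]:
    # Toy stand-in for a knowledge-graph query.
    facts = {"What are symptoms of influenza?": "Fever, cough, and fatigue."}
    return facts.get(question)


def retrieve_passages(question: str) -> Optional[str]:
    # Toy stand-in for a retriever that is currently unavailable.
    raise RuntimeError("retriever unavailable")


SOURCES = [("knowledge_graph", kg_lookup), ("retrieval", retrieve_passages)]
```

Returning the source name alongside the answer makes it possible to log how often each tier is used, which is one way a reliability figure could be tracked.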

👁️ Comparative Analysis of Real-Time Object Recognition

Sep 2024 – Dec 2024

  • Designed benchmarking experiments on PASCAL dataset (9,600+ images)
  • Optimized models using PyTorch & CUDA
  • Quantified speed–accuracy trade-offs:
    • YOLO: 135 FPS @ 75.7% mAP
    • SSD: 30 FPS @ 91.7% precision
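
As a rough sketch of how a frames-per-second figure like those above can be measured (an illustration, not the benchmark's actual harness), the helper below times a hypothetical `infer` callable over a list of inputs. The name `measure_fps` and the warm-up count are assumptions.

```python
import time
from typing import Any, Callable, Sequence


def measure_fps(infer: Callable[[Any], Any], inputs: Sequence[Any], warmup: int = 2) -> float:
    """Return throughput in frames per second for calling infer on each input."""
    # Warm-up calls are excluded from timing (caches, lazy initialization).
    for item in inputs[:warmup]:
        infer(item)

    start = time.perf_counter()
    for item in inputs:
        infer(item)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very fast callables.
    return len(inputs) / elapsed if elapsed > 0 else float("inf")
```

When timing PyTorch models on a GPU, kernels run asynchronously, so a call to `torch.cuda.synchronize()` is needed before reading the clock at both the start and the end of the timed region.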

🛒 Customer Behavior Analysis & Recommendation Systems

Jul 2023 – May 2024

  • Built recommendation systems using ML (RF, SVD, PMF) and DL (CNNs, LSTMs, Autoencoders)
  • Analyzed Amazon Food Reviews dataset (568K reviews, 256K users)
  • Performed SQL-driven feature extraction and statistical analysis for personalization

📜 Certifications

  • NVIDIA — Accelerating End-to-End Data Science Workflows
  • SIES GST — DevOps Student Development Program

🤝 Leadership & Community

Girija Welfare Association — Navi Mumbai, India
Jul 2022 – Jul 2024

  • Led cross-functional teams for community events and fundraising
  • Developed leadership, coordination, and stakeholder management skills

📫 Let’s Connect

⭐ Always open to ML, GenAI, and data-driven engineering opportunities.

My Skills

Pinned

  1. ExpenseLlama-LLM-Powered-Expense-Tracker Public

    Python

  2. BioRAG Public

    A RAG system for biomedical information that combines a knowledge graph with natural language processing to answer questions about diseases and symptoms. Built with a robust fallback mechanism ensu…

    Python

  3. Hand-Guesture-Control Public

    An innovative method of computer control using a real-time web camera is demonstrated in this Hand Gesture Control project. The interactive computer system proposed in this study can function witho…

    Python

  4. ForkBombers/Enigma Public

    Forked from rohit-sram/Enigma

    This repo is for a Discord music bot

    Python