pathak-ashutosh/README.md

Ashutosh Pathak

I work on large-scale machine learning systems, focusing on the design, training, and deployment of models that operate reliably under real-world conditions. My interests include large language models, multimodal architectures, retrieval-augmented generation, and the data/compute infrastructure required to support them.

My work spans the entire lifecycle of modern ML systems: dataset construction, training pipelines, evaluation methodology, and inference optimization. I care about clarity in system design, reproducibility, and empirical rigor. I also build tools that make model behavior more interpretable and controllable.

I hold a Bachelor's and a Master's degree in Computer Science (Machine Learning). I write about ML, systems, and experimentation at https://thenumbercrunch.com/.

Side projects include HiveHaven, a lightweight platform for international students seeking housing in the U.S., and PolNet, a data visualization tool for exploring U.S. congressional caucus memberships and political network data.

Current reading: “Build a Large Language Model (From Scratch)” by Sebastian Raschka.


Focus areas

Predictive Modeling, Large Language Models, Multimodal Models, Generative Modeling
Retrieval-Augmented Generation, Vector Search, Data-Centric Evaluation
Training Pipelines, MLOps, Distributed Systems, High-Throughput Inference


Toolchain

JavaScript, Python, C/C++, C#, SQL
PyTorch, TensorFlow, Scikit-learn
LangChain, LangGraph, Elasticsearch, Neo4j
Apache Spark, Databricks, Hadoop (HDFS), Postgres, BigQuery
AWS SageMaker, Amazon Bedrock, Vertex AI, Azure ML
Docker, Kubernetes, Git, DVC


Contact

LinkedIn: https://www.linkedin.com/in/pathak-ash/
X: https://x.com/pathak_jeee
Email: [email protected]
Writing: https://thenumbercrunch.com/

Pinned

  1. Eye-blink-detection (Public)

    Detect eye blinks using the eye aspect ratio (EAR) introduced by Soukupová and Čech in their 2016 paper, "Real-Time Eye Blink Detection Using Facial Landmarks".

    Python 68 28

  2. econberta (Public)

    Robust Extraction of Named Entities in Economics

    Jupyter Notebook 1

  3. clinical-risk-prediction (Public)

    Clinical risk prediction using electronic health records (EHRs)

    Jupyter Notebook 1

  4. gentopia-mason (Public)

    Forked from LittleYUYU/Gentopia-Mason

    Build Hierarchical Autonomous Agents through Config. Collaborative Growth of Specialized Agents.

    Python 1

  5. spark-movie-recommendation (Public)

    A movie recommendation system built on the MovieLens 25M dataset using Python and Apache Spark

    Python 2

  6. Drowsiness-detection (Public)

    Detect when a driver is getting drowsy and sound an alarm to wake them up.

    Python 5 1