Building production-grade AI for the next billion users — not just the next billion dollars.
I build AI systems that run on hardware most researchers discard.
My work sits at the intersection of edge AI, agentic systems, and low-resource NLP — with a focus on making powerful AI accessible on 4 GB VRAM consumer hardware, in Indian languages, without cloud dependency.
- 💎 Building PRISM-AI — a model-agnostic recommendation engine for LLMs that eliminates the "Intent Alignment Tax" (Local Llama 3.2, Persistent UIV Store)
- 🧠 Building CHAARI 2.0 — a privacy-first bilingual agentic AI OS companion (Hinglish, 4 GB VRAM, cryptographic mesh, arXiv in prep)
- ⚡ Researching NMOS — 70B+ parameter inference on 4 GB VRAM via anticipatory behavioral-signal loading
- 🎓 MSc Information Technology @ Lovely Professional University (May 2026)
- 💼 ex-Intern @ LG Electronics (predictive maintenance, >95% recall)
- 🏐 State & National Volleyball player — teamwork on and off the court
A "Recommendation Engine" for Generative AI
Model-agnostic framework that eliminates repetitive AI instructions by predicting user intent from chat history. Bringing Big Data principles to LLM orchestration.
| Component | Detail |
|---|---|
| Core | Hybrid UIV inference (weighted signals + confidence gating + temporal decay) |
| Big Data | Profile Store v2 with metadata, temporary overrides, retention hooks, export/delete controls |
| Model | Integrated with local Llama 3.2 3B via Ollama (model-agnostic by design) |
| UX | Interactive Gradio Web UI with real-time intent visualization and profile health status |
| Research | Production-style evaluation metrics + observability for misclassification tracking |
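The hybrid UIV inference described above can be sketched as a weighted-signal scorer with exponential temporal decay and a confidence gate. This is an illustrative sketch only; the `Signal` type, half-life constant, and gate threshold are assumptions, not the actual PRISM-AI implementation:

```python
import time
from dataclasses import dataclass

@dataclass
class Signal:
    """One observed hint about user intent, e.g. mined from chat history."""
    intent: str       # candidate intent label
    weight: float     # how strongly this signal indicates the intent
    timestamp: float  # when the signal was observed (epoch seconds)

def infer_intent(signals, half_life_s=3600.0, confidence_gate=0.6):
    """Score intents by decayed signal weight; return one only if confident."""
    now = time.time()
    scores = {}
    for s in signals:
        # Temporal decay: a signal loses half its weight every half-life.
        decay = 0.5 ** ((now - s.timestamp) / half_life_s)
        scores[s.intent] = scores.get(s.intent, 0.0) + s.weight * decay
    total = sum(scores.values())
    if not total:
        return None
    best = max(scores, key=scores.get)
    confidence = scores[best] / total
    # Confidence gating: below the threshold, defer instead of guessing.
    return best if confidence >= confidence_gate else None
```

The gate is what avoids the "Intent Alignment Tax" cutting both ways: stale or ambiguous history falls below the threshold and the system asks rather than silently misaligning.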
Comprehensive Hinglish AI Agentic Runtime Interface
Production-grade two-node agentic AI companion running entirely on RTX 2050 (4 GB VRAM). Built solo. No research lab. No cloud budget.
| Component | Detail |
|---|---|
| Scale | 39+ Python modules · 8,000+ lines of code · 369+ automated tests |
| Model | Fine-tuned Qwen 2.5 4.2B on custom Hinglish dataset · 30–40 tok/s on 4 GB VRAM |
| Safety | 7-layer Constitutional AI-inspired pipeline (code-based, not prompt-based) |
| RAG | RAPTOR 3-level hierarchical RAG · 1.14 GB vector index · sub-second retrieval |
| Voice | Full-duplex STT + TTS · sub-800 ms conversation latency · sub-100 ms tool calls |
| Research | arXiv paper in preparation (cs.CL / cs.AI) |
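A code-based (rather than prompt-based) safety pipeline like the one above can be approximated as an ordered chain of independent checks, each able to block a message before it ever reaches the model. The layer names and patterns here are placeholders for illustration, not CHAARI 2.0's actual seven layers:

```python
from typing import Callable, Optional

# Each layer returns a rejection reason, or None to pass the message through.
SafetyLayer = Callable[[str], Optional[str]]

def blocklist_layer(text: str) -> Optional[str]:
    banned = {"rm -rf /", "format c:"}  # illustrative patterns only
    hit = any(b in text.lower() for b in banned)
    return "destructive command" if hit else None

def length_layer(text: str) -> Optional[str]:
    return "input too long" if len(text) > 4000 else None

def run_pipeline(text: str, layers: list[SafetyLayer]) -> tuple[bool, Optional[str]]:
    """Run every layer in order; the first rejection short-circuits."""
    for layer in layers:
        reason = layer(text)
        if reason is not None:
            return False, reason
    return True, None
```

Because each layer is plain code, the pipeline's behavior is deterministic and testable, unlike safety rules embedded in a system prompt.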
Anticipatory Inference for LLMs Using User Interaction Signals
The Zero-Lag Hypothesis: `Perceived Latency ≈ max(0, T_load − T_typing)`
Running 70B+ parameter models on 4 GB VRAM by using human behavioral signals to mask the physical memory wall.
| Module | Role | Status |
|---|---|---|
| Scout (SmolLM2-135M) | Real-time shard affinity prediction | ✅ 90% accuracy |
| River | Async double-buffered prefetcher | ✅ Zero GPU stall |
| Engine | Speculative decoding orchestrator (K=15) | ✅ ~16 tok/s on 70B |
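The Zero-Lag Hypothesis can be shown with a toy calculation: if shard prefetching starts the moment the user begins typing, any load that finishes before the user does is invisible. The timings below are hypothetical, chosen only to illustrate the formula:

```python
def perceived_latency(t_load_s: float, t_typing_s: float) -> float:
    """Perceived latency ≈ max(0, T_load − T_typing)."""
    return max(0.0, t_load_s - t_typing_s)

# A 3 s shard load hides entirely behind 5 s of typing;
# an 8 s load leaks only 3 s of visible waiting.
short_load = perceived_latency(3.0, 5.0)  # fully masked
long_load = perceived_latency(8.0, 5.0)   # partially masked
```

This is why the Scout's shard-affinity prediction matters: the earlier the right shard starts streaming from disk, the larger `T_typing` is relative to the remaining `T_load`.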
| Project | Stack | Highlights |
|---|---|---|
| Autonomous Financial Research Agent | LangGraph · MCP · FinBERT | Multi-step reasoning workflow with neurosymbolic guardrails |
| HinglishSearch RAG | Endee VectorDB · Docker · CHAARI 2.0 | Semantic search for Hinglish documents · sub-second retrieval |
| Industrial Predictive Maintenance | PyTorch · LSTM · Isolation Forest | >95% recall · deployed at LG Electronics |
Skills: Core Languages · AI / ML · LLMs & GenAI · RAG & Vector DBs
- 🏅 Train/Build Small Language Models — Google DeepMind (Advanced)
- 🏅 Enterprise AI Agents & Fundamentals — Google Cloud
- 🏅 Gemini Enterprise Applications — Google Cloud
- 🏅 Quantitative Research — JPMorgan Chase & Co. (Forage)
- 🏅 Data Analytics — Deloitte Australia (Forage)
"Building AI for the next billion users, not just the next billion dollars."
— Built in Rudrapur. Running everywhere.