Senior Machine Learning Engineer | LLM Systems & Production AI
London, United Kingdom
ibrahimdaud03@gmail.com • LinkedIn • GitHub • Medium • dev • Reddit
I architect and deploy end-to-end machine learning systems, from distributed training to low-latency inference. I specialize in LLM fine-tuning, inference optimization, and MLOps infrastructure that powers real-world applications serving millions of users.

| Metric | Value | Area |
|---|---|---|
| Parameters Trained | 7B+ | Multilingual LLM |
| Requests/Hour | 50k+ | Peak Production Traffic |
| Latency Reduction | 25% | Inference Optimization |
| Stores Deployed | 500+ | Retail AI Platform |
| H100 GPUs | 240 | Distributed Training |
| Data Processed | 5TB+ | Training Pipelines |
| Model Accuracy | 94% | Document Classification |
| Daily Users | 100k+ | Production APIs |

Model Architectures • Advanced Training • Inference Systems • Agentic AI
Production Deployment • Cloud Platforms • Model Operations • High Availability
Vector Intelligence • RAG Pipelines • Data Engineering • Databases
Core Languages • ML Frameworks • Development Tools
Vision Models • OCR & Document AI • Libraries

Aug 2024 - Present | Remote, London, UK
AI-Driven Retail Intelligence Platform
Impact: Deployed across 500+ stores, improving on-shelf availability by 20%.
Technical Implementation:
Stack: LLMs, Multi-agent systems, RAG, FastAPI, Kubernetes, AWS, Vector DBs

Oct 2023 - Aug 2024 | Bangalore, India
Multilingual LLM Training & Deployment
Impact: Trained a 7B-parameter model from scratch and reduced inference latency by 25%.
Technical Implementation:
Stack: PyTorch, DeepSpeed, TensorRT-LLM, vLLM, FastAPI, H100 GPUs, SentencePiece
Publication: Krutrim LLM: Multilingual Foundational Model

Sep 2022 - Oct 2023 | Gurgaon, India
Insurance Document Processing Automation
Impact: Improved F1 score from 0.78 to 0.94 and reduced processing time from hours to minutes.
Technical Implementation:
Stack: Transformers, LoRA, PEFT, DPO, RLHF, FastAPI, PostgreSQL

Feb 2021 - Sep 2022 | Gurgaon, India
OCR & Document Processing at Scale
Impact: 94% extraction accuracy, processing millions of records monthly.
Technical Implementation:
Stack: OpenCV, Tesseract, CNNs, Apache Airflow, PostgreSQL, FastAPI

| Category | Technologies |
|---|---|
| LLM Frameworks | PyTorch • TensorFlow • Hugging Face • LangChain • LlamaIndex |
| Model Architectures | GPT-4 • BERT • LLaMA • Mistral • MPT • Qwen |
| Training & Fine-tuning | LoRA • QLoRA • PEFT • DPO • PPO • RLHF • DeepSpeed • Megatron |
| Inference Optimization | TensorRT-LLM • vLLM • Hugging Face TGI • ONNX • Triton |
| AI Orchestration | Multi-agent Systems • MCP (Model Context Protocol) • RAG Pipelines |
| Vector Databases | Pinecone • Weaviate • Chroma • FAISS |
| MLOps & Cloud | Docker • Kubernetes • AWS (SageMaker, EC2, S3) • FastAPI • GitHub Actions |
| Data Engineering | Apache Spark • Apache Airflow • Kafka • PostgreSQL • MongoDB |
| Programming | Python • CUDA • SQL • JavaScript • C • Git |
| Computer Vision | OpenCV • OCR • Tesseract • CNNs • Object Detection |

MSc in Artificial Intelligence
BTech in Computer Science and Engineering

Krutrim LLM: Multilingual Foundational Model for Large-Scale Deployment
Virtual Try-On Clothing Using Deep Learning
Human Body Measurement Estimation from 2D Images

Agentic AI Systems: Building autonomous agents with MCP and tool use
LLM Inference at Scale: Sub-100ms latency for production workloads
MLOps Best Practices: End-to-end model lifecycle management
If you're working on something interesting, let's talk.

