I'm a Machine Learning Research Fellow at Johns Hopkins University, teaching computers to learn from many different modalities.
- Multi-Modal ML: Building systems that understand text, video, and beyond
- Model Efficiency: Compression, quantization, and distillation techniques for production-ready AI
- RAG Systems: Advancing Retrieval-Augmented Generation for text and video domains at JHU
- GPU Programming: Writing CUDA and Triton kernels for high-performance ML in my spare time (a Triton sketch follows this list)
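A minimal sketch of what I mean by Triton kernel work, using the canonical element-wise vector add (function names here are illustrative, not from a specific project):

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # one program per block of elements
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x and y must be CUDA tensors of the same shape
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```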
Languages & Frameworks
- Python | C++ | CUDA
- PyTorch | Triton
Specializations
- Model Compression (Quantization, Pruning, Knowledge Distillation; a distillation sketch follows this list)
- Distributed Training & Inference
- Edge Deployment Optimization
- Multi-Modal Architecture Design
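For the distillation piece, a minimal PyTorch sketch of the standard soft-target loss (temperature `T` and mixing weight `alpha` are illustrative defaults, not tuned values):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: KL between temperature-softened teacher and student
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # T^2 keeps soft-target gradients on the same scale
    # Hard targets: ordinary cross-entropy against the true labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```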
Current Focus
- Scalable Multi-Modal Architectures: Developing models that efficiently process diverse data types in distributed environments
- Cloud-to-Edge ML Pipeline: Streamlining the entire ML lifecycle from training to deployment across cloud and edge devices
- Hardware-Aware Optimization: Implementing compression techniques that exploit specific hardware capabilities for maximum inference efficiency (a quantization sketch follows)
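As a concrete example of squeezing a model for edge targets, a sketch of post-training dynamic quantization in PyTorch (the toy model is a placeholder, and the namespace assumes a recent PyTorch with `torch.ao.quantization`):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
model.eval()

# Dynamic quantization: weights stored as int8, activations quantized
# on the fly at inference time; no calibration data needed.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```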
Open to Collaborating On
- Model distillation and quantization projects
- Efficient training strategies for large-scale models (a mixed-precision sketch follows this list)
- Multi-modal ML applications
- Edge deployment optimization
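On the efficient-training side, a self-contained sketch of mixed-precision training with PyTorch AMP (the model, data, and hyperparameters are synthetic stand-ins):

```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Linear(128, 10).to(device)            # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    x = torch.randn(32, 128, device=device)     # synthetic batch
    y = torch.randint(0, 10, (32,), device=device)
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():              # forward pass in reduced precision where safe
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()                # scale loss to avoid fp16 underflow
    scaler.step(optimizer)                       # unscales grads, then steps
    scaler.update()
```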