- Fix vLLM top_p parameter handling in HuggingFace Ray deployment (#524)
- Pin peft dependency to <0.14.0 for compatibility (#524)
- Update TensorRT-LLM export to use the NeMo->HF->TensorRT-LLM export path
- Add chat template support for VLM deployment
- Bug fixes and folder renames, such as updating `nlp` to `llm`
- Bug fixes for HuggingFace model deployment (#459)
- Fixed HuggingFace deployable implementations for both Triton and Ray Serve backends
- Improved tokenizer handling in HuggingFace deployment scripts
- Minor fixes for Ray deployment (#464)
- Additional bug fixes in Ray deployment utilities
- Megatron-LM and Megatron-Bridge model deployment support with Triton Inference Server and Ray Serve
- Multi-node, multi-instance Ray Serve-based deployment for NeMo 2, Megatron-Bridge, and Megatron-LM models
- Update vLLM export to use the NeMo->HF->vLLM export path
- Multi-Modal deployment for NeMo 2 models with Triton Inference Server
- NeMo Retriever Text Reranking ONNX and TensorRT export support
- Pip installers for export and deploy
- Ray Serve support for multi-instance deployment
- TensorRT-LLM PyTorch backend
- Megatron Core (mcore) inference optimizations