Disaggregated Prefill & Decode serving optimizations #3963

@mk-nvidia

Disaggregated Prefill & Decode serving

  • [Done] MPI/UCX backend integration
  • [Ongoing] NIXL integration
  • Performance tuning
  • Best practice guide

Metadata

Assignees

No one assigned

    Labels

    Investigating; Performance (TRTLLM model inference speed, throughput, efficiency. Latency, benchmarks, regressions, opts.); roadmap; triaged (Issue has been triaged by maintainers)

    Projects

    Status

    April 2025 - June 2025

