Update on the development branch #1690
kaiyux
announced in
Announcements
Hi,
The TensorRT-LLM team is pleased to announce that we are pushing an update to the development branch (and the Triton backend) on May 28, 2024.
This update includes:
- Supported Jais, see examples/jais/README.md.
- Supported DiT, see examples/dit/README.md.
- Supported Video NeVA, see the Video NeVA section in examples/multimodal/README.md.
- Supported distil-whisper/distil-large-v3, thanks to the contribution from @IbrahimAmin1 in [feat]: Add Option to convert and run distil-whisper large-v3 #1337.
- Migrated Whisper to the unified workflow (trtllm-build command), see documents: examples/whisper/README.md.
- Renamed free_gpu_memory_fraction in ModelRunnerCpp to kv_cache_free_gpu_memory_fraction.
- Supported more parameters in ModelRunnerCpp, including max_tokens_in_paged_kv_cache, kv_cache_enable_block_reuse and enable_chunked_context.
- Removed enable_executor from the tensorrt_llm.LLM API as it is using the C++ Executor API now.
- Added OutputConfig in the generate API.
- Moved BuildConfig in the tensorrt_llm.LLM API to the LLM construction phase, and removed most of the trivial logs.
- Added SpeculativeDecodingMode.h to choose between different speculative decoding techniques.
- Added SpeculativeDecodingModule.h base class for speculative decoding techniques.
- Removed decodingMode.h.
- The base Docker image for TensorRT-LLM is updated to nvcr.io/nvidia/pytorch:24.04-py3.
- The base Docker image for the TensorRT-LLM backend is updated to nvcr.io/nvidia/tritonserver:24.04-py3.
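For users updating existing scripts, the rename above means keyword arguments written against the old ModelRunnerCpp name need to be adjusted. A minimal, illustrative sketch of a migration helper follows; the helper itself (migrate_runner_kwargs) is hypothetical and not part of TensorRT-LLM, only the parameter names come from this update:

```python
# Hypothetical helper (not TensorRT-LLM API): rewrite pre-update
# ModelRunnerCpp keyword arguments to the names used after this update.
# Only the key names below are taken from the release notes.

# Old name -> new name introduced by this update.
_RENAMED = {
    "free_gpu_memory_fraction": "kv_cache_free_gpu_memory_fraction",
}

# Parameters newly exposed on ModelRunnerCpp in this update
# (listed here purely for reference; passed through unchanged).
_NEW_PARAMS = {
    "max_tokens_in_paged_kv_cache",
    "kv_cache_enable_block_reuse",
    "enable_chunked_context",
}


def migrate_runner_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs with renamed keys mapped to their new names."""
    return {_RENAMED.get(key, key): value for key, value in kwargs.items()}


# Usage: an old-style config is rewritten before being passed on.
old_style = {"free_gpu_memory_fraction": 0.9, "kv_cache_enable_block_reuse": True}
new_style = migrate_runner_kwargs(old_style)
```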
Thanks,
The TensorRT-LLM Engineering Team