File tree
9 files changed: +235 / -17 lines
- all_models
- inflight_batcher_llm/tensorrt_llm/1
- tests
- ci/L0_backend_trtllm
- dockerfile
- tools

Lines changed: 4 additions & 1 deletion
- `@@ -67,8 +67,11 @@`: added new lines 70-72 and 74; removed original line 71
Lines changed: 193 additions & 1 deletion
- `@@ -311,6 +311,11 @@`: added new lines 314-318
- `@@ -422,6 +427,155 @@`: added new lines 430-578
- `@@ -453,22 +607,30 @@`: added new lines 610-611, 618-621, 628, and 633
- `@@ -564,7 +726,6 @@`: removed original line 567
- `@@ -578,6 +739,36 @@`: added new lines 742-771
- `@@ -587,4 +778,5 @@`: added new line 781
Lines changed: 5 additions & 0 deletions
- `@@ -572,3 +572,8 @@`: added new lines 575-579
Lines changed: 12 additions & 2 deletions
- `@@ -64,8 +64,18 @@`: removed original lines 67-68; added new lines 67-78
Lines changed: 13 additions & 0 deletions
- `@@ -38,6 +38,7 @@`: added new line 41
- `@@ -237,7 +238,10 @@`: added new lines 241-242 and 244
- `@@ -285,7 +289,10 @@`: added new lines 292-293 and 295
- `@@ -342,7 +349,10 @@`: added new lines 352-353 and 355
- `@@ -375,7 +385,10 @@`: added new lines 388-389 and 391
Lines changed: 5 additions & 10 deletions
- `@@ -57,24 +57,19 @@`: removed original lines 60-63, 69-73, and 78; added new lines 65-68 and 73
Lines changed: 1 addition & 1 deletion
- `@@ -1,5 +1,5 @@`: removed original line 2; added new line 2
Submodule tensorrt_llm updated 94 files
- benchmarks/cpp/gptManagerBenchmark.cpp (+44)
- benchmarks/python/build.py (+1)
- cpp/tensorrt_llm/batch_manager/aarch64-linux-gnu/libtensorrt_llm_batch_manager_static.a (+2 -2)
- cpp/tensorrt_llm/batch_manager/aarch64-linux-gnu/libtensorrt_llm_batch_manager_static.pre_cxx11.a (+2 -2)
- cpp/tensorrt_llm/batch_manager/aarch64-linux-gnu/version.txt (+3 -3)
- cpp/tensorrt_llm/batch_manager/x86_64-linux-gnu/libtensorrt_llm_batch_manager_static.a (+2 -2)
- cpp/tensorrt_llm/batch_manager/x86_64-linux-gnu/libtensorrt_llm_batch_manager_static.pre_cxx11.a (+2 -2)
- cpp/tensorrt_llm/batch_manager/x86_64-windows-msvc/tensorrt_llm_batch_manager_static.lib (+2 -2)
- cpp/tensorrt_llm/executor/aarch64-linux-gnu/libtensorrt_llm_executor_static.a (+2 -2)
- cpp/tensorrt_llm/executor/aarch64-linux-gnu/libtensorrt_llm_executor_static.pre_cxx11.a (+2 -2)
- cpp/tensorrt_llm/executor/aarch64-linux-gnu/version.txt (+3 -3)
- cpp/tensorrt_llm/executor/x86_64-linux-gnu/libtensorrt_llm_executor_static.a (+2 -2)
- cpp/tensorrt_llm/executor/x86_64-linux-gnu/libtensorrt_llm_executor_static.pre_cxx11.a (+2 -2)
- cpp/tensorrt_llm/executor/x86_64-windows-msvc/tensorrt_llm_executor_static.lib (+2 -2)
- cpp/tensorrt_llm/kernels/cutlass_kernels/moe_gemm/moe_gemm_kernels_template.h (+3 -2)
- cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/aarch64-linux-gnu/version.txt (+1 -1)
- cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-windows-msvc/tensorrt_llm_nvrtc_wrapper.dll (+1 -1)
- cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/instantiation/decoderMaskedMultiheadAttention104_bf16.cu
- cpp/tensorrt_llm/kernels/mixtureOfExperts/moe_kernels.cu (+53 -13)
- cpp/tensorrt_llm/pybind/executor/bindings.cpp (+4 -2)
- cpp/tensorrt_llm/runtime/medusaModule.cpp (+3)
- cpp/tests/kernels/mixtureOfExpertsTest.cu (+40 -10)
- cpp/tests/resources/scripts/build_medusa_engines.py (+1 -1)
- docker/Dockerfile.multi (+1 -1)
- docker/common/install_pytorch.sh (+2 -2)
- docker/common/install_tensorrt.sh (+4 -5)
- docs/source/reference/support-matrix.md (+2 -2)
- docs/source/release-notes.md (+4 -3)
- docs/source/speculative_decoding.md (+161 -1)
- examples/baichuan/requirements.txt (+1 -1)
- examples/bloom/requirements.txt (+1 -1)
- examples/chatglm/requirements.txt (+1 -1)
- examples/cogvlm/convert_checkpoint.py (+2 -2)
- examples/dbrx/requirements.txt (+1 -1)
- examples/falcon/requirements.txt (+1 -1)
- examples/gemma/convert_checkpoint.py (+2 -2)
- examples/gemma/requirements.txt (+1 -1)
- examples/gpt/requirements.txt (+1 -1)
- examples/gptj/requirements.txt (+1 -1)
- examples/gptneox/README.md (+3 -3)
- examples/gptneox/convert_checkpoint.py (+3 -4)
- examples/gptneox/requirements.txt (+1 -1)
- examples/grok/requirements.txt (+1 -1)
- examples/high-level-api/requirements.txt (+1 -1)
- examples/internlm/requirements.txt (+1 -1)
- examples/llama/README.md (+2 -2)
- examples/llama/requirements.txt (+1 -1)
- examples/mamba/README.md (+5 -13)
- examples/mamba/requirements.txt (+2 -1)
- examples/medusa/README.md (+30 -5)
- examples/medusa/convert_checkpoint.py (+15 -98)
- examples/medusa/requirements.txt (+1 -1)
- examples/mixtral/requirements.txt (+1 -1)
- examples/mmlu.py (+5 -3)
- examples/mpt/requirements.txt (+1 -1)
- examples/nemotron/requirements.txt (+1 -1)
- examples/opt/requirements.txt (+1 -1)
- examples/phi/README.md (+13 -17)
- examples/phi/convert_checkpoint.py (+7 -10)
- examples/phi/postprocess_quant_checkpoint.py (-63)
- examples/phi/requirements.txt (+1 -1)
- examples/quantization/quantize.py (+18 -1)
- examples/quantization/requirements.txt (+1 -1)
- examples/qwen/requirements.txt (+1 -1)
- examples/qwenvl/requirements.txt (+1 -1)
- examples/recurrentgemma/requirements.txt (+1 -1)
- examples/run.py (+1 -1)
- examples/skywork/requirements.txt (+1 -1)
- examples/smaug/requirements.txt (+1 -1)
- examples/whisper/requirements.txt (+1 -1)
- requirements.txt (+5 -4)
- tensorrt_llm/auto_parallel/parallelization.py (+2 -2)
- tensorrt_llm/auto_parallel/tensor_parallel/plugin_node.py (+5)
- tensorrt_llm/auto_parallel/tensor_parallel/plugin_nodes/gpt_attention_node.py (+3 -2)
- tensorrt_llm/commands/build.py (+1 -15)
- tensorrt_llm/models/__init__.py (+1 -4)
- tensorrt_llm/models/gemma/model.py (+1)
- tensorrt_llm/models/generation_mixin.py (+93 -53)
- tensorrt_llm/models/llama/convert.py (+39 -30)
- tensorrt_llm/models/mamba/model.py (+62 -35)
- tensorrt_llm/models/medusa/weight.py (+66 -29)
- tensorrt_llm/models/modeling_utils.py (+26 -7)
- tensorrt_llm/models/phi3/convert.py (+76 -5)
- tensorrt_llm/models/phi3/model.py (+107 -30)
- tensorrt_llm/models/phi3/phi3small/__init__.py (-14)
- tensorrt_llm/models/phi3/phi3small/model.py (-257)
- tensorrt_llm/models/phi3/split_weights.py (+2 -100)
- tensorrt_llm/models/recurrentgemma/model.py (+23 -44)
- tensorrt_llm/quantization/layers.py (+10 -2)
- tensorrt_llm/quantization/quantize.py (+13 -7)
- tensorrt_llm/quantization/quantize_by_modelopt.py (+101 -21)
- tensorrt_llm/version.py (+1 -1)
- tests/model/test_mamba.py (+6 -4)
- tests/test_llama_conversion.sh (+1 -1)
Lines changed: 1 addition & 1 deletion
- `@@ -1 +1 @@`: removed original line 1; added new line 1