TRT-LLM loading mechanism tool #3398

Open
wants to merge 9 commits into
base: main

Conversation

apbose
Collaborator

@apbose apbose commented Feb 14, 2025

TRT-LLM download utility

@apbose apbose self-assigned this Feb 14, 2025
@apbose apbose marked this pull request as draft February 14, 2025 17:58
@github-actions github-actions bot added component: conversion Issues re: Conversion stage component: api [Python] Issues re: Python API component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Feb 14, 2025
@github-actions github-actions bot requested a review from peri044 February 14, 2025 17:58
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch from 57dbb3f to 3e38e87 Compare February 25, 2025 14:34
f"Ensure the path is correct and the library is compatible",
exc_info=e_os_error,
else:
py_version = f"cp{sys.version_info.major}{sys.version_info.minor}"
Collaborator

Why do we restrict this to cp310 and cp312? It shouldn't matter if we are pulling the whl and unzipping it ourselves.

Collaborator Author

I added the check because https://pypi.nvidia.com/tensorrt-llm/ only lists wheels tagged cp310 and cp312.
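For readers following the thread, the check under discussion can be sketched roughly as below. This is only an illustration: `SUPPORTED_TAGS`, `python_tag`, and `has_published_wheel` are hypothetical names, not the PR's code, and the set of tags is simply the two mentioned in the comment above.

```python
import sys

# Assumption: the only tags visible on the index at the time of the comment.
SUPPORTED_TAGS = frozenset({"cp310", "cp312"})


def python_tag(version_info=sys.version_info) -> str:
    """Build a CPython wheel tag such as 'cp310' from a version tuple."""
    return f"cp{version_info[0]}{version_info[1]}"


def has_published_wheel(tag: str, supported=SUPPORTED_TAGS) -> bool:
    """Return True when a wheel with this tag is assumed to be published."""
    return tag in supported


if __name__ == "__main__":
    tag = python_tag()
    print(tag, "supported" if has_published_wheel(tag) else "not listed")
```

Whether such a gate is needed when the wheel is only downloaded and unzipped, rather than installed, is exactly the reviewer's question.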

@github-actions github-actions bot added the component: tests Issues re: Tests label Feb 27, 2025
@github-actions github-actions bot left a comment

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/conversion/test_nccl_ops.py	2025-02-27 20:03:00.014038+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/conversion/test_nccl_ops.py	2025-02-27 20:03:24.885031+00:00
@@ -22,11 +22,11 @@

from .harness import DispatchTestCase


class TestGatherNcclOpsConverter(DispatchTestCase):
-    @parameterized.expand([(8)])
+    @parameterized.expand([8])
    def test_nccl_ops(self, linear_layer_dim):
        class DistributedGatherModel(nn.Module):
            def __init__(self, input_dim):
                super().__init__()
                self.fc = torch.nn.Linear(input_dim, input_dim)
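
The suggested change does not alter which value is passed, because in Python parentheses without a trailing comma only group an expression. A small standalone check (not part of the PR; `describe` is a helper made up for this note):

```python
def describe(value) -> str:
    """Return the type name of a value, e.g. 'int' or 'tuple'."""
    return type(value).__name__


single = (8)      # parentheses only group; this is the int 8
as_tuple = (8,)   # the trailing comma is what creates a one-element tuple

print(describe(single), describe(as_tuple))
print([(8)] == [8])
```

So `[(8)]` and `[8]` are the same list, and the diff above only removes redundant parentheses.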

@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch from 9ba407b to 5f3fdac Compare February 27, 2025 20:05
@apbose apbose marked this pull request as ready for review February 27, 2025 20:05
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch from 5f3fdac to b66350e Compare April 15, 2025 19:57
@apbose apbose changed the title change in TRT-LLM loading mechanism and exposing aot_joint_export in _compiler.py TRT-LLM loading mechanism tool Apr 15, 2025
@github-actions github-actions bot removed the component: tests Issues re: Tests label Apr 17, 2025
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch 2 times, most recently from 6e893ed to 77f2145 Compare April 18, 2025 00:51
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch from f30acb7 to 9c238ae Compare April 29, 2025 21:33
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch 2 times, most recently from 89d621d to 27aa2f2 Compare May 2, 2025 02:36
Collaborator

@narendasan narendasan left a comment

Use a temp directory to save both the downloaded wheel and its unzipped contents.
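
One way to read this suggestion, sketched with the standard library only. The wheel name, the member path, and the helper names are placeholders invented for this note, and the download step is replaced by writing a tiny fake wheel locally so the example runs offline.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_from_wheel(wheel_path: Path, member: str, dest_dir: Path) -> Path:
    """Extract one member of a wheel (a zip archive) into dest_dir."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return Path(wheel.extract(member, path=dest_dir))


def demo() -> bytes:
    # Everything below lives in a scratch directory that is removed on exit.
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        wheel_path = tmp_path / "example-0.0.0-py3-none-any.whl"
        member = "example/libs/libexample_plugin.so"  # hypothetical path
        # Stand-in for the download step: write a tiny fake wheel locally.
        with zipfile.ZipFile(wheel_path, "w") as wheel:
            wheel.writestr(member, b"fake-binary")
        extracted = extract_from_wheel(wheel_path, member, tmp_path / "unzipped")
        return extracted.read_bytes()


if __name__ == "__main__":
    print(demo())
```

With this pattern, anything needed afterwards has to be loaded or copied out before the `with` block exits, since the directory and its contents are deleted at that point.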

@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch from 37d4e90 to e8bc3a4 Compare June 13, 2025 21:54
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch 2 times, most recently from a0cecf4 to 4193ca2 Compare July 1, 2025 09:17
@github-actions github-actions bot added the component: tests Issues re: Tests label Jul 1, 2025
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch 3 times, most recently from 4589c76 to 1e2148d Compare July 1, 2025 18:23
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch from 42d4862 to 9cb3cab Compare July 1, 2025 22:35
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch 2 times, most recently from dbfd7ee to 15d681a Compare July 4, 2025 00:40
Collaborator

@narendasan narendasan left a comment

This seems close. Are there any tests for the downloader, such as verifying that the correct file is downloaded and available?

@apbose
Collaborator Author

apbose commented Jul 7, 2025

The test tests/py/dynamo/distributed/test_nccl_ops.py should verify it. Run it with USE_TRTLLM_PLUGINS=1 pytest distributed/test_nccl_ops.py. It is not part of CI, though.
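
A rough sketch of how an opt-in test like this could be gated on the environment variable from the command above. This is an assumption about the pattern, not the actual test file: `plugins_requested` and the test class are hypothetical, and the real code may interpret the variable differently.

```python
import os
import unittest


def plugins_requested(environ=os.environ) -> bool:
    """True when the opt-in flag is set to '1' (assumed convention)."""
    return environ.get("USE_TRTLLM_PLUGINS", "0") == "1"


class TestDownloaderSmoke(unittest.TestCase):
    # Skipped unless the flag is set, which keeps the test out of default runs.
    @unittest.skipUnless(plugins_requested(), "set USE_TRTLLM_PLUGINS=1 to run")
    def test_flag_is_visible(self):
        self.assertTrue(plugins_requested())


if __name__ == "__main__":
    unittest.main()
```

A gate of this kind explains why the test can exist in the repository without running in CI: it is skipped wherever the flag is not set.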

apbose added 9 commits July 7, 2025 14:20
…ing in dynamo.compile

TRT-LLM installation utilities and adding test cases

adding the option in _compiler.py

changes in the TRT-LLM loading tool- removing install_wget, install_unzip, install_mpi

Further changes in error logging of the TRT-LLM installation tool

moving the load_tensorrt_llm to dynamo/utils.py

correcting misprint for TRT LLM load

Using python lib for download to make it platform agnostic

dll file path update for windows

correcting the non critical lint error

Including version in versions.txt
@apbose apbose force-pushed the nccl_ops_trt_llm_installation branch from c8b8337 to 340182b Compare July 7, 2025 21:20
Labels
cla signed component: api [Python] Issues re: Python API component: build system Issues re: Build system component: conversion Issues re: Conversion stage component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths component: tests Issues re: Tests

3 participants