[torch.compile] consider relevant code in compilation cache #11614
Conversation
Signed-off-by: youkaichao <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
this line is relevant because we use …
Very cool
def __init__(self, tensors):
    # manually define this function, so that
    # Dynamo knows `IntermediateTensors()` comes from this file.
    # Otherwise, dataclass will generate this function by evaluating
    # a string, and we will lose the information about the source file.
    self.tensors = tensors
Is this something that we'll need to do for every dataclass
that's used during model execution?
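For context, a minimal standalone sketch (not part of this PR) of the behavior the manual `__init__` works around: a dataclass-generated `__init__` is built by evaluating a string, so its code object does not point at the defining file, while a hand-written one does. The class names here are illustrative.

```python
import dataclasses

@dataclasses.dataclass
class Generated:
    tensors: dict

class Manual:
    def __init__(self, tensors):
        self.tensors = tensors

# The dataclass-generated __init__ is created by exec'ing a string,
# so its code object does not reference this source file.
print(Generated.__init__.__code__.co_filename)  # typically "<string>"

# The hand-written __init__ keeps the real source file, which is what
# lets Dynamo's file tracking attribute the class to vLLM's own code.
print(Manual.__init__.__code__.co_filename)     # path of this file when run as a script
```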
hash_key = hashlib.md5(
    f"{config_hash}_{code_hash}".encode()).hexdigest()[:10]
Why not take the whole hash?
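For illustration, here is what the truncation does; the `config_hash` and `code_hash` values below are placeholders, not the values vLLM actually computes:

```python
import hashlib

config_hash = "deadbeef"   # placeholder for the real config hash
code_hash = "cafebabe"     # placeholder for the real code hash

full = hashlib.md5(f"{config_hash}_{code_hash}".encode()).hexdigest()
print(full)        # full 32-character hex digest
print(full[:10])   # the 10-character prefix used as the cache key
```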
# record the top-level code object's file, plus every file that Dynamo
# inlines during tracing, so they can be considered for the compilation
# cache key later.
self.vllm_config.compilation_config.traced_files.add(
    self.original_code_object.co_filename)
inline_call = InliningInstructionTranslator.inline_call

def patched_inline_call(parent, func, args, kwargs):
    code = func.get_code()
    self.vllm_config.compilation_config.traced_files.add(
        code.co_filename)
    return inline_call(parent, func, args, kwargs)

with patch.object(InliningInstructionTranslator, 'inline_call',
                  patched_inline_call):
    output = self.compiled_callable(*args, **kwargs)
Could you add a comment explaining how and why we are adding the file names here?
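For readers following along, a rough sketch of how the collected file names could feed into the cache key; the helper name and exact hashing scheme here are illustrative, not necessarily what the PR implements:

```python
import hashlib

def compute_code_hash(traced_files: set[str]) -> str:
    # hash the contents of every traced file in a deterministic order,
    # so that editing any file Dynamo traced produces a different cache key
    hasher = hashlib.md5()
    for filename in sorted(traced_files):
        with open(filename, "rb") as f:
            hasher.update(filename.encode())
            hasher.update(f.read())
    return hasher.hexdigest()
```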
example output:
$ vllm serve meta-llama/Meta-Llama-3-8B -O3 ...
DEBUG 12-29 21:04:52 backends.py:495] Traced files (to be considered for compilation cache):
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/uv_envs/py310/lib/python3.10/site-packages/torch/nn/modules/container.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/attention/layer.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/distributed/communication_op.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/distributed/parallel_state.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/custom_op.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/activation.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/layernorm.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/linear.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/rotary_embedding.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/vocab_parallel_embedding.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/models/llama.py
...
Looks pretty good.