[torch.compile] consider relevant code in compilation cache #11614
Conversation
Signed-off-by: youkaichao <[email protected]>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
this line is relevant because we use …
Very cool
def __init__(self, tensors):
    # manually define this function, so that
    # Dynamo knows `IntermediateTensors()` comes from this file.
    # Otherwise, dataclass will generate this function by evaluating
    # a string, and we will lose the information about the source file.
    self.tensors = tensors
Is this something that we'll need to do for every dataclass
that's used during model execution?
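For context, a minimal standalone sketch (not part of this PR) of the behavior the manual `__init__` works around: a dataclass-generated `__init__` is built by evaluating a string, so its code object does not point at the defining file, while a hand-written one does. The class names here are illustrative.

```python
import dataclasses

@dataclasses.dataclass
class Generated:
    tensors: dict

class Manual:
    def __init__(self, tensors):
        self.tensors = tensors

# The dataclass-generated __init__ is created by exec'ing a string,
# so its code object does not reference this source file.
print(Generated.__init__.__code__.co_filename)  # typically "<string>"

# The hand-written __init__ keeps the real source file, which is what
# lets Dynamo's file tracking attribute the class to vLLM's own code.
print(Manual.__init__.__code__.co_filename)     # path of this file when run as a script
```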
hash_key = hashlib.md5(
    f"{config_hash}_{code_hash}".encode()).hexdigest()[:10]
Why not take the whole hash?
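For illustration, here is what the truncation does; the `config_hash` and `code_hash` values below are placeholders, not the values vLLM actually computes:

```python
import hashlib

config_hash = "deadbeef"   # placeholder for the real config hash
code_hash = "cafebabe"     # placeholder for the real code hash

full = hashlib.md5(f"{config_hash}_{code_hash}".encode()).hexdigest()
print(full)        # full 32-character hex digest
print(full[:10])   # the 10-character prefix used as the cache key
```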
# record the top-level code object's file, plus every file that Dynamo
# inlines during tracing, so they can be considered for the compilation
# cache key later.
self.vllm_config.compilation_config.traced_files.add(
    self.original_code_object.co_filename)
inline_call = InliningInstructionTranslator.inline_call

def patched_inline_call(parent, func, args, kwargs):
    code = func.get_code()
    self.vllm_config.compilation_config.traced_files.add(
        code.co_filename)
    return inline_call(parent, func, args, kwargs)

with patch.object(InliningInstructionTranslator, 'inline_call',
                  patched_inline_call):
    output = self.compiled_callable(*args, **kwargs)
Could you add a comment explaining how and why we are adding the file names here?
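For readers following along, a rough sketch of how the collected file names could feed into the cache key; the helper name and exact hashing scheme here are illustrative, not necessarily what the PR implements:

```python
import hashlib

def compute_code_hash(traced_files: set[str]) -> str:
    # hash the contents of every traced file in a deterministic order,
    # so that editing any file Dynamo traced produces a different cache key
    hasher = hashlib.md5()
    for filename in sorted(traced_files):
        with open(filename, "rb") as f:
            hasher.update(filename.encode())
            hasher.update(f.read())
    return hasher.hexdigest()
```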
example output:
$ vllm serve meta-llama/Meta-Llama-3-8B -O3 ...
DEBUG 12-29 21:04:52 backends.py:495] Traced files (to be considered for compilation cache):
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/uv_envs/py310/lib/python3.10/site-packages/torch/nn/modules/container.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/attention/layer.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/distributed/communication_op.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/distributed/parallel_state.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/custom_op.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/activation.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/layernorm.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/linear.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/rotary_embedding.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/layers/vocab_parallel_embedding.py
DEBUG 12-29 21:04:52 backends.py:495] /data/youkaichao/vllm/vllm/model_executor/models/llama.py
...
Looks pretty good.