Commit
[Model Runner][Performance] Cache the judgment result of is_encoder_decoder to decrease framework overhead (#138)

In Model Runner, is_encoder_decoder is extracted from model_config to determine whether vllm is running an encoder-decoder model. Obtaining this status requires a long call stack, so the CPU overhead is high. This PR therefore caches the status in __init__ of ModelInputForNPUBuilder.

Signed-off-by: hw_whx <[email protected]>
Co-authored-by: hw_whx <[email protected]>
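
The caching idea described in this commit message can be illustrated with a minimal sketch. It is not the actual diff: `FakeModelConfig`, `FakeRunner`, `CachingInputBuilder`, and `needs_encoder_inputs` are hypothetical stand-ins invented for illustration, and the `lookups` counter exists only to make the saved work visible. The real ModelInputForNPUBuilder and model_config are assumed to behave analogously, with a much deeper call stack behind the property.

```python
class FakeModelConfig:
    """Hypothetical stand-in for model_config; not the real vllm class."""

    def __init__(self, is_encoder_decoder: bool) -> None:
        self._is_encoder_decoder = is_encoder_decoder
        self.lookups = 0  # counts how often the property is evaluated

    @property
    def is_encoder_decoder(self) -> bool:
        # In the real code this lookup is assumed to traverse a long call stack.
        self.lookups += 1
        return self._is_encoder_decoder


class FakeRunner:
    """Hypothetical stand-in for the model runner that owns the config."""

    def __init__(self, model_config: FakeModelConfig) -> None:
        self.model_config = model_config


class CachingInputBuilder:
    """Hypothetical builder that caches the flag once in __init__."""

    def __init__(self, runner: FakeRunner) -> None:
        self.runner = runner
        # Evaluate the property a single time and keep the plain bool.
        self.is_encoder_decoder = runner.model_config.is_encoder_decoder

    def needs_encoder_inputs(self) -> bool:
        # Hot path: reads the cached attribute, not the config property.
        return self.is_encoder_decoder


config = FakeModelConfig(is_encoder_decoder=True)
builder = CachingInputBuilder(FakeRunner(config))
results = [builder.needs_encoder_inputs() for _ in range(1000)]
```

The trade-off in this sketch is that the cached value is fixed for the lifetime of the builder, which is only safe if the flag does not change after the builder is constructed.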