[BugFix] add int8 cache dtype when using attention quantization #113
Workflow: image.yml
Trigger: on: pull_request
Job: vllm-ascend image
Duration: 41m 57s
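For context, image.yml is the workflow that produced this run: it fires on pull_request and builds the vllm-ascend image. A minimal sketch of what such a workflow could look like is below; the job layout, action versions, Dockerfile context, and tag are illustrative assumptions, not taken from the repository.

```yaml
# Hypothetical sketch of image.yml (names, versions, and tags are assumptions).
name: vllm-ascend image

on:
  pull_request:

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # Build the image for the PR; push is disabled since this is a
      # pull_request-triggered validation build, not a release.
      - name: Build image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          tags: vllm-ascend:pr-test
```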
Artifacts
Produced during runtime
Name | Size
---|---
vllm-project~vllm-ascend~9D5GX2.dockerbuild | 175 KB
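The `vllm-project~vllm-ascend~9D5GX2.dockerbuild` artifact is most likely the Buildx build record that recent versions of docker/build-push-action export automatically after a build; such records can typically be imported into Docker Desktop's Builds view to inspect the build's steps and timings.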