[BugFix] Add int8 cache dtype when using Ascend attention quantization #125

Closed
wants to merge 40 commits

Conversation

Angazenn

Ascend attention quantization requires an int8 kv cache dtype. The cache dtype is resolved during `CacheConfig` initialization:

```python
if cache_config.cache_dtype == "auto":
    self.dtype = model_config.dtype
else:
    self.dtype = STR_DTYPE_TO_TORCH_DTYPE[cache_config.cache_dtype]
```

`STR_DTYPE_TO_TORCH_DTYPE` is defined in `vllm.utils`:

```python
STR_DTYPE_TO_TORCH_DTYPE = {
    "half": torch.half,
    "bfloat16": torch.bfloat16,
    "float": torch.float,
    "fp8": torch.uint8,
    "fp8_e4m3": torch.uint8,
    "fp8_e5m2": torch.uint8,
}
```

Hence we need to update both `cache_dtype` and `STR_DTYPE_TO_TORCH_DTYPE`.
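A minimal sketch of the intended change (assuming the new string key is spelled `int8`; the accompanying `cache_dtype` validation changes are not shown):

```python
import torch

# vllm/utils.py — extend the mapping so "int8" resolves to torch.int8
# and the CacheConfig lookup above succeeds for Ascend quantization.
STR_DTYPE_TO_TORCH_DTYPE = {
    "half": torch.half,
    "bfloat16": torch.bfloat16,
    "float": torch.float,
    "fp8": torch.uint8,
    "fp8_e4m3": torch.uint8,
    "fp8_e5m2": torch.uint8,
    "int8": torch.int8,  # new entry: int8 kv cache for Ascend attention
}
```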

hw_whx and others added 7 commits February 7, 2025 12:40
…(vllm-project#20)

This PR registers mindie_turbo while initializing NPUWorker. The
registration function is added in a new file named utils.py

---------

Signed-off-by: hw_whx <[email protected]>
Co-authored-by: hw_whx <[email protected]>
Signed-off-by: hw_whx <[email protected]>
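A minimal sketch of this kind of registration helper (the function name and the import-guard behaviour here are illustrative assumptions, not the actual contents of `utils.py`):

```python
# vllm_ascend/utils.py — illustrative sketch only; called during
# NPUWorker initialization.
import importlib
import logging

logger = logging.getLogger(__name__)


def try_register_mindie_turbo() -> None:
    """Import mindie_turbo if it is installed; otherwise continue without it."""
    try:
        importlib.import_module("mindie_turbo")
        logger.info("mindie_turbo registered.")
    except ImportError:
        logger.debug("mindie_turbo not found; continuing without it.")
```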
[Hardware][Ascend] Add silu_and_mul/rope; Add mix ops into attention layer
This PR adds an Ascend quantization interface to vllm-ascend, including
an AscendQuantConfig class that inherits from vLLM's QuantizationConfig
class, an AscendLinearMethod class that inherits from vLLM's
LinearMethodBase class, and an AscendQuantizer class that dispatches the
corresponding quantization methods.

---------

Signed-off-by: angazenn <[email protected]>
Co-authored-by: angazenn <[email protected]>
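A skeletal sketch of the class layout described above (bodies omitted; import paths follow vLLM's layout at the time and may differ between versions):

```python
from vllm.model_executor.layers.linear import LinearMethodBase
from vllm.model_executor.layers.quantization.base_config import QuantizationConfig


class AscendQuantizer:
    """Dispatches to the concrete quantization method named in the config."""


class AscendQuantConfig(QuantizationConfig):
    """Parses the Ascend quantization description from the checkpoint."""


class AscendLinearMethod(LinearMethodBase):
    """Applies the dispatched quantization method to linear layers."""
```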
@Angazenn Angazenn changed the base branch from v0.7.1-dev to main February 21, 2025 01:45
@Angazenn Angazenn changed the base branch from main to v0.7.1-dev February 21, 2025 01:45
angazenn and others added 21 commits February 21, 2025 09:46
Signed-off-by: angazenn <[email protected]>
Signed-off-by: angazenn <[email protected]>
Signed-off-by: angazenn <[email protected]>
Some PRs for plugin support have not been merged into vLLM yet. This PR
adds monkey patches to vllm-ascend to make vllm-ascend work with vLLM directly.

The patch code should be removed once the related functionality is
supported natively by vLLM.

cherry-pick to 0.7.1

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: angazenn <[email protected]>
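A generic sketch of the monkey-patch pattern involved (the replaced symbol is a placeholder, not one of the actual patches):

```python
# Illustrative only: rebind a vLLM attribute before vLLM uses it.
import vllm.utils


def _patched_helper(*args, **kwargs):
    """Replacement behaviour until the upstream vLLM PR is merged."""
    raise NotImplementedError("placeholder for the patched behaviour")


# Rebind on the defining module so later lookups resolve to the patch.
vllm.utils.some_helper = _patched_helper  # `some_helper` is a placeholder name
```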
Fix packages so the submodule can be found; see vllm-project#42

Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: angazenn <[email protected]>
### What this PR does / why we need it?
This PR updates vllm-ascend's dependency version of torch-npu, so
that vllm-ascend can be installed in newer environments
(e.g. torch-npu 2.6.0rc1).
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI test

Signed-off-by: ji-huazhong <[email protected]>
Signed-off-by: angazenn <[email protected]>
Fix the communicator patch for distributed inference.
We should patch `GroupCoordinator` via its defining module, and just before
initializing the distributed environment, so that the patch is not shadowed
by the import of `init_distributed_environment` in `worker.py`.

Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: angazenn <[email protected]>
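The ordering matters because `from x import y` copies the binding at import time; a sketch of the pattern (the patched subclass body is a placeholder):

```python
import vllm.distributed.parallel_state as parallel_state


class AscendGroupCoordinator(parallel_state.GroupCoordinator):
    """Placeholder for the Ascend-specific communicator behaviour."""


# Patch the class on its defining module, before
# init_distributed_environment() runs, so group-creation code resolves
# the patched class at call time. Rebinding a name that worker.py already
# copied via `from ... import ...` would have no effect.
parallel_state.GroupCoordinator = AscendGroupCoordinator
```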
(vllm-project#54)

### What this PR does / why we need it?
In open-r1, the rank 0 process will create an LLM instance and load the model to `npu:7`. We need to force the output tensor to be created on the same device as the query tensor.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Tested on the main branch

Signed-off-by: angazenn <[email protected]>
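A sketch of the device-following allocation pattern (illustrative; the real change is inside the Ascend attention backend):

```python
import torch


def alloc_attn_output(query: torch.Tensor) -> torch.Tensor:
    # Allocate on query.device instead of a hard-coded device, so the
    # output lands on npu:7 whenever the model was loaded there.
    return torch.empty_like(query)
```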
…ize to 128 in platform.py

Signed-off-by: hw_whx <[email protected]>
Signed-off-by: angazenn <[email protected]>
Signed-off-by: hw_whx <[email protected]>
Signed-off-by: angazenn <[email protected]>
### What this PR does / why we need it?
Backport main docs to make CI happy:
```
cp -r ../vllm-ascend-main/docs ./
cp ../vllm-ascend-main/README* ./
cp ../vllm-ascend-main/.readthedocs.yaml ./
```

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed
```
cp -r ../vllm-ascend-main/docs ./
cp ../vllm-ascend-main/README* ./
cp ../vllm-ascend-main/.readthedocs.yaml ./
```

no diff

Signed-off-by: Yikun Jiang <[email protected]>
Signed-off-by: angazenn <[email protected]>
### What this PR does / why we need it?

Backport vllm-project#64 to
v0.7.1-dev branch

Add container image build ci:
- Enable branch, tag docker image publish
    - branch image: `vllm-ascend:main`, `vllm-ascend:v0.7.1-dev`
    - tag image: `vllm-ascend:v0.7.1rc1`
- Enable PR docker image build check
- Other changes:
    - Prepare `REPO_OWNER` because ghcr requires lowercase names
- Add a `Free up disk space` step to avoid `No space left on device` errors like vllm-project#27
- Set up qemu with image to resolve docker/setup-qemu-action#198

### Does this PR introduce _any_ user-facing change?
NO

### How was this patch tested?
build: CI passed

---------

Signed-off-by: Yikun Jiang <[email protected]>
Signed-off-by: angazenn <[email protected]>
### What this PR does / why we need it?
Add vllm-ascend tutorials for v0.7.1.

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
no.

Signed-off-by: Shanshan Shen <[email protected]>
Signed-off-by: angazenn <[email protected]>
### What this PR does / why we need it?
Refactor installation doc

backport vllm-project#80

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI, preview

Signed-off-by: Yikun Jiang <[email protected]>
Signed-off-by: angazenn <[email protected]>
This PR adds attention quantization interfaces, including an
AscendQKVQuantAttentionMethod class that inherits from the
BaseKVCacheMethod class.

---------

Signed-off-by: angazenn <[email protected]>
Co-authored-by: angazenn <[email protected]>
Signed-off-by: angazenn <[email protected]>
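A skeletal sketch of the described class (the import path follows vLLM's layout at the time and may differ between versions):

```python
from vllm.model_executor.layers.quantization.kv_cache import BaseKVCacheMethod


class AscendQKVQuantAttentionMethod(BaseKVCacheMethod):
    """Quantizes attention inputs and stores the kv cache as int8 on Ascend."""
```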
### What this PR does / why we need it?
Update tutorials.
Backport vllm-project#79

### Does this PR introduce _any_ user-facing change?
no.

### How was this patch tested?
ci.

Signed-off-by: Shanshan Shen <[email protected]>
Signed-off-by: Yikun Jiang <[email protected]>
Co-authored-by: Shanshan Shen <[email protected]>
Signed-off-by: angazenn <[email protected]>
cherry-pick from vllm-project#59

Signed-off-by: wangxiyuan <[email protected]>
Co-authored-by: Yikun Jiang <[email protected]>
Signed-off-by: angazenn <[email protected]>
### What this PR does / why we need it?
- Set default model to Qwen2.5-0.5B-Instruct in example
- Remove Ultravox 0.3 because it has not been tested yet

Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: angazenn <[email protected]>
Add NPU implementation for FusedMoE

Signed-off-by: YHT <[email protected]>
Co-authored-by: YHT <[email protected]>
Signed-off-by: angazenn <[email protected]>
### What this PR does / why we need it?
To adapt vLLM's DeepSeek MLA structure to Ascend hardware, this PR adds
the AscendMLAAttentionBackendImpl class.

### Does this PR introduce _any_ user-facing change?
Users can set VLLM_MLA_DISABLE to 1 or 0 to disable or enable MLA.
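For example (a sketch; the variable must be set before the engine reads its environment):

```python
import os

# Disable MLA and fall back to the standard attention path;
# set to "0" (or leave unset) to keep MLA enabled.
os.environ["VLLM_MLA_DISABLE"] = "1"
```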

### How was this patch tested?

Signed-off-by: YHT <[email protected]>
Co-authored-by: YHT <[email protected]>
Signed-off-by: angazenn <[email protected]>
wangxiyuan and others added 12 commits February 21, 2025 09:46
cherry-pick from vllm-project#90

Add dynamic version in docs

Signed-off-by: Yikun Jiang <[email protected]>
Co-authored-by: Yikun Jiang <[email protected]>
Signed-off-by: angazenn <[email protected]>

1. Update CANN image name
2. Add pta install step
3. Update vllm-ascend docker image name to ghcr
4. Update quick_start to use the vllm-ascend image directly
5. Fix `note` style

cherry-pick from vllm-project#85

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: angazenn <[email protected]>
…problems. (vllm-project#95)

Fix an accuracy problem caused by a missing `contiguous()` call on the value tensor

Signed-off-by: hw_whx <[email protected]>
Co-authored-by: hw_whx <[email protected]>
Signed-off-by: angazenn <[email protected]>
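A sketch of the failure mode and the fix pattern (illustrative; the actual change is in the Ascend attention path):

```python
import torch

# A transposed value tensor is a view with non-contiguous strides; kernels
# that assume a contiguous layout then read the wrong elements.
value = torch.randn(2, 8, 64).transpose(0, 1)
assert not value.is_contiguous()

value = value.contiguous()  # materialize the memory layout the kernel expects
```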
Final update for the 0.7.1rc1 release:
1. Update the version in the docs
2. Update the Dockerfile

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: angazenn <[email protected]>
Update feature and model lists

Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: angazenn <[email protected]>
change docker registry to quay

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: angazenn <[email protected]>
update feature support plan

Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: angazenn <[email protected]>
Fix a bug caused by a parameter name that was not updated.

Signed-off-by: YHT <[email protected]>
Co-authored-by: YHT <[email protected]>
Signed-off-by: angazenn <[email protected]>
Don't log in to the docker registry on pull requests.

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: angazenn <[email protected]>
update model list

Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: angazenn <[email protected]>
Signed-off-by: angazenn <[email protected]>
@Angazenn Angazenn changed the base branch from v0.7.1-dev to main February 21, 2025 01:49
@Angazenn Angazenn closed this Feb 21, 2025