Commit 08dc23e

fix(transformers): fix mindspore import bug in examples/qwen2_vl (#1368)
* fix(transformers): fix mindspore import bug in examples/qwen2_vl
* fix(transformers): update notice of examples/transformers/qwen2_vl
1 parent b17f5c5 commit 08dc23e

File tree

1 file changed

+13
-6
lines changed

examples/transformers/qwen2_vl/README.md

Lines changed: 13 additions & 6 deletions
````diff
@@ -28,13 +28,12 @@ Pretrained weights from huggingface hub: [Qwen2-VL-7B-Instruct](https://huggingf
 `vqa_test.py` and `video_understanding.py` provide examples of image and video VQA. Here is a usage example of image understanding:

 ```python
+import mindspore
 from transformers import AutoProcessor
 from mindone.transformers import Qwen2VLForConditionalGeneration
 from mindone.transformers.models.qwen2_vl.qwen_vl_utils import process_vision_info
-from mindspore import Tensor
-import numpy as np

-model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen2/Qwen2-VL-7B-Instruct", mindspore_dtype=ms.float32)
+model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen2/Qwen2-VL-7B-Instruct", mindspore_dtype=mindspore.float32)
 processor = AutoProcessor.from_pretrained("Qwen2/Qwen2-VL-7B-Instruct")

 messages = [
@@ -63,9 +62,9 @@ inputs = processor(
 )

 # convert input to Tensor
 for key, value in inputs.items():
-    inputs[key] = ms.Tensor(value)
-    if inputs[key].dtype == ms.int64:
-        inputs[key] = inputs[key].to(ms.int32)
+    inputs[key] = mindspore.Tensor(value)
+    if inputs[key].dtype == mindspore.int64:
+        inputs[key] = inputs[key].to(mindspore.int32)
 generated_ids = model.generate(**inputs, max_new_tokens=128)
 output_text = processor.batch_decode(
     generated_ids,
@@ -75,6 +74,14 @@ output_text = processor.batch_decode(
 print(output_text)
 ```

+## **Notice**
+When running in fp32 on a 910B4 (32 GB) machine, the inference process may raise an OOM error, because the theoretical memory consumption (model weights + activations + memory fragments) can reach the maximum memory of the 910B4 machine.
+In this case, the following methods can be tried to reduce NPU memory:
+- Method 1. Set mindspore_dtype to mindspore.bfloat16 or mindspore.float16, e.g. model = Qwen2VLForConditionalGeneration.from_pretrained("Qwen2/Qwen2-VL-7B-Instruct", mindspore_dtype=mindspore.bfloat16). The theoretical memory consumption would be reduced to about 14 GB.
+- Method 2. Reduce the image size.
+- Method 3. Switch to a 910B1/910B2/910B3 machine.
+
 # Performance
 ## Inference
````
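As a quick sanity check on the 14 GB figure in the added notice: assuming Qwen2-VL-7B has roughly 7e9 parameters (an approximation, not stated in the commit), weights alone take about 4 bytes per parameter in float32 and 2 bytes in bfloat16/float16, which is why halving the dtype width halves the weight footprint. A minimal sketch of that arithmetic:

```python
# Back-of-envelope weight-memory estimate for the notice above.
# Illustrative only: activation memory and fragmentation on the NPU
# come on top of the weight footprint computed here.
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Return approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

N = 7e9  # approximate parameter count for a 7B model (assumption)

fp32_gb = weight_memory_gb(N, 4)  # float32: 4 bytes per parameter
bf16_gb = weight_memory_gb(N, 2)  # bfloat16/float16: 2 bytes per parameter

print(f"float32 weights: ~{fp32_gb:.0f} GB")   # ~28 GB, close to the 32 GB limit
print(f"bfloat16 weights: ~{bf16_gb:.0f} GB")  # ~14 GB, matching the notice
```

This shows why fp32 weights alone (~28 GB) leave almost no headroom on a 32 GB 910B4 once activations are added, while bfloat16 (~14 GB) does.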

0 commit comments

Comments
 (0)