[Bug]: vLLM got different results with PeftModelForCausalLM #1018
Comments
I solved this problem by setting dtype="float32" in the LLM initialization:

```python
from vllm import LLM, SamplingParams  # imports added for completeness

model = LLM(
    model=checkpoint,  # path to the SFT'd Qwen2.5-0.5B-Instruct checkpoint
    trust_remote_code=True,
    # gpu_memory_utilization=0.3,
    # gpu_memory_utilization=0.1,
    gpu_memory_utilization=0.15,
    max_model_len=1024,
    # dtype="bfloat16",
    dtype="float32",
    enforce_eager=True,
)

sampling_params = SamplingParams(
    temperature=0.001,
    repetition_penalty=1.1,
    top_p=0.8,
    top_k=20,
    max_tokens=512,
    stop_token_ids=[151644, 151645],  # Qwen2.5 <|im_start|> / <|im_end|>
)
```
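For reference, a minimal sketch of how this configuration can be exercised end to end; it reuses `checkpoint`, `model`, and `sampling_params` from the snippet above, and the example prompt is an invented placeholder rather than something from the issue.

```python
from transformers import AutoTokenizer

# `checkpoint`, `model`, and `sampling_params` are assumed to be the objects
# defined in the snippet above.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)

messages = [{"role": "user", "content": "Book a flight to Paris tomorrow."}]  # placeholder prompt
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = model.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

With temperature=0.001 the decoding is effectively greedy, which makes output comparisons between deployments easier to interpret.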
I noticed that the differences between …
No. I've tried setting the repetition_penalty to {1.0, 1.05, 1.1}; the results are always different until I set dtype="float32".
Finally, I chose repetition_penalty=1.1 from Qwen2.5-0.5B-Instruct/generation_config.json; this will override the default value, right?
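For context (an assumption about vLLM behaviour, not something confirmed in this thread): values passed explicitly to `SamplingParams` are applied to the request, so copying `repetition_penalty=1.1` out of generation_config.json by hand does take effect; whether vLLM reads generation_config.json automatically depends on the vLLM version.

```python
from vllm import SamplingParams

# Value copied by hand from Qwen2.5-0.5B-Instruct/generation_config.json;
# an explicitly passed argument is used for the request instead of vLLM's built-in default.
sampling_params = SamplingParams(temperature=0.001, repetition_penalty=1.1, max_tokens=512)
```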
This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.
May I ask you for some advice? As far as I know, vLLM uses FlashAttention, and FlashAttention only supports float16 and bfloat16, not float32. In fact, when I tried to set dtype="float32", the program did report an error: "RuntimeError: FlashAttention only supports fp16 and bf16 data type". How did you implement the float32 setup?
I'm not sure. I haven't done anything special.
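One possible explanation, stated as an assumption rather than something confirmed in this thread: when dtype="float32" is requested, vLLM's backend selector normally falls back from FlashAttention to another attention backend (such as xFormers), so the fp16/bf16 restriction no longer applies; the error above suggests FlashAttention was still being selected or forced. A sketch of pinning the backend explicitly via the `VLLM_ATTENTION_BACKEND` environment variable (the variable is real, but check the accepted values for your installed vLLM version):

```python
import os
from vllm import LLM

# Must be set before the LLM object is constructed; "XFORMERS" avoids the
# FlashAttention fp16/bf16 restriction, while "FLASH_ATTN" forces FlashAttention
# and will reject float32.
os.environ["VLLM_ATTENTION_BACKEND"] = "XFORMERS"

checkpoint = "path/to/sft-checkpoint"  # placeholder path
llm = LLM(model=checkpoint, dtype="float32", enforce_eager=True, max_model_len=1024)
```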
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Model Series
Qwen2.5
What are the models used?
Qwen2.5-0.5B-Instruct
What is the scenario where the problem happened?
inference with transformers, deployment with vLLM / PeftModelForCausalLM, SFT with LLaMA-Factory
Is this a known issue?
Information about environment
transformers 4.45.2
vllm 0.6.2
Log output
Description
Steps to reproduce
Got results
On our test set (slot extraction), the PeftModelForCausalLM deployment and the vLLM deployment produced different outputs for the same inputs.
Expected results
The results are expected to be the same.
Attempts to fix
I have tried several ways to fix this, including:
Anything else helpful for investigation
PeftModelForCausalLM deployment
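A minimal sketch of what this deployment path typically looks like, assuming a LoRA adapter produced by LLaMA-Factory; the base model ID is real, but the adapter path and prompt are placeholders:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-0.5B-Instruct"
adapter = "path/to/llama-factory-lora-adapter"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)  # yields a PeftModelForCausalLM
model.eval()

messages = [{"role": "user", "content": "Book a flight to Paris tomorrow."}]  # placeholder prompt
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    # Greedy decoding, roughly comparable to temperature=0.001 in vLLM.
    output_ids = model.generate(
        input_ids, max_new_tokens=512, do_sample=False, repetition_penalty=1.1
    )
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```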
vllm deployment
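And a sketch of one common way to prepare the same model for vLLM: merging the LoRA weights into the base model first and then pointing `LLM(model=...)` at the merged checkpoint (paths are placeholders; serving the adapter directly via vLLM's LoRA support is another option). Note that the dtype used when merging and serving can itself introduce small numerical differences, which is consistent with the float32 observation earlier in the thread.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-0.5B-Instruct"
adapter = "path/to/llama-factory-lora-adapter"  # placeholder
merged = "path/to/merged-checkpoint"            # placeholder; use as `checkpoint` above

model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(model, adapter)
model = model.merge_and_unload()  # fold the LoRA weights into the base model
model.save_pretrained(merged)
AutoTokenizer.from_pretrained(base).save_pretrained(merged)
```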