Open
Labels: bug (Something isn't working)
Description
⚙️ Your current environment
The output of `python collect_env.py`:
### Environment Information ###
Operating System: `Linux-5.4.250-4-velinux1u1-amd64-x86_64-with-glibc2.35`
Python Version: `3.11.13 (main, Jun 5 2025, 13:12:00) [GCC 11.2.0]`
llm-compressor Version: `0.7.1`
compressed-tensors Version: `0.11.0`
transformers Version: `4.55.2`
torch Version: `2.8.0`
CUDA Devices: `['NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20']`
AMD Devices: `None`
🐛 Describe the bug
When running `examples/quantizing_moe/deepseek_r1_example.py`, an OOM (out-of-memory) error occurs at step 235 (`model.layers.3.mlp.experts.226.down_proj`) after roughly 1 hour of runtime. With version v0.7.1, by contrast, memory usage is normal and reaching step 235 takes only about 25 minutes.
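To narrow down where memory grows between the two runs, one option is to record peak CUDA memory around each leaf module's forward pass and compare the logs. The following is a minimal sketch under that assumption: `attach_peak_memory_logger` is a hypothetical helper (not an llm-compressor API) built only on standard PyTorch forward hooks and `torch.cuda` memory statistics. It only observes forward calls, so memory allocated outside a module's forward (for instance, during the quantization step itself) is not captured.

```python
import torch
import torch.nn as nn


def attach_peak_memory_logger(model: nn.Module, log: dict) -> list:
    """Record peak CUDA memory (bytes) observed during each leaf module's forward.

    Hypothetical diagnostic helper, not part of llm-compressor. On a CPU-only
    machine every entry is 0, so the hook wiring can still be exercised.
    """
    handles = []
    for name, module in model.named_modules():
        if any(True for _ in module.children()):
            continue  # skip containers; keep leaves such as Linear layers

        def pre_hook(mod, inputs):
            # Start a fresh peak window right before this module runs.
            if torch.cuda.is_available():
                torch.cuda.reset_peak_memory_stats()

        def post_hook(mod, inputs, output, name=name):
            # Peak allocation on the current device since the matching pre-hook.
            peak = torch.cuda.max_memory_allocated() if torch.cuda.is_available() else 0
            # Keep the largest value across repeated calls (e.g. calibration batches).
            log[name] = max(log.get(name, 0), peak)

        handles.append(module.register_forward_pre_hook(pre_hook))
        handles.append(module.register_forward_hook(post_hook))
    return handles


if __name__ == "__main__":
    # Toy model standing in for a real decoder layer.
    model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
    log = {}
    handles = attach_peak_memory_logger(model, log)
    model(torch.randn(1, 4))
    for h in handles:
        h.remove()  # detach hooks once measurement is done
    print(log)
```

Resetting the peak counter in a pre-hook gives each module its own measurement window. Note that `torch.cuda.max_memory_allocated()` reports only the current device, so covering all eight H20 GPUs would require passing a device explicitly.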

🛠️ Steps to reproduce
No response