
[Bug]: v0.8.0 Causes OOM and Performance Degradation for Deepseek R1 #1936


Description

@linyu09-oss

โš™๏ธ Your current environment

The output of python collect_env.py
### Environment Information ###
Operating System: `Linux-5.4.250-4-velinux1u1-amd64-x86_64-with-glibc2.35`
Python Version: `3.11.13 (main, Jun  5 2025, 13:12:00) [GCC 11.2.0]`
llm-compressor Version: `0.7.1`
compressed-tensors Version: `0.11.0`
transformers Version: `4.55.2`
torch Version: `2.8.0`
CUDA Devices: `['NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20', 'NVIDIA H20']`
AMD Devices: `None`

๐Ÿ› Describe the bug

When running examples/quantizing_moe/deepseek_r1_example.py with v0.8.0, an OOM (out-of-memory) error occurs at step 235 (model.layers.3.mlp.experts.226.down_proj), after roughly 1 hour of runtime. With v0.7.1, memory usage stays normal and the same step 235 is reached in about 25 minutes.

(screenshot attached in the original issue)
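To help narrow down where the extra memory goes between the two versions, here is a minimal sketch of per-device CUDA memory logging that can be dropped around the quantization call in the example script. The helper name `log_cuda_memory` and the call sites are illustrative assumptions, not part of the example script or the llm-compressor API.

```python
import torch

def log_cuda_memory(tag: str) -> None:
    """Print allocated / reserved / peak CUDA memory per device, in GiB."""
    for i in range(torch.cuda.device_count()):
        alloc = torch.cuda.memory_allocated(i) / 2**30
        reserved = torch.cuda.memory_reserved(i) / 2**30
        peak = torch.cuda.max_memory_allocated(i) / 2**30
        print(f"[{tag}] cuda:{i} allocated={alloc:.1f} GiB "
              f"reserved={reserved:.1f} GiB peak={peak:.1f} GiB")

# Illustrative usage: call before and after the quantization step inside
# examples/quantizing_moe/deepseek_r1_example.py, then compare the logs
# from a v0.7.1 run against a v0.8.0 run at the same layer/step.
log_cuda_memory("before quantization")
# ... run the example script's quantization here ...
log_cuda_memory("after quantization")
```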

๐Ÿ› ๏ธ Steps to reproduce

No response


Labels: bug (Something isn't working)
