Update NVFP4 default observer (#493)

dsikka · web-flow · commit f52549956898 · 2025-10-15T13:36:22.000-04:00
* Update NVFP4 default observer

- Use the static minmax observer instead of the running minmax observer.
- Appears to give better results for Qwen3 VL MoE.
- Still needs to be rerun on Llama3 8B.
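
A minimal sketch of the difference between the two observers named above. It assumes "running minmax" means an exponential moving average of per-batch extremes and "static minmax" means the global extremes over all calibration batches. The helper names and the `averaging_constant` parameter are hypothetical, and this is not the library's implementation.

```python
from typing import Iterable, List, Optional, Tuple


def running_minmax(
    batches: Iterable[List[float]], averaging_constant: float = 0.01
) -> Tuple[float, float]:
    """Hypothetical helper: moving average of per-batch min/max."""
    running_min: Optional[float] = None
    running_max: Optional[float] = None
    for batch in batches:
        batch_min, batch_max = min(batch), max(batch)
        if running_min is None or running_max is None:
            # The first batch initializes the running statistics.
            running_min, running_max = batch_min, batch_max
        else:
            # Later batches pull the statistics toward their own extremes.
            running_min += averaging_constant * (batch_min - running_min)
            running_max += averaging_constant * (batch_max - running_max)
    if running_min is None or running_max is None:
        raise ValueError("no batches observed")
    return running_min, running_max


def static_minmax(batches: Iterable[List[float]]) -> Tuple[float, float]:
    """Hypothetical helper: global min/max over every observed batch."""
    global_min: Optional[float] = None
    global_max: Optional[float] = None
    for batch in batches:
        batch_min, batch_max = min(batch), max(batch)
        global_min = batch_min if global_min is None else min(global_min, batch_min)
        global_max = batch_max if global_max is None else max(global_max, batch_max)
    if global_min is None or global_max is None:
        raise ValueError("no batches observed")
    return global_min, global_max


batches = [[-1.0, 1.0], [-8.0, 2.0], [-0.5, 0.5]]
print(static_minmax(batches))                            # (-8.0, 2.0)
print(running_minmax(batches, averaging_constant=0.5))   # (-2.5, 1.0)
```

Under these assumptions the static variant keeps the outlier from the second batch in the range, while the running variant smooths it away. That may matter for a 4-bit format with small groups, though the commit itself only reports an empirical improvement.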

* Update quant_scheme.py
diff --git a/src/compressed_tensors/quantization/quant_scheme.py b/src/compressed_tensors/quantization/quant_scheme.py
@@ -172,6 +172,7 @@ def is_preset_scheme(name: str) -> bool:
         symmetric=True,
         dynamic=False,
         group_size=16,
+        observer="static_minmax",
     ),
     input_activations=QuantizationArgs(
         num_bits=4,
@@ -180,6 +181,7 @@ def is_preset_scheme(name: str) -> bool:
         symmetric=True,
         dynamic=DynamicType.LOCAL,
         group_size=16,
+        observer="static_minmax",
     ),
 )