Set warn_only for unsupported deterministic algorithms #1145

Open
Tcc0403 wants to merge 1 commit into main from tcc/deterministic-error
Conversation

Collaborator

@Tcc0403 Tcc0403 commented Mar 12, 2026

Summary

Before the fix (current main branch):

❯ python3 -m pytest test/convergence/bf16/test_mini_models.py -k qwen3_5_moe
========================================================= test session starts =========================================================
platform linux -- Python 3.13.1, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/tcc/Liger-Kernel
configfile: pyproject.toml
plugins: anyio-4.12.1, rerunfailures-16.1, cov-7.0.0, asyncio-1.3.0, xdist-3.8.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 32 items / 31 deselected / 1 selected

test/convergence/bf16/test_mini_models.py::test_mini_model[mini_qwen3_5_moe-32-1e-05-dtype27-0.05-0.2-0.1-0.1-0.01-0.01] FAILED [100%]
E       RuntimeError: _histc_cuda does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True)'. You can turn off determinism just for this operation, or you can use the 'warn_only=True' option, if that's acceptable for your application. You can also file an issue at https://github.com/pytorch/pytorch/issues to help us prioritize adding deterministic support for this operation.

.venv/lib/python3.13/site-packages/transformers/integrations/moe.py:369: RuntimeError

After the fix (this PR):

❯ python3 -m pytest test/convergence/bf16/test_mini_models.py -k qwen3_5_moe
========================================================= test session starts =========================================================
platform linux -- Python 3.13.1, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/tcc/Liger-Kernel
configfile: pyproject.toml
plugins: anyio-4.12.1, rerunfailures-16.1, cov-7.0.0, asyncio-1.3.0, xdist-3.8.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 32 items / 31 deselected / 1 selected

test/convergence/bf16/test_mini_models.py::test_mini_model[mini_qwen3_5_moe-32-1e-05-dtype27-0.05-0.2-0.1-0.1-0.01-0.01] PASSED [100%]

Root cause:
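torch.histc has no deterministic CUDA implementation, so it raises a RuntimeError when torch.use_deterministic_algorithms(True) is set. The call in Transformers that triggers it: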
https://github.com/huggingface/transformers/blob/adc2f16bf1824f7b57c790b4cf3bc48f95ecec69/src/transformers/integrations/moe.py#L373
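For readers unfamiliar with the flag, here is a minimal sketch of what warn_only changes. It is not the diff in this PR: the helper name deterministic_warn_only is hypothetical, and it only assumes the public torch.use_deterministic_algorithms API. With warn_only=True, an op that lacks a deterministic implementation (such as torch.histc on CUDA) emits a warning instead of raising a RuntimeError, while ops that do have deterministic implementations keep using them.

```python
import contextlib

import torch


@contextlib.contextmanager
def deterministic_warn_only():
    """Hypothetical helper: enable deterministic mode with warn_only=True,
    then restore whatever settings were active before."""
    prev_mode = torch.are_deterministic_algorithms_enabled()
    prev_warn_only = torch.is_deterministic_algorithms_warn_only_enabled()
    torch.use_deterministic_algorithms(True, warn_only=True)
    try:
        yield
    finally:
        torch.use_deterministic_algorithms(prev_mode, warn_only=prev_warn_only)


# Use CUDA when available; on CPU the same code runs without a warning.
device = "cuda" if torch.cuda.is_available() else "cpu"

with deterministic_warn_only():
    x = torch.rand(1024, device=device)
    # On CUDA this warns instead of raising, because warn_only=True.
    hist = torch.histc(x, bins=8, min=0.0, max=1.0)

# torch.rand samples from [0, 1), so every element lands in one of the bins.
total = int(hist.sum().item())
```

Restoring the previous settings in the finally block keeps the relaxed mode from leaking into other tests that may expect strict determinism.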

Testing Done

  • Hardware Type:
  • run make test to ensure correctness
  • run make checkstyle to ensure code style
  • run make test-convergence to ensure convergence

Signed-off-by: Tcc0403 <76503978+Tcc0403@users.noreply.github.com>
@Tcc0403 Tcc0403 changed the title Set warn_only to prevent error for unsupported algorithms Set warn_only for unsupported deterministic algorithms Mar 13, 2026
Collaborator

@Mecoli1219 Mecoli1219 left a comment

@Tcc0403 Thanks for catching it. Just out of curiosity, could you also provide the Torch and Transformers versions? I couldn't reproduce the error with torch==2.8.0, transformers==5.3.0.

Collaborator Author

Tcc0403 commented Mar 15, 2026

❯ python -m liger_kernel.env_report
Environment Report:
-------------------
Operating System: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
Python version: 3.13.1
Liger Kernel version: 0.6.4
PyTorch version: 2.7.1+cu126
CUDA version: 12.6
HIP(ROCm) version: Not available
Triton version: 3.3.1
Transformers version: 5.3.0
XPU version: XPU Not Available

Oh, I'm on torch==2.7.1. I'll check later versions.
