
[release/2.11][wsl] huggingface TrOCRForCausalLM and XGLMForCausalLM pass but raise RuntimeError: value cannot be converted to type float without overflow #2953

@bjarzemb

🐛 Describe the bug

PyTorch 2.10:

WARNING:common:fp64 golden ref were not generated for XGLMForCausalLM. Setting accuracy check to cosine
pass

PyTorch 2.11:

WARNING:common:fp64 golden ref were not generated for XGLMForCausalLM. Setting accuracy check to cosine
Traceback (most recent call last):
  File "/pytorch/benchmarks/dynamo/common.py", line 2238, in check_accuracy
    fp64_outputs = self.run_n_iterations(
                   ^^^^^^^^^^^^^^^^^^^^^^
  File "/pytorch/benchmarks/dynamo/common.py", line 2073, in run_n_iterations
    model_iter_fn(mod, inputs, collect_outputs=False)
  File "/pytorch/benchmarks/dynamo/huggingface.py", line 556, in forward_pass
    res = mod(**inputs)
          ^^^^^^^^^^^^^
  File "/conda_env_path/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env_path/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env_path/lib/python3.12/site-packages/transformers/models/xglm/modeling_xglm.py", line 668, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/conda_env_path/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env_path/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env_path/lib/python3.12/site-packages/transformers/models/xglm/modeling_xglm.py", line 512, in forward
    attention_mask = _prepare_4d_causal_attention_mask(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env_path/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py", line 353, in _prepare_4d_causal_attention_mask
    attention_mask = attn_mask_converter.to_causal_4d(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env_path/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py", line 95, in to_causal_4d
    causal_4d_mask = self._make_causal_mask(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/conda_env_path/lib/python3.12/site-packages/transformers/modeling_attn_mask_utils.py", line 164, in _make_causal_mask
    mask = torch.full((tgt_len, tgt_len), torch.finfo(dtype).min, device=device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: value cannot be converted to type float without overflow
pass
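
One possible reading of the traceback, not confirmed anywhere in this report: the failure occurs on the fp64 reference path (`fp64_outputs = self.run_n_iterations(...)`), so `dtype` in `_make_causal_mask` may be `torch.float64`. The call `torch.full(..., torch.finfo(dtype).min, device=device)` shown above passes no `dtype`, so the fill value may be converted to a 32-bit float, which cannot hold the float64 minimum. The sketch below uses only the Python standard library to illustrate that range mismatch. The helper `fits_in_float32` and the two constants are hypothetical names introduced for illustration, and nothing here calls PyTorch.

```python
import struct
import sys

# Assumption: these constants mirror torch.finfo(torch.float64).min and
# torch.finfo(torch.float32).min; both are the standard IEEE-754 limits.
FLOAT64_MIN = -sys.float_info.max      # about -1.8e308
FLOAT32_MIN = -3.4028234663852886e38   # about -3.4e38


def fits_in_float32(value: float) -> bool:
    """Return True if a finite Python float can be packed as IEEE-754 binary32."""
    try:
        # "<f" packs a little-endian 32-bit float and raises OverflowError
        # when the value is outside the binary32 range.
        struct.pack("<f", value)
    except OverflowError:
        return False
    return True


print(fits_in_float32(FLOAT32_MIN))  # True
print(fits_in_float32(FLOAT64_MIN))  # False
```

If this reading is right, the overflow comes from the mismatch between the fill value's range and the tensor's element type rather than from the XPU device itself. That remains an assumption until it is checked against the 2.10 and 2.11 behaviour of `torch.full`.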

Versions

PyTorch version: 2.11.0+xpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.1 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
Clang version: Could not collect
CMake version: version 4.2.1
Libc version: glibc-2.39

Python version: 3.12.12 | packaged by conda-forge | (main, Jan 26 2026, 23:51:32) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: True
XPU used to build PyTorch: 20250302
Intel GPU driver version:

libze1: 1.26.2-124.04ppa1
intel-opencl-icd: 26.01.36711.4-124.04ppa1
Intel GPU models onboard:
N/A
Intel GPU models detected:
[0] _XpuDeviceProperties(name='Intel(R) Graphics [0xe20b]', platform_name='Intel(R) oneAPI Unified Runtime over Level-Zero V2', type='gpu', device_id=0xE20B, uuid=86800be2-0000-0000-0400-000000000000, driver_version='1.14.36711+4', total_memory=11869MB, local_mem_size=128KB, max_compute_units=160, gpu_eu_count=160, gpu_subslice_count=20, max_work_group_size=1024, max_num_sub_groups=64, sub_group_sizes=[16 32], has_fp16=1, has_fp64=1, has_atomic64=1)
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) Ultra 7 265K
CPU family: 6
Model: 198
Thread(s) per core: 1
Core(s) per socket: 20
Socket(s): 1
Stepping: 2
BogoMIPS: 7756.81
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves avx_vnni vnmi umip waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize flush_l1d arch_capabilities
Virtualization: VT-x
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 960 KiB (20 instances)
L1i cache: 1.3 MiB (20 instances)
L2 cache: 60 MiB (20 instances)
L3 cache: 30 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-19
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; Enhanced IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] dpcpp-cpp-rt==2025.3.2
[pip3] impi-rt==2021.17.2
[pip3] intel-cmplr-lib-rt==2025.3.2
[pip3] intel-cmplr-lib-ur==2025.3.2
[pip3] intel-cmplr-lic-rt==2025.3.2
[pip3] intel-opencl-rt==2025.3.2
[pip3] intel-openmp==2025.3.2
[pip3] intel-pti==0.16.0
[pip3] intel-sycl-rt==2025.3.2
[pip3] mkl==2025.3.1
[pip3] numpy==2.4.2
[pip3] oneccl==2021.17.2
[pip3] oneccl-devel==2021.17.2
[pip3] onemkl-license==2025.3.1
[pip3] onemkl-sycl-blas==2025.3.1
[pip3] onemkl-sycl-dft==2025.3.1
[pip3] onemkl-sycl-lapack==2025.3.1
[pip3] onemkl-sycl-rng==2025.3.1
[pip3] onemkl-sycl-sparse==2025.3.1
[pip3] optree==0.18.0
[pip3] tbb==2022.3.1
[pip3] tcmlib==1.4.1
[pip3] torch==2.11.0+xpu
[pip3] torchaudio==2.11.0+xpu
[pip3] torchvision==0.26.0+xpu
[pip3] triton-xpu==3.7.0
[pip3] umf==1.0.3
[conda] dpcpp-cpp-rt 2025.3.2 pypi_0 pypi
[conda] impi-rt 2021.17.2 pypi_0 pypi
[conda] intel-cmplr-lib-rt 2025.3.2 pypi_0 pypi
[conda] intel-cmplr-lib-ur 2025.3.2 pypi_0 pypi
[conda] intel-cmplr-lic-rt 2025.3.2 pypi_0 pypi
[conda] intel-opencl-rt 2025.3.2 pypi_0 pypi
[conda] intel-openmp 2025.3.2 pypi_0 pypi
[conda] intel-pti 0.16.0 pypi_0 pypi
[conda] intel-sycl-rt 2025.3.2 pypi_0 pypi
[conda] mkl 2025.3.1 pypi_0 pypi
[conda] numpy 2.4.2 pypi_0 pypi
[conda] oneccl 2021.17.2 pypi_0 pypi
[conda] oneccl-devel 2021.17.2 pypi_0 pypi
[conda] onemkl-license 2025.3.1 pypi_0 pypi
[conda] onemkl-sycl-blas 2025.3.1 pypi_0 pypi
[conda] onemkl-sycl-dft 2025.3.1 pypi_0 pypi
[conda] onemkl-sycl-lapack 2025.3.1 pypi_0 pypi
[conda] onemkl-sycl-rng 2025.3.1 pypi_0 pypi
[conda] onemkl-sycl-sparse 2025.3.1 pypi_0 pypi
[conda] optree 0.18.0 pypi_0 pypi
[conda] tbb 2022.3.1 pypi_0 pypi
[conda] tcmlib 1.4.1 pypi_0 pypi
[conda] torch 2.11.0+xpu pypi_0 pypi
[conda] torchaudio 2.11.0+xpu pypi_0 pypi
[conda] torchvision 0.26.0+xpu pypi_0 pypi
[conda] triton-xpu 3.7.0 pypi_0 pypi
[conda] umf 1.0.3 pypi_0 pypi
