
Daisyden/artifacts4 #1672


Open: wants to merge 7 commits into main


daisyden (Contributor)

No description provided.

@daisyden (Contributor, Author)

Triage bot UT analysis result, for reference only (each unique error message is reported only once):

  1. third_party.torch-xpu-ops.test.xpu.test_foreach_xpu.TestForeachXPU.test_parity__foreach_div_fastpath_outplace_xpu_complex128 failed with error message:
Traceback (most recent call last):
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1135, in test_wrapper
    return test(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/unittest/mock.py", line 1833, in _inner
    return f(*args, **kw)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1962, in wrap_fn
    return fn(self, *args, **kwargs)
  File "/home/sdp/actions-runner-1/_work/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_foreach.py", line 241, in test_parity
    self.assertEqual(expected, actual)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4089, in assertEqual
    raise error_metas.pop()[0].to_error(  # type: ignore[index]
AssertionError: Tensor-likes are not close!

triage bot result:

{
  "similar_issue_id": "N/A",
  "similar_issue_state": "N/A",
  "issue_owner": "PenghuiCheng",
  "issue_description": "The unit test `test_parity__foreach_div_fastpath_outplace_xpu_complex128` is failing with an assertion error indicating that the expected and actual tensor values are not close. This suggests an issue with the foreach division operation for complex128 tensors on XPU.",
  "root_causes": [
    "Potential bug in the foreach implementation for complex division on XPU.",
    "Numerical precision issues specific to complex128 tensors on XPU.",
    "Incorrect handling of complex numbers in the XPU backend's division operation."
  ],
  "suggested_solutions": [
    "Review and verify the foreach implementation for complex division on XPU to ensure correctness.",
    "Check for any recent changes in the codebase that might have affected the XPU backend or foreach functions.",
    "Compare the results with CPU or GPU implementations to identify discrepancies in handling complex128 tensors."
  ]
}
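One commonly cited source of complex128 division mismatches between two code paths is the division formula itself. The pure-Python sketch below is an illustration only (it is not the XPU foreach kernel, whose formula is not shown in this thread): it contrasts the textbook formula with Smith's algorithm. For operands of large magnitude the textbook form overflows to `inf` and yields NaN, while Smith's scaling stays finite. If the foreach fast path and the per-tensor reference used different formulas, differences like this could trip `assertEqual`; that is an assumption to verify, not a diagnosis.

```python
def div_textbook(a: complex, b: complex) -> complex:
    """(x+yi)/(u+vi) = ((xu+yv) + (yu-xv)i) / (u^2+v^2); u^2+v^2 can overflow."""
    x, y, u, v = a.real, a.imag, b.real, b.imag
    denom = u * u + v * v
    return complex((x * u + y * v) / denom, (y * u - x * v) / denom)


def div_smith(a: complex, b: complex) -> complex:
    """Smith's algorithm: scale by the ratio of the smaller to the larger component
    of b, so u^2 + v^2 is never formed directly.

    A zero divisor raises ZeroDivisionError here; IEEE-style kernels return inf/NaN.
    """
    x, y, u, v = a.real, a.imag, b.real, b.imag
    if abs(u) >= abs(v):
        r = v / u
        d = u + v * r
        return complex((x + y * r) / d, (y - x * r) / d)
    r = u / v
    d = u * r + v
    return complex((x * r + y) / d, (y * r - x) / d)


if __name__ == "__main__":
    a = b = complex(1e300, 1e300)
    print(div_textbook(a, b))  # (nan+nanj): u*u overflows to inf, then inf/inf
    print(div_smith(a, b))     # (1+0j)
```

For well-scaled inputs such as `(3+4j)/(1-2j)` both functions return `-1+2j`; they diverge only near the overflow or underflow limits.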

@daisyden (Contributor, Author)

Triage bot UT analysis result, for reference only (each unique error message is reported only once):

  1. third_party.torch-xpu-ops.test.xpu.test_foreach_xpu.TestForeachXPU.test_parity__foreach_div_fastpath_outplace_xpu_complex128 failed with error message:
Traceback (most recent call last):
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1135, in test_wrapper
    return test(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/unittest/mock.py", line 1833, in _inner
    return f(*args, **kw)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 1962, in wrap_fn
    return fn(self, *args, **kwargs)
  File "/home/sdp/actions-runner-1/_work/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_foreach.py", line 241, in test_parity
    self.assertEqual(expected, actual)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4089, in assertEqual
    raise error_metas.pop()[0].to_error(  # type: ignore[index]
AssertionError: Tensor-likes are not close!

triage bot result:

{
  "similar_issue_id": "N/A",
  "similar_issue_state": "N/A",
  "issue_owner": "PenghuiCheng",
  "issue_description": "The unit test `test_parity__foreach_div_fastpath_outplace_xpu_complex128` is failing with an assertion error indicating that the expected and actual tensor values are not close. The failure occurs during the `test_parity` method in `test_foreach.py`, specifically when comparing the results of a foreach division operation on XPU using complex128 tensors.",
  "root_causes": [
    "Potential implementation issues in the foreach division kernel for complex128 tensors on XPU.",
    "Numerical precision differences between CPU and XPU that affect complex number operations.",
    "Incorrect handling of complex numbers in the foreach operations leading to discrepancies in results."
  ],
  "suggested_solutions": [
    "Review and verify the implementation of foreach division operations for complex128 tensors on XPU to ensure correctness.",
    "Investigate and address any numerical precision issues specific to XPU when handling complex numbers.",
    "Adjust the tolerance levels in the test if the discrepancy is due to expected precision limitations of XPU hardware."
  ]
}
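Whether a small discrepancy fails the test depends on the comparison tolerances. The sketch below applies the elementwise rule `|actual - expected| <= atol + rtol * |expected|` that `torch.isclose` documents, using `rtol = atol = 1e-7`, the defaults that `torch.testing.assert_close` lists for `complex128`. `TestCase.assertEqual` may apply per-test overrides, so treat the constants as assumptions; the helper names are hypothetical.

```python
# Assumed defaults for complex128, as listed in the torch.testing.assert_close docs;
# individual tests may override them.
RTOL_COMPLEX128 = 1e-7
ATOL_COMPLEX128 = 1e-7


def is_close(actual: complex, expected: complex,
             rtol: float = RTOL_COMPLEX128, atol: float = ATOL_COMPLEX128) -> bool:
    """Elementwise rule: |actual - expected| <= atol + rtol * |expected|."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


def mismatched_indices(actual: list[complex], expected: list[complex],
                       rtol: float = RTOL_COMPLEX128,
                       atol: float = ATOL_COMPLEX128) -> list[int]:
    """Indices of element pairs that fail the tolerance check."""
    return [i for i, (a, e) in enumerate(zip(actual, expected))
            if not is_close(a, e, rtol, atol)]


if __name__ == "__main__":
    expected = [1 + 1j, 2 - 3j]
    actual = [1 + 1j + 1e-9, 2 - 3j + 1e-3]
    print(mismatched_indices(actual, expected))             # [1]
    print(mismatched_indices(actual, expected, rtol=1e-3))  # []
```

Loosening `rtol` hides the mismatch, which is why relaxing tolerances is only appropriate if the discrepancy reflects an expected precision limit rather than a kernel bug.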
  1. third_party.torch-xpu-ops.test.xpu.test_ops_xpu.TestCommonXPU.test_errors_dot_xpu failed with error message:
Traceback (most recent call last):
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1135, in test_wrapper
    return test(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1215, in dep_fn
    return fn(slf, *args, **kwargs)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1451, in only_fn
    return fn(self, *args, **kwargs)
  File "/home/sdp/actions-runner-1/_work/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_ops.py", line 656, in test_errors
    with self.assertRaisesRegex(ei.error_type, ei.error_regex):
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/unittest/case.py", line 226, in __exit__
    self._raiseFailure("{} not raised".format(exc_name))
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/unittest/case.py", line 163, in _raiseFailure
    raise self.test_case.failureException(msg)
AssertionError: RuntimeError not raised

triage bot result:

{
  "similar_issue_id": "N/A",
  "similar_issue_state": "N/A",
  "issue_owner": "PenghuiCheng",
  "issue_description": "The unit test `test_errors_dot_xpu` in `TestCommonXPU` is failing because a `RuntimeError` is not being raised as expected. The test expects a specific error to occur, but it does not, leading to a test failure.",
  "root_causes": [
    "The code under test may no longer trigger the expected `RuntimeError` under the test's conditions.",
    "The error might be caught or handled elsewhere, preventing it from reaching the test.",
    "The test's conditions or inputs may no longer correctly trigger the error.",
    "There may be a recent code change affecting error handling or conditions."
  ],
  "suggested_solutions": [
    "Review the code under test to ensure the expected error is raised under the test's conditions.",
    "Check if the test's setup and inputs are still valid for triggering the error.",
    "Investigate if error handling has changed, preventing the exception from being raised.",
    "Update the test conditions if necessary to reflect current behavior."
  ]
}
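`test_errors` passes only if the op raises the expected exception inside `assertRaisesRegex`; when nothing is raised, the context manager itself fails with `RuntimeError not raised`, as in the traceback above. The sketch below reproduces that behaviour with plain `unittest`. `hypothetical_dot`, `hypothetical_dot_no_check`, and the error text are made up for illustration; they are not the real `dot` error inputs or messages.

```python
import unittest


def hypothetical_dot(x: list[float], y: list[float]) -> float:
    """Hypothetical kernel that validates its inputs before computing."""
    if len(x) != len(y):
        raise RuntimeError("inconsistent tensor size")  # made-up message
    return sum(a * b for a, b in zip(x, y))


def hypothetical_dot_no_check(x: list[float], y: list[float]) -> float:
    """Hypothetical kernel missing the check: zip() silently truncates."""
    return sum(a * b for a, b in zip(x, y))


class ErrorInputsTest(unittest.TestCase):
    def test_checked_kernel_raises(self) -> None:
        with self.assertRaisesRegex(RuntimeError, "inconsistent tensor size"):
            hypothetical_dot([1.0, 2.0], [1.0])

    def test_missing_check_reports_not_raised(self) -> None:
        # The inner context manager fails with AssertionError("RuntimeError not raised"),
        # the same message as in the CI traceback.
        with self.assertRaises(AssertionError) as ctx:
            with self.assertRaisesRegex(RuntimeError, "inconsistent tensor size"):
                hypothetical_dot_no_check([1.0, 2.0], [1.0])
        self.assertEqual(str(ctx.exception), "RuntimeError not raised")
```

Both test methods can be run with any standard `unittest` runner; the second one passes precisely because the missing check produces the failure seen in CI.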
  1. third_party.torch-xpu-ops.test.xpu.test_ops_xpu.TestCommonXPU.test_noncontiguous_samples_nn_functional_conv3d_xpu_int64 failed with error message:
Traceback (most recent call last):
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1135, in test_wrapper
    return test(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_cuda.py", line 257, in wrapped
    return f(*args, **kwargs)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_device_type.py", line 1451, in only_fn
    return fn(self, *args, **kwargs)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 2278, in wrapper
    fn(*args, **kwargs)
  File "/home/sdp/actions-runner-1/_work/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_ops.py", line 730, in test_noncontiguous_samples
    expected = op(t_inp, *t_args, **t_kwargs)
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/opinfo/core.py", line 1188, in __call__
    return self.op(*args, **kwargs)
RuntimeError: Long is not supported in oneDNN!

triage bot result:

{
  "similar_issue_id": 645,
  "similar_issue_state": "closed",
  "issue_owner": "PenghuiCheng",
  "issue_description": "Unit test third_party.torch-xpu-ops.test.xpu.test_ops_xpu.TestCommonXPU.test_noncontiguous_samples_nn_functional_conv3d_xpu_int64 failed with error: RuntimeError: Long is not supported in oneDNN!",
  "root_causes": [
    "The test failure is due to the unsupported Long (int64) type in oneDNN operations during XPU testing.",
    "Discrepancies in handling data types across different platforms and operations in PyTorch XPU."
  ],
  "suggested_solutions": [
    "Investigate and adjust data type handling in XPU operations to ensure compatibility with oneDNN's supported types.",
    "Review and update the test cases to handle Long type appropriately in oneDNN operations."
  ]
}
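The conv3d failure comes from `int64` inputs reaching oneDNN. One way to follow the suggestion about handling Long tensors appropriately is to choose a path by dtype before calling into oneDNN. In this sketch the supported-dtype set, the backend names, and both helpers are hypothetical. The fallback computes an exact integer cross-correlation (1-D, stride 1, no padding, for brevity), which is what PyTorch's convolution computes; casting `int64` to `float64` instead would be inexact for magnitudes above 2**53.

```python
# Hypothetical: which dtypes the oneDNN convolution path accepts is an assumption
# here; the error in the log only tells us int64 ("Long") is not among them.
ONEDNN_CONV_DTYPES = frozenset({"float32", "float16", "bfloat16"})


def select_conv_backend(dtype: str) -> str:
    """Route a dtype to a hypothetical backend instead of failing inside oneDNN."""
    return "onednn" if dtype in ONEDNN_CONV_DTYPES else "fallback"


def conv1d_int_fallback(x: list[int], w: list[int]) -> list[int]:
    """Exact integer cross-correlation: 1-D, stride 1, no padding (hypothetical fallback)."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


if __name__ == "__main__":
    print(select_conv_backend("int64"))                   # fallback
    print(conv1d_int_fallback([1, 2, 3, 4], [1, 0, -1]))  # [-2, -2]
    # Python ints are exact, whereas float(2**53 + 1) rounds to 2**53.
    print(conv1d_int_fallback([2**53 + 1], [1]) == [2**53 + 1])  # True
```

The alternative suggestion, updating the test's expected dtypes for XPU, avoids a new kernel path but leaves `int64` convolution unsupported on the device.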
  1. third_party.torch-xpu-ops.test.xpu.test_transformers_xpu.TestTransformersXPU.test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_xpu failed with error message:
Traceback (most recent call last):
  File "/home/sdp/actions-runner-1/_work/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/xpu/../../../../test/test_transformers.py", line 379, in test_multiheadattention_fastpath_attn_mask
    self.assertEqual(out, out_fp.nan_to_num())
  File "/home/sdp/miniforge3/envs/xpu_op_1/lib/python3.10/site-packages/torch/testing/_internal/common_utils.py", line 4089, in assertEqual
    raise error_metas.pop()[0].to_error(  # type: ignore[index]
AssertionError: Tensor-likes are not close!

triage bot result:

{
  "similar_issue_id": 1214,
  "similar_issue_state": "open",
  "issue_owner": "daisyden",
  "issue_description": "Unit test `test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_xpu` failed with `AssertionError: Tensor-likes are not close!`. The test compares the output tensor `out` with `out_fp.nan_to_num()`, indicating a numerical discrepancy. This issue may be related to floating-point precision differences, implementation discrepancies, or incorrect handling of NaNs/Infs on XPU.",
  "root_causes": [
    "Floating point precision differences between XPU and other devices.",
    "Implementation differences in multihead attention on XPU.",
    "Incorrect handling of NaNs or Infs in the XPU implementation."
  ],
  "suggested_solutions": [
    "Increase numerical tolerance in tensor comparisons if acceptable.",
    "Review and align XPU multihead attention implementation with CPU.",
    "Investigate and correct NaN/Inf handling in the XPU code."
  ]
}
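The test compares `out` against `out_fp.nan_to_num()`, which suggests the fast path can emit NaN, for example in rows where every key is masked. The pure-Python sketch below shows how a max-subtracted softmax turns a fully masked row into NaN (because `-inf - (-inf)` is NaN) and how replacing NaN with 0 makes it comparable. It illustrates the arithmetic only, not the XPU attention kernel; `masked_softmax` and `nan_to_num` are hypothetical helpers, and the latter handles only NaN, whereas `torch.nan_to_num` also replaces infinities.

```python
import math


def masked_softmax(scores: list[float], keep: list[bool]) -> list[float]:
    """Softmax with masked positions set to -inf, using max subtraction for stability."""
    logits = [s if k else -math.inf for s, k in zip(scores, keep)]
    m = max(logits)
    # If every position is masked, m is -inf and (-inf) - (-inf) is NaN.
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nan_to_num(values: list[float]) -> list[float]:
    """Replace NaN with 0.0 (NaN only; torch.nan_to_num also clamps infinities)."""
    return [0.0 if math.isnan(v) else v for v in values]


if __name__ == "__main__":
    print(masked_softmax([1.0, 2.0], [True, False]))                # [1.0, 0.0]
    print(nan_to_num(masked_softmax([0.5, 1.5], [False, False])))  # [0.0, 0.0]
```

If the XPU fast path produced NaN in a row where the reference path produced finite non-zero values, `nan_to_num` would not reconcile them, which is one way a NaN/Inf handling difference could surface as this mismatch.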

@daisyden (Contributor, Author)

Triage bot UT analysis result, for reference only (each unique error message is reported only once):

  1. third_party.torch-xpu-ops.test.xpu.test_foreach_xpu.TestForeachXPU.test_parity__foreach_div_fastpath_outplace_xpu_complex128 failed with error message:
AssertionError: Tensor-likes are not close!

triage bot result:

{
  "similar_issue_id": 1214,
  "similar_issue_state": "open",
  "issue_owner": "daisyden",
  "issue_description": "In the preci test, some cases fail randomly with 'AssertionError: Tensor-likes are not close!'. The failing test cases include: test_python_ref__refs_exp_xpu_complex128, test_python_ref__refs_sigmoid_xpu_complex128, test_python_ref_executor__refs_log2_executor_aten_xpu_complex128, test_python_ref_executor__refs_exp_executor_aten_xpu_complex128, test_python_ref_torch_fallback__refs_log2_xpu_complex128, test_python_ref_torch_fallback__refs_log10_xpu_complex128, test_python_ref_torch_fallback__refs_sigmoid_xpu_complex128. A workaround PR is provided: https://github.com/intel/torch-xpu-ops/pull/1211. Additional random failures to be added to the skiplist: TestCommonXPU.test_python_ref_executor__refs_sigmoid_executor_aten_xpu_complex128, TestCommonXPU.test_compare_cpu_nn_functional_local_response_norm_xpu_bfloat16, test_ops_xpu.py::TestCommonXPU::test_python_ref__refs_log10_xpu_complex128.\ntest_foreach_xpu.py::TestForeachXPU::test_parity__foreach_div_fastpath_outplace_xpu_complex128 failed with the release/2.7 RC2 pre-release wheel.",
  "root_causes": "The issue involves tensor comparison failures, potentially due to precision issues or implementation differences on XPU devices.",
  "suggested_solutions": "Implement the workaround provided in PR #1211 to address the tensor comparison discrepancies. Adjust computations on XPU to align results with expected values."
}
  1. third_party.torch-xpu-ops.test.xpu.test_ops_xpu.TestCommonXPU.test_noncontiguous_samples_nn_functional_conv3d_xpu_int64 failed with error message:
RuntimeError: Long is not supported in oneDNN!

triage bot result:

{
  "similar_issue_id": "N/A",
  "similar_issue_state": "N/A",
  "issue_owner": "daisyden",
  "issue_description": "Unit test third_party.torch-xpu-ops.test.xpu.test_ops_xpu.TestCommonXPU.test_noncontiguous_samples_nn_functional_conv3d_xpu_int64 failed with RuntimeError: Long is not supported in oneDNN!",
  "root_causes": [
    "Mismatch in data type handling where Long tensors are passed to oneDNN, which doesn't support them.",
    "Potential missing conversion or handling step for Long tensors in XPU operations."
  ],
  "suggested_solutions": [
    "Ensure that data types are correctly converted before processing in oneDNN, possibly converting Long tensors to a supported type.",
    "Enhance the XPU backend to handle Long tensors appropriately in NN functional operations."
  ]
}
  1. third_party.torch-xpu-ops.test.xpu.test_transformers_xpu.TestTransformersXPU.test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_xpu failed with error message:
AssertionError: Tensor-likes are not close!

triage bot result:

{
  "similar_issue_id": 1214,
  "similar_issue_state": "open",
  "issue_owner": "daisyden",
  "issue_description": "In the preci test, some cases fail randomly with 'AssertionError: Tensor-likes are not close!'. The failing test cases include: test_python_ref__refs_exp_xpu_complex128, test_python_ref__refs_sigmoid_xpu_complex128, test_python_ref_executor__refs_log2_executor_aten_xpu_complex128, test_python_ref_executor__refs_exp_executor_aten_xpu_complex128, test_python_ref_torch_fallback__refs_log2_xpu_complex128, test_python_ref_torch_fallback__refs_log10_xpu_complex128, test_python_ref_torch_fallback__refs_sigmoid_xpu_complex128. A workaround PR is provided: https://github.com/intel/torch-xpu-ops/pull/1211. Additional random failures to be added to the skiplist: TestCommonXPU.test_python_ref_executor__refs_sigmoid_executor_aten_xpu_complex128, TestCommonXPU.test_compare_cpu_nn_functional_local_response_norm_xpu_bfloat16, test_ops_xpu.py::TestCommonXPU::test_python_ref__refs_log10_xpu_complex128.\ntest_foreach_xpu.py::TestForeachXPU::test_parity__foreach_div_fastpath_outplace_xpu_complex128 failed with the release/2.7 RC2 pre-release wheel.",
  "root_causes": [
    "Numerical precision issues in XPU operations leading to tensor value mismatches.",
    "Possible implementation discrepancies between CPU and XPU for attention mask computations.",
    "Random test failures attributed to hardware-specific behavior or numerical instability."
  ],
  "suggested_solutions": [
    "Investigate the specific computation of attention masks on XPU and ensure consistency with CPU behavior.",
    "Review and adjust numerical precision settings or tolerance levels in the tests.",
    "Implement a workaround as suggested in the provided PR, possibly adjusting test expectations or skipping known problematic cases.",
    "Add the failing test to the skip list if it's determined to be a false positive or beyond the scope of immediate fixes."
  ]
}
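Several results above suggest skip-listing a test if it proves to be a false positive. The sketch below shows one hypothetical skip-list layout keyed by test file, with an exact-name lookup. The dictionary name, its structure, and the helper are assumptions for illustration, not the actual torch-xpu-ops skip-list format, and the single entry is an example rather than a recommendation.

```python
# Hypothetical skip-list layout: test file name -> tuple of exact test names.
SKIP_DICT: dict[str, tuple[str, ...]] = {
    "test_transformers_xpu.py": (
        "test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_xpu",
    ),
}


def should_skip(test_file: str, test_name: str) -> bool:
    """Return True only for an exact match, so similarly named tests still run."""
    return test_name in SKIP_DICT.get(test_file, ())
```

Exact matching keeps sibling parametrizations (for example other `attn_mask_dim` variants) running; substring matching would be shorter to write but could silently skip more than intended.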

Labels: None yet
Projects: None yet
2 participants