Daisyden/artifacts4 #1672
Triage bot UT analysis result, for reference only. Note that each unique error message is reported only once:
triage bot result: {
"similar_issue_id": "N/A",
"similar_issue_state": "N/A",
"issue_owner": "PenghuiCheng",
"issue_description": "The unit test `test_parity__foreach_div_fastpath_outplace_xpu_complex128` is failing with an assertion error indicating that the expected and actual tensor values are not close. This suggests an issue with the foreach division operation for complex128 tensors on XPU.",
"root_causes": [
"Potential bug in the foreach implementation for complex division on XPU.",
"Numerical precision issues specific to complex128 tensors on XPU.",
"Incorrect handling of complex numbers in the XPU backend's division operation."
],
"suggested_solutions": [
"Review and verify the foreach implementation for complex division on XPU to ensure correctness.",
"Check for any recent changes in the codebase that might have affected the XPU backend or foreach functions.",
"Compare the results with CPU or GPU implementations to identify discrepancies in handling complex128 tensors."
]
}
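The "not close" assertion described in this result comes down to an elementwise tolerance check against a reference. As a minimal sketch in plain Python (it stands in for the real tensor comparison; the name `find_mismatches`, the sample values, and the tolerances are illustrative assumptions, not taken from the test suite), comparing a device result with a CPU reference could look like this:

```python
def find_mismatches(actual, expected, rtol=1e-7, atol=1e-7):
    """Return (index, actual, expected, abs_diff) for elements that are not close.

    Uses the common rule |actual - expected| <= atol + rtol * |expected|,
    applied to complex values through their magnitude.
    """
    mismatches = []
    for i, (a, e) in enumerate(zip(actual, expected)):
        diff = abs(a - e)
        if diff > atol + rtol * abs(e):
            mismatches.append((i, a, e, diff))
    return mismatches


# Hypothetical data: numerators and divisors for an elementwise complex division.
numerators = [complex(1, 2), complex(3, -4), complex(-5, 6)]
divisors = [complex(2, 1), complex(1, 1), complex(0.5, -2)]

# Reference result, playing the role of the CPU output.
reference = [n / d for n, d in zip(numerators, divisors)]

# Simulated device output with one perturbed element, playing the role of XPU.
device = list(reference)
device[1] = device[1] + complex(1e-3, 0)

for index, got, want, diff in find_mismatches(device, reference):
    print(f"element {index}: got {got}, expected {want}, |diff| = {diff:.3e}")
```

Reporting the index and the absolute difference, rather than only pass or fail, helps tell a rounding-level discrepancy apart from a wrong result.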
Triage bot UT analysis result, for reference only. Note that each unique error message is reported only once:
triage bot result: {
"similar_issue_id": "N/A",
"similar_issue_state": "N/A",
"issue_owner": "PenghuiCheng",
"issue_description": "The unit test `test_parity__foreach_div_fastpath_outplace_xpu_complex128` is failing with an assertion error indicating that the expected and actual tensor values are not close. The failure occurs during the `test_parity` method in `test_foreach.py`, specifically when comparing the results of a foreach division operation on XPU using complex128 tensors.",
"root_causes": [
"Potential implementation issues in the foreach division kernel for complex128 tensors on XPU.",
"Numerical precision differences between CPU and XPU that affect complex number operations.",
"Incorrect handling of complex numbers in the foreach operations leading to discrepancies in results."
],
"suggested_solutions": [
"Review and verify the implementation of foreach division operations for complex128 tensors on XPU to ensure correctness.",
"Investigate and address any numerical precision issues specific to XPU when handling complex numbers.",
"Adjust the tolerance levels in the test if the discrepancy is due to expected precision limitations of XPU hardware."
]
}
triage bot result: {
"similar_issue_id": "N/A",
"similar_issue_state": "N/A",
"issue_owner": "PenghuiCheng",
"issue_description": "The unit test `test_errors_dot_xpu` in `TestCommonXPU` is failing because a `RuntimeError` is not being raised as expected. The test expects a specific error to occur, but it does not, leading to a test failure.",
"root_causes": [
"The code under test may no longer trigger the expected `RuntimeError` under the test's conditions.",
"The error might be caught or handled elsewhere, preventing it from reaching the test.",
"The test's conditions or inputs may no longer correctly trigger the error.",
"There may be a recent code change affecting error handling or conditions."
],
"suggested_solutions": [
"Review the code under test to ensure the expected error is raised under the test's conditions.",
"Check if the test's setup and inputs are still valid for triggering the error.",
"Investigate if error handling has changed, preventing the exception from being raised.",
"Update the test conditions if necessary to reflect current behavior."
]
}
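The failure mode in this result (an expected `RuntimeError` that never fires) can be reproduced in isolation. The sketch below uses only `unittest`; `strict_dot` and `lenient_dot` are hypothetical stand-ins for a backend that validates its inputs and one that does not, and say nothing about what the XPU `dot` implementation actually does:

```python
import unittest


def strict_dot(a, b):
    """Hypothetical dot product that rejects mismatched lengths."""
    if len(a) != len(b):
        raise RuntimeError("inconsistent tensor size")
    return sum(x * y for x, y in zip(a, b))


def lenient_dot(a, b):
    """Hypothetical variant that silently truncates instead of raising."""
    return sum(x * y for x, y in zip(a, b))


class ErrorInputCheck(unittest.TestCase):
    def check(self, fn):
        # Mirrors the shape of an error-input test: the call must raise.
        with self.assertRaises(RuntimeError):
            fn([1, 2, 3], [4, 5])

    def runTest(self):
        self.check(strict_dot)


case = ErrorInputCheck()
case.check(strict_dot)  # passes: RuntimeError is raised

try:
    case.check(lenient_dot)
except AssertionError as exc:
    print(f"test would fail: {exc}")
```

Running the same check against both variants separates a missing validation in the code under test from a test input that no longer triggers the error.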
triage bot result: {
"similar_issue_id": 645,
"similar_issue_state": "closed",
"issue_owner": "PenghuiCheng",
"issue_description": "Unit test third_party.torch-xpu-ops.test.xpu.test_ops_xpu.TestCommonXPU.test_noncontiguous_samples_nn_functional_conv3d_xpu_int64 failed with error: RuntimeError: Long is not supported in oneDNN!",
"root_causes": [
"The test failure is due to the unsupported Long (int64) type in oneDNN operations during XPU testing.",
"Discrepancies in handling data types across different platforms and operations in PyTorch XPU."
],
"suggested_solutions": [
"Investigate and adjust data type handling in XPU operations to ensure compatibility with oneDNN's supported types.",
"Review and update the test cases to handle Long type appropriately in oneDNN operations."
]
}
triage bot result: {
"similar_issue_id": 1214,
"similar_issue_state": "open",
"issue_owner": "daisyden",
"issue_description": "Unit test `test_multiheadattention_fastpath_attn_mask_attn_mask_dim_2_key_padding_mask_dim_2_bool_xpu` failed with `AssertionError: Tensor-likes are not close!`. The test compares the output tensor `out` with `out_fp.nan_to_num()`, indicating a numerical discrepancy. This issue may be related to floating-point precision differences, implementation discrepancies, or incorrect handling of NaNs/Infs on XPU.",
"root_causes": [
"Floating point precision differences between XPU and other devices.",
"Implementation differences in multihead attention on XPU.",
"Incorrect handling of NaNs or Infs in the XPU implementation."
],
"suggested_solutions": [
"Increase numerical tolerance in tensor comparisons if acceptable.",
"Review and align XPU multihead attention implementation with CPU.",
"Investigate and correct NaN/Inf handling in the XPU code."
]
}
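One way NaNs can enter an attention comparison is a softmax over a row in which every position is masked out. The plain-Python sketch below shows that effect and a NaN replacement step; it is an illustration under that assumption, not a reconstruction of the failing test, and this `nan_to_num` helper covers only the NaN part of what the tensor method of the same name does:

```python
import math


def softmax(row):
    """Numerically stabilized softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def nan_to_num(row, nan=0.0):
    """Replace NaN entries with a finite value (NaN handling only)."""
    return [nan if math.isnan(x) else x for x in row]


neg_inf = float("-inf")

# One masked slot: the weights stay finite and the masked slot gets 0.
partly_masked = softmax([0.5, neg_inf, 1.5])

# Every slot masked: -inf - (-inf) is NaN, so the whole row becomes NaN.
fully_masked = softmax([neg_inf, neg_inf, neg_inf])

print(partly_masked)
print(fully_masked)
print(nan_to_num(fully_masked))
```

If one implementation emits NaN for such rows and the other emits zeros, the two outputs only agree after NaN replacement, so it is worth checking which rows of the mismatching tensor are fully masked.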
Triage bot UT analysis result, for reference only. Note that each unique error message is reported only once:
triage bot result: {
"similar_issue_id": 1214,
"similar_issue_state": "open",
"issue_owner": "daisyden",
"issue_description": "In the preci test, random cases fail with 'AssertionError: Tensor-likes are not close!'. The failing test cases include: test_python_ref__refs_exp_xpu_complex128, test_python_ref__refs_sigmoid_xpu_complex128, test_python_ref_executor__refs_log2_executor_aten_xpu_complex128, test_python_ref_executor__refs_exp_executor_aten_xpu_complex128, test_python_ref_torch_fallback__refs_log2_xpu_complex128, test_python_ref_torch_fallback__refs_log10_xpu_complex128, test_python_ref_torch_fallback__refs_sigmoid_xpu_complex128. A workaround PR is provided: https://github.com/intel/torch-xpu-ops/pull/1211. Additional random failures to be added to the skip list: TestCommonXPU.test_python_ref_executor__refs_sigmoid_executor_aten_xpu_complex128, TestCommonXPU.test_compare_cpu_nn_functional_local_response_norm_xpu_bfloat16, test_ops_xpu.py::TestCommonXPU::test_python_ref__refs_log10_xpu_complex128.\ntest_foreach_xpu.py::TestForeachXPU::test_parity__foreach_div_fastpath_outplace_xpu_complex128 failed in the release/2.7 RC2 pre-release wheel",
"root_causes": "The issue involves tensor comparison failures, potentially due to precision issues or implementation differences on XPU devices.",
"suggested_solutions": "Implement the workaround provided in PR #1211 to address the tensor comparison discrepancies. Adjust computations on XPU to align results with expected values."
}
triage bot result: {
"similar_issue_id": "N/A",
"similar_issue_state": "N/A",
"issue_owner": "daisyden",
"issue_description": "Unit test third_party.torch-xpu-ops.test.xpu.test_ops_xpu.TestCommonXPU.test_noncontiguous_samples_nn_functional_conv3d_xpu_int64 failed with RuntimeError: Long is not supported in oneDNN!",
"root_causes": [
"Mismatch in data type handling where Long tensors are passed to oneDNN, which doesn't support them.",
"Potential missing conversion or handling step for Long tensors in XPU operations."
],
"suggested_solutions": [
"Ensure that data types are correctly converted before processing in oneDNN, possibly converting Long tensors to a supported type.",
"Enhance the XPU backend to handle Long tensors appropriately in NN functional operations."
]
}
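Converting Long values to a supported type, as suggested above, is only safe when the conversion is exact. The sketch below illustrates that precaution in plain Python; `to_float_checked` and `run_on_float_backend` are hypothetical helpers, and whether such a cast is appropriate for the conv3d path in question is an open assumption:

```python
FLOAT64_EXACT_INT_LIMIT = 2 ** 53  # integers up to this magnitude are exact in float64


def to_float_checked(values):
    """Convert ints to floats, refusing values that would be rounded."""
    for v in values:
        if abs(v) > FLOAT64_EXACT_INT_LIMIT:
            raise OverflowError(f"{v} cannot be represented exactly as float64")
    return [float(v) for v in values]


def run_on_float_backend(values, op):
    """Hypothetical wrapper: cast to float, run op, cast the result back to int."""
    return [int(round(r)) for r in op(to_float_checked(values))]


doubled = run_on_float_backend([1, -7, 2 ** 40], lambda xs: [2.0 * x for x in xs])
print(doubled)

# Beyond the limit the round trip is lossy, which is why the guard exists.
print(int(float(2 ** 53 + 1)) == 2 ** 53 + 1)
```

A narrower float type lowers the exact-integer limit further, so the guard would need to match whatever type the backend actually supports.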
triage bot result: {
"similar_issue_id": 1214,
"similar_issue_state": "open",
"issue_owner": "daisyden",
"issue_description": "In the preci test, random cases fail with 'AssertionError: Tensor-likes are not close!'. The failing test cases include: test_python_ref__refs_exp_xpu_complex128, test_python_ref__refs_sigmoid_xpu_complex128, test_python_ref_executor__refs_log2_executor_aten_xpu_complex128, test_python_ref_executor__refs_exp_executor_aten_xpu_complex128, test_python_ref_torch_fallback__refs_log2_xpu_complex128, test_python_ref_torch_fallback__refs_log10_xpu_complex128, test_python_ref_torch_fallback__refs_sigmoid_xpu_complex128. A workaround PR is provided: https://github.com/intel/torch-xpu-ops/pull/1211. Additional random failures to be added to the skip list: TestCommonXPU.test_python_ref_executor__refs_sigmoid_executor_aten_xpu_complex128, TestCommonXPU.test_compare_cpu_nn_functional_local_response_norm_xpu_bfloat16, test_ops_xpu.py::TestCommonXPU::test_python_ref__refs_log10_xpu_complex128.\ntest_foreach_xpu.py::TestForeachXPU::test_parity__foreach_div_fastpath_outplace_xpu_complex128 failed in the release/2.7 RC2 pre-release wheel",
"root_causes": [
"Numerical precision issues in XPU operations leading to tensor value mismatches.",
"Possible implementation discrepancies between CPU and XPU for attention mask computations.",
"Random test failures attributed to hardware-specific behavior or numerical instability."
],
"suggested_solutions": [
"Investigate the specific computation of attention masks on XPU and ensure consistency with CPU behavior.",
"Review and adjust numerical precision settings or tolerance levels in the tests.",
"Implement a workaround as suggested in the provided PR, possibly adjusting test expectations or skipping known problematic cases.",
"Add the failing test to the skip list if it's determined to be a false positive or beyond the scope of immediate fixes."
]
}
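The skip-list suggestion above can be sketched as a mapping from test file to deselected test names, turned into a pytest `-k` expression. The structure and the `pytest_command` helper are hypothetical and do not describe the project's actual skip-list format; the test names are the ones quoted in the result:

```python
# Hypothetical skip list: test file -> test names to deselect.
SKIP_LIST = {
    "test_foreach_xpu.py": (
        "test_parity__foreach_div_fastpath_outplace_xpu_complex128",
    ),
    "test_ops_xpu.py": (
        "test_python_ref__refs_log10_xpu_complex128",
        "test_compare_cpu_nn_functional_local_response_norm_xpu_bfloat16",
    ),
}


def pytest_command(test_file, skip_list):
    """Build a pytest argument list that deselects the listed tests via -k."""
    args = ["pytest", "-v", test_file]
    skipped = skip_list.get(test_file, ())
    if skipped:
        args += ["-k", "not (" + " or ".join(skipped) + ")"]
    return args


for name in SKIP_LIST:
    # Joined for display only; pass the list form to subprocess to avoid quoting issues.
    print(" ".join(pytest_command(name, SKIP_LIST)))
```

Because `-k` matches by substring, each entry should be the full test name so that it does not deselect unrelated tests sharing a prefix.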
No description provided.