Daisyden/artifacts #1630
xpu-ops bot UT triage result for your reference:
xpu-ops triage bot UT analysis result for your reference (only unique errors were analyzed):
test.distributed._composable.fsdp.test_fully_shard_compile.TestFullyShardCompileCompute.test_disable_compiling_hooks failed with RuntimeError: Process 0 terminated or timed out after 300.0208730697632 seconds. triage_bot result:
test.distributed._composable.fsdp.test_fully_shard_autograd.TestFullyShardAutograd.test_nontensor_activations failed with RuntimeError: Process 0 exited with error code 10 and exception: RuntimeError: oneCCL: coll_param.cpp:455 validate: EXCEPTION: average operation is not supported for the scheduler path. triage_bot result:
xpu-ops triage bot UT analysis result for reference, only unique errors highlighted:
{
  "similar_issue_id": 1535,
  "similar_issue_state": "in progress",
  "issue_owner": "ratnampa",
  "issue_description": "RuntimeError: Process 0 terminated or timed out after 300.0208730697632 seconds in the test case test_disable_compiling_hooks. The issue arises in distributed FSDP operations, potentially linked to oneCCL's Gather operation handling, causing test failures in specific configurations.",
  "root_causes": [
    "Incorrect handling of object gather operations in oneCCL's gold release branch leading to process termination.",
    "Regression in oneCCL's gold release affecting distributed operations."
  ],
  "suggested_solutions": [
    "Investigate changes in oneCCL's gold release that may have introduced the issue.",
    "Update or revert problematic changes in oneCCL to resolve the regression.",
    "Review and potentially modify distributed communication components to handle operations correctly."
  ]
}
{
  "similar_issue_id": 1508,
  "similar_issue_state": "unresolved",
  "issue_owner": "ratnampa",
  "issue_description": "The unit test test.distributed._composable.fsdp.test_fully_shard_autograd.TestFullyShardAutograd/test_nontensor_activations failed with a RuntimeError indicating that the average operation is not supported for the scheduler path in oneCCL. This error is identical to the one reported in issue 1508, suggesting it is a known issue related to the lack of support for INT64 dtype in SYCL collectives, leading to fallback to an older implementation that doesn't handle average operations.",
  "root_causes": [
    "Lack of support for INT64 dtype in SYCL collectives causing fallback to an older implementation that doesn't handle average operations"
  ],
  "suggested_solutions": [
    "Implement support for INT64 dtype in SYCL collectives to prevent fallback to an unsupported path, or modify the fallback mechanism to handle average operations."
  ]
}
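The two triage records above share the same keys, so they can be grouped by owner with a short script. The following is only a minimal sketch: the function name `summarize_triage` and the trimmed `RECORDS` sample are hypothetical illustrations and are not part of the xpu-ops triage bot. It also assumes the records have been collected into a single JSON array, which the bot output above does not do.

```python
import json
from collections import defaultdict

# Hypothetical sample: the two records above, trimmed to three fields
# and wrapped in a JSON array (an assumption, not the bot's real format).
RECORDS = """
[
  {"similar_issue_id": 1535, "similar_issue_state": "in progress", "issue_owner": "ratnampa"},
  {"similar_issue_id": 1508, "similar_issue_state": "unresolved", "issue_owner": "ratnampa"}
]
"""


def summarize_triage(raw: str) -> dict:
    """Group (issue id, state) pairs by issue owner."""
    by_owner = defaultdict(list)
    for record in json.loads(raw):
        # Fall back to "unknown" if a record has no owner field.
        owner = record.get("issue_owner", "unknown")
        by_owner[owner].append(
            (record["similar_issue_id"], record["similar_issue_state"])
        )
    return dict(by_owner)


if __name__ == "__main__":
    for owner, issues in summarize_triage(RECORDS).items():
        print(owner, issues)
```

Grouping by owner makes it easy to see that both failures map to issues assigned to the same person.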
{
  "issue_number": 827,
  "issue_description": "A reproducer for the behavior of `index_put_` which is inconsistency with other backends. The root cause is `checkIndexTensorTypes`.\nThe reporter of the issue is guangyey, and the assignee is Stonepia, and the state of the issue is closed.",
  "error_message": "IndexError: tensors used as indices must be long, byte or bool tensors",
  "reporter": "guangyey",
  "assignee": "Stonepia",
  "resolution": "\nThe issue was resolved by fixing the `checkIndexTensorType()` function by adding the `allow_int` parameter, which was missing before.",
  "root_cause": "checkIndexTensorTypes",
  "state": "closed"
}
For ai_for_validation testing.