Skip dist all2all related case #1675

Conversation
Pull Request Overview
This PR disables several distributed all-to-all tests by commenting them out, effectively skipping them during test execution.
- Commented out the test for all-to-all operations racing with XPU frees (test_alltoall_ops_with_xpufree_race)
- Commented out multiple tests for all-to-all single operations and their variants
# @requires_xccl()
# @skip_but_pass_in_sandcastle_if(not TEST_MULTIGPU, "XCCL test requires 2+ GPUs")
# def test_alltoall_ops_with_xpufree_race(self):
#     pg = self.pg
#     opts = c10d.AllToAllOptions()
#     local_device = f"xpu:{self.rank_to_GPU[self.rank][0]}"
#     torch.xpu.set_device(local_device)
#     input = torch.rand(1000, 1000, device=local_device)
#     output = torch.rand(1000, 1000, device=local_device)
#     race_tensors = []
#     # create some tensors to race with alltoall collective
#     for _ in range(10):
#         tmp = []
#         for i in range(5):
[nitpick] Disabling 'test_alltoall_ops_with_xpufree_race' by commenting it out may lead to code clutter; consider using a skip decorator or removing the test if it's no longer required.
Suggested change:

# @requires_xccl()
# @skip_but_pass_in_sandcastle_if(not TEST_MULTIGPU, "XCCL test requires 2+ GPUs")
# def test_alltoall_ops_with_xpufree_race(self):
#     pg = self.pg
#     opts = c10d.AllToAllOptions()
#     local_device = f"xpu:{self.rank_to_GPU[self.rank][0]}"
#     torch.xpu.set_device(local_device)
#     input = torch.rand(1000, 1000, device=local_device)
#     output = torch.rand(1000, 1000, device=local_device)
#     race_tensors = []
#     # create some tensors to race with alltoall collective
#     for _ in range(10):
#         tmp = []
#         for i in range(5):

@unittest.skip("Skipping test_alltoall_ops_with_xpufree_race due to known issues with XPU free race conditions.")
@requires_xccl()
@skip_but_pass_in_sandcastle_if(not TEST_MULTIGPU, "XCCL test requires 2+ GPUs")
def test_alltoall_ops_with_xpufree_race(self):
    pg = self.pg
    opts = c10d.AllToAllOptions()
    local_device = f"xpu:{self.rank_to_GPU[self.rank][0]}"
    torch.xpu.set_device(local_device)
    input = torch.rand(1000, 1000, device=local_device)
    output = torch.rand(1000, 1000, device=local_device)
    race_tensors = []
    # create some tensors to race with alltoall collective
    for _ in range(10):
        tmp = []
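A note on the decorator route suggested above: `unittest.skip` keeps the case in the collected suite and reports it as skipped with the given reason, rather than making it disappear silently. A minimal, self-contained sketch of that behavior (hypothetical test names, no XPU or distributed setup involved):

```python
import unittest


class SkipDemo(unittest.TestCase):
    @unittest.skip("known XPU free race; tracked separately")
    def test_skipped_case(self):
        # Never executed; the runner records it as skipped with the reason above.
        self.fail("should not run")

    def test_other_case(self):
        # Runs normally alongside the skipped test.
        self.assertTrue(True)


if __name__ == "__main__":
    # The summary line ends with something like: OK (skipped=1)
    unittest.main(verbosity=2)
```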
# @requires_xccl()
# @skip_but_pass_in_sandcastle_if(not TEST_MULTIGPU, "XCCL test requires 2+ GPUs")
# def test_all_to_all_single(self):
#     device = self.rank_to_GPU[self.rank][0]
#     row = self.world_size * (self.rank + 1) * (self.world_size + 1) / 2
#     x = torch.ones(int(row), 5, device=device) * (self.rank + 1)
#     x.requires_grad = True
#     y = torch.empty_like(x)
#     split_sizes = [(i + 1) * (self.rank + 1) for i in range(self.world_size)]
#     y = torch.distributed.nn.all_to_all_single(
#         y, x, output_split_sizes=split_sizes, input_split_sizes=split_sizes
#     )
#     expected = []
#     for idx, tensor in enumerate(torch.split(x, split_sizes)):
#         expected.append(torch.full_like(tensor, (idx + 1)))
#     expected = torch.cat(expected)
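For context on the sizes in the hunk above: `row` is just the sum of `split_sizes` for that rank, i.e. `(rank + 1) * world_size * (world_size + 1) / 2`. A quick stand-alone check of that arithmetic (no XPU or process group involved; the helper name is made up for illustration):

```python
def rows_for_rank(rank: int, world_size: int) -> int:
    # Mirrors the commented-out test: each rank sends (i + 1) * (rank + 1)
    # rows to peer i, so the input must have sum(split_sizes) rows in total.
    split_sizes = [(i + 1) * (rank + 1) for i in range(world_size)]
    row = world_size * (rank + 1) * (world_size + 1) / 2
    assert int(row) == sum(split_sizes)
    return int(row)


for ws in (2, 4, 8):
    for r in range(ws):
        rows_for_rank(r, ws)  # e.g. rank 1 of 2 -> 6 rows, split_sizes == [2, 4]
```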
[nitpick] The block of tests for various all-to-all operations has been commented out rather than formally skipped; consider refactoring with proper skip annotations or removing this code to improve maintainability.
Suggested change:

# @requires_xccl()
# @skip_but_pass_in_sandcastle_if(not TEST_MULTIGPU, "XCCL test requires 2+ GPUs")
# def test_all_to_all_single(self):
#     device = self.rank_to_GPU[self.rank][0]
#     row = self.world_size * (self.rank + 1) * (self.world_size + 1) / 2
#     x = torch.ones(int(row), 5, device=device) * (self.rank + 1)
#     x.requires_grad = True
#     y = torch.empty_like(x)
#     split_sizes = [(i + 1) * (self.rank + 1) for i in range(self.world_size)]
#     y = torch.distributed.nn.all_to_all_single(
#         y, x, output_split_sizes=split_sizes, input_split_sizes=split_sizes
#     )
#     expected = []
#     for idx, tensor in enumerate(torch.split(x, split_sizes)):
#         expected.append(torch.full_like(tensor, (idx + 1)))
#     expected = torch.cat(expected)

@requires_xccl()
@skip_but_pass_in_sandcastle_if(not TEST_MULTIGPU, "XCCL test requires 2+ GPUs")
def test_all_to_all_single(self):
    device = self.rank_to_GPU[self.rank][0]
    row = self.world_size * (self.rank + 1) * (self.world_size + 1) / 2
    x = torch.ones(int(row), 5, device=device) * (self.rank + 1)
    x.requires_grad = True
    y = torch.empty_like(x)
    split_sizes = [(i + 1) * (self.rank + 1) for i in range(self.world_size)]
    y = torch.distributed.nn.all_to_all_single(
        y, x, output_split_sizes=split_sizes, input_split_sizes=split_sizes
    )
    expected = []
    for idx, tensor in enumerate(torch.split(x, split_sizes)):
        expected.append(torch.full_like(tensor, (idx + 1)))
    expected = torch.cat(expected)
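If the intent is to skip this case rather than restore it, the same pattern as the first suggestion could be applied. A sketch under that assumption (the skip reason is hypothetical, `unittest` is assumed to be imported at module level, and `requires_xccl`, `skip_but_pass_in_sandcastle_if`, and `TEST_MULTIGPU` are the helpers already used in this file):

```python
@unittest.skip("Skipping test_all_to_all_single while dist all-to-all cases are disabled.")  # hypothetical reason
@requires_xccl()
@skip_but_pass_in_sandcastle_if(not TEST_MULTIGPU, "XCCL test requires 2+ GPUs")
def test_all_to_all_single(self):
    ...  # body unchanged from the block above
```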
No description provided.