fix: random sampling in ForgetRetainDataset #145
ZeguanXiao wants to merge 6 commits into locuslab:main from
Conversation
src/data/unlearn.py
Outdated
g = torch.Generator()
rank = torch.distributed.get_rank() if torch.distributed.is_initialized() else 0
seed = int(torch.empty((), dtype=torch.int64).random_().item() + rank)
g.manual_seed(seed)
It would be better to use the seed from the experiment config here, rather than
`int(torch.empty((), dtype=torch.int64).random_().item())`, to avoid introducing randomness uncontrolled by the seed.
Can you try to make the experiment's cfg.seed available to this dataset class and then use seed = exp_seed + rank here?
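A minimal sketch of what this suggestion could look like: the dataset receives the experiment's seed and offsets it by the process rank, so sampling is reproducible across runs but differs per rank. The class body and attribute names here are hypothetical illustrations, not the repo's actual implementation.

```python
import torch


class ForgetRetainDataset:
    """Hypothetical sketch: pair each forget example with a randomly
    sampled retain example, seeded from the experiment config."""

    def __init__(self, forget, retain, seed: int = 0):
        self.forget = forget
        self.retain = retain
        # Offset the experiment seed by the rank so each process draws a
        # different (but reproducible) sequence of retain indices.
        rank = torch.distributed.get_rank() if torch.distributed.is_initialized() else 0
        self.generator = torch.Generator()
        self.generator.manual_seed(seed + rank)

    def __getitem__(self, idx):
        # Sample a retain index from this dataset's own generator, not the
        # global RNG, so other randomness in the run is unaffected.
        retain_idx = torch.randint(
            len(self.retain), (1,), generator=self.generator
        ).item()
        return self.forget[idx], self.retain[retain_idx]

    def __len__(self):
        return len(self.forget)
```

With the same experiment seed, two single-process runs produce identical pairings, which is the reproducibility property the comment asks for.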
molereddy
left a comment
Thank you for the PR! Please see comment
Thanks for the feedback! I've updated the PR accordingly. Please let me know if there are any further adjustments required.

Please fix the lint errors!
molereddy
left a comment
It is not ideal to set the seed at the exact example level. This would mean we select the same retain example index sequences even if we are using a different dataset.
Since the point is that each rank must get a different seed, imo it is better to get the rank in the global seed function: https://github.com/locuslab/open-unlearning/blob/main/src/trainer/utils.py#L8
Let me know if you see any issues.
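The alternative proposed here would move the rank offset into the repo's global seed utility instead of the dataset. A rough sketch of what a rank-aware version of such a function might look like (the function name and exact set of RNGs seeded are assumptions, not the actual code at the linked `src/trainer/utils.py`):

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> int:
    """Hypothetical rank-aware global seeding: offset the experiment seed
    by the process rank so every rank draws different random numbers,
    while the run as a whole remains reproducible from one seed."""
    rank = torch.distributed.get_rank() if torch.distributed.is_initialized() else 0
    seed = seed + rank
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    return seed
```

The upside of this placement is that every source of randomness (not just the retain sampling) diverges across ranks, with a single code change.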
@molereddy Simply modifying
@molereddy Currently, my implementation adds a
Could you please check if this approach is feasible/correct?
If the goal is simply to have a different idx for different ranks, there's also a simpler solution: Same for forget_idx. And that's the only code change that would be needed. Or, for even more scrambling, maybe instead of adding the rank, add some hash of
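The "simpler solution" hinted at here (the referenced code snippets are elided in the page) could plausibly be a one-line shift of the sampled index by the rank. A hypothetical sketch, assuming a helper that draws one retain index per forget example:

```python
import torch


def sample_retain_idx(retain_len: int, generator: torch.Generator, rank: int) -> int:
    """Hypothetical sketch: draw an index once from a shared-seed generator,
    then shift it by the process rank so each rank lands on a different
    retain example without needing per-rank seeds."""
    idx = torch.randint(retain_len, (1,), generator=generator).item()
    # Modular shift keeps the result a valid index for any rank.
    return (idx + rank) % retain_len
```

Because all ranks draw the same base index from the same seed, rank r simply sees the rank-0 sequence shifted by r modulo the dataset size; whether that amount of decorrelation is enough is exactly what the "even more scrambling" remark questions.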
Hi, is there any estimate on this?
What does this PR do?
Fixes #139
Before submitting