Significant difference in results with different GPU counts on WMDP bio dataset using GradDiff #144

### Information

- [x] The official example scripts
- [ ] My own modified scripts

### Tasks

- [x] An officially supported task
- [ ] My own task or dataset (give details below)

### Reproduction

When running the GradDiff method on the WMDP bio dataset, I get significantly different results depending on the number of GPUs used (I tested 4 vs. 8 GPUs), even though the effective batch size was kept constant at 32.
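To make "effective batch size" concrete, here is a minimal sketch assuming the usual definition: per-device batch size × gradient accumulation steps × number of GPUs. The per-device and accumulation values below are hypothetical, chosen only so that both setups reach 32. They are not the actual settings from the runs.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int) -> int:
    """Number of samples contributing to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_gpus


# Hypothetical settings (assumptions for illustration only).
four_gpu = effective_batch_size(per_device_batch_size=4,
                                gradient_accumulation_steps=2,
                                num_gpus=4)
eight_gpu = effective_batch_size(per_device_batch_size=4,
                                 gradient_accumulation_steps=1,
                                 num_gpus=8)

print(four_gpu, eight_gpu)
```

Both configurations give the same effective batch size, but they split each optimizer step across devices and accumulation steps differently.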

Has anyone else experienced this? Is there a known reason why the GPU count would affect the results this strongly?
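One possible factor, not confirmed for this repository, is loss normalization. If each device averages its loss over its own tokens and the per-device values are then averaged, the result depends on how sequences of different lengths are split across devices. The sketch below is plain Python with made-up numbers. The `sharded_mean_loss` helper is hypothetical and only illustrates the arithmetic.

```python
def sharded_mean_loss(sequences, num_shards):
    """Average of per-shard token-mean losses.

    sequences: list of (num_tokens, summed_token_loss) pairs.
    Sequences are split into contiguous, equally sized shards,
    mimicking one shard per device.
    """
    shard_size = len(sequences) // num_shards
    shard_losses = []
    for i in range(num_shards):
        shard = sequences[i * shard_size:(i + 1) * shard_size]
        tokens = sum(n for n, _ in shard)
        total = sum(s for _, s in shard)
        shard_losses.append(total / tokens)  # token-mean within the shard
    return sum(shard_losses) / num_shards    # mean across shards


# Made-up batch: long and short sequences with different per-token losses.
batch = [(10, 20.0), (2, 1.0), (10, 20.0), (2, 1.0)]

loss_2_shards = sharded_mean_loss(batch, num_shards=2)
loss_4_shards = sharded_mean_loss(batch, num_shards=4)
print(loss_2_shards, loss_4_shards)
```

The same four sequences give a different averaged loss when split two ways versus four ways. Whether this applies here depends on how the trainer normalizes the loss, which I have not verified.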

### Expected behavior

The results should be identical, or at least similar, across GPU counts when the effective batch size is the same.
