🐛[BUG]: TypeError: RegressionLoss.__call__() got an unexpected keyword argument 'use_patch_grad_acc' #865

Closed
luke-conibear opened this issue May 2, 2025 · 1 comment · Fixed by #868
Labels: bug (Something isn't working)

Comments

luke-conibear commented May 2, 2025

Version

Latest from main branch

On which installation method(s) does this occur?

Source

Describe the issue

Following this PR, the CorrDiff example has an error in the regression training:

[2025-05-02 10:42:19,754][main][INFO] - Using dataset: hrrr_mini
[2025-05-02 10:42:19,755][main][INFO] - Saving the outputs in /mnt/azureml/cr/j/.../exe/wd
[2025-05-02 10:42:36,214][main][INFO] - Patch-based training disabled
[2025-05-02 10:42:36,512][main][INFO] - Using 4 gradient accumulation rounds
[2025-05-02 10:42:36,531][checkpoint][WARNING] - Provided checkpoint directory /mnt/azureml/cr/j/.../cap/data-capability/wd/checkpoint_dir/checkpoints_regression does not exist, skipping load
[2025-05-02 10:42:36,531][main][INFO] - Training for 2000000 images...
Error executing job with overrides: ['++dataset.data_path=...', '++dataset.stats_path=...', '++training.hp.total_batch_size=256', '++training.hp.batch_size_per_gpu=64', '++training.perf.dataloader_workers=1', '++training.io.checkpoint_dir=...']
Traceback (most recent call last):
  File "/mnt/azureml/cr/j/.../exe/wd/train.py", line 728, in <module>
    main()
  File "/usr/local/lib/python3.12/dist-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
            ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
        ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.12/dist-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/azureml/cr/j/.../exe/wd/train.py", line 493, in main
    loss = loss_fn(**loss_fn_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: RegressionLoss.__call__() got an unexpected keyword argument 'use_patch_grad_acc'

In the PR, I see that the use_patch_grad_acc keyword argument was added to ResidualLoss but not RegressionLoss.

Should the same change be applied there?
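
The mismatch can be reproduced without the dataset or the training script. The sketch below is only an illustration: the two classes are stand-ins that reuse the names `ResidualLoss` and `RegressionLoss`, and the other argument names (`net`, `img_clean`, `img_lr`) are assumptions, not the library's actual signatures. It shows why unpacking the same `loss_fn_kwargs` works for one loss and raises the `TypeError` for the other.

```python
# Stand-in classes for illustration only; these are NOT the real library code.
# Argument names other than `use_patch_grad_acc` are assumptions.
class ResidualLoss:
    def __call__(self, net, img_clean, img_lr, use_patch_grad_acc=False):
        return "residual"


class RegressionLoss:
    # Note: no `use_patch_grad_acc` parameter here.
    def __call__(self, net, img_clean, img_lr):
        return "regression"


# The training loop builds one kwargs dict and passes it to whichever loss is active.
loss_fn_kwargs = {
    "net": None,
    "img_clean": None,
    "img_lr": None,
    "use_patch_grad_acc": False,
}


def try_call(loss_fn, kwargs):
    """Call the loss with unpacked kwargs and return the result or the error text."""
    try:
        return loss_fn(**kwargs)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_call(ResidualLoss(), loss_fn_kwargs))
print(try_call(RegressionLoss(), loss_fn_kwargs))
```

The second call fails even though the flag is `False`, because Python rejects any keyword that the callee does not declare, regardless of its value.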

luke-conibear added the "? - Needs Triage" (Need team to review and classify) and "bug" (Something isn't working) labels on May 2, 2025
CharlelieLrt self-assigned this on May 2, 2025
CharlelieLrt removed the "? - Needs Triage" (Need team to review and classify) label on May 2, 2025
CharlelieLrt (Collaborator) commented

@luke-conibear, thanks for reporting this.
Patch-wise gradient accumulation, which is enabled by use_patch_grad_acc=True, is an optimization that should only be used when training patched diffusion models.
We will push a fix that automatically disables it for all other model types (e.g. non-patched diffusion and regression).
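
One way to picture this kind of guard is sketched below. It is not the actual fix and does not reflect the code in the linked pull request: the helper `build_loss_kwargs`, its parameters `patching_enabled` and `is_diffusion`, and the stand-in `RegressionLoss` signature are all hypothetical. The idea is simply to add the flag to the keyword arguments only when a patched diffusion model is being trained.

```python
# Hypothetical helper for illustration; names and signatures are assumptions,
# not the library's API and not the merged fix.
def build_loss_kwargs(base_kwargs, patching_enabled, is_diffusion):
    """Return a copy of base_kwargs, adding the flag only for patched diffusion."""
    kwargs = dict(base_kwargs)
    if patching_enabled and is_diffusion:
        kwargs["use_patch_grad_acc"] = True
    return kwargs


class RegressionLoss:
    # Stand-in loss that, like the one in the traceback, does not accept the flag.
    def __call__(self, net, img_clean, img_lr):
        return "regression"


base = {"net": None, "img_clean": None, "img_lr": None}

# Regression training: the flag is never added, so the call succeeds.
regression_kwargs = build_loss_kwargs(base, patching_enabled=False, is_diffusion=False)
print(RegressionLoss()(**regression_kwargs))

# Patched diffusion training: the flag is present and set to True.
patched_kwargs = build_loss_kwargs(base, patching_enabled=True, is_diffusion=True)
print(patched_kwargs["use_patch_grad_acc"])
```

Omitting the key, rather than passing `use_patch_grad_acc=False`, is what keeps losses without that parameter working.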
