
fix(offpolicy): unify single/multi-GPU learner log interval #656

Merged: TATP-233 merged 2 commits into main from fix/unify-offpolicy-log-frequency on Jun 30, 2026


@TATP-233 (Collaborator)

Summary

Make DoubleBufferOffPolicyRunner log training metrics every iteration instead of every 10 iterations, aligning its TensorBoard scalar density with MultiGPUOffPolicyRunner.

Motivation

Single-GPU FlashSAC runs were logging 10x fewer scalar points than multi-GPU runs: DoubleBufferOffPolicyRunner set LEARNER_LOG_INTERVAL = 10 and therefore called logger.log_step only once every 10 iterations. This made direct comparison of reward curves across GPU counts misleading.
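The 10x gap follows directly from the interval. A minimal sketch, assuming the old runner logged when iteration % LEARNER_LOG_INTERVAL == 0 with iterations numbered from 0 (the exact gating condition is not shown in this PR), counts how many scalar points each setting produces over the same run. logged_points is a hypothetical helper, not part of the codebase.

```python
def logged_points(num_iterations: int, log_interval: int) -> int:
    """Count log_step calls when logging fires on iteration % log_interval == 0.

    Hypothetical helper for illustration; iterations are 0..num_iterations-1.
    """
    if log_interval < 1:
        raise ValueError("log_interval must be >= 1")
    # Logged iterations are 0, log_interval, 2*log_interval, ... below num_iterations,
    # i.e. ceil(num_iterations / log_interval) of them.
    return (num_iterations + log_interval - 1) // log_interval


# Same run length: old single-GPU interval vs. logging every iteration (multi-GPU).
single_gpu_old = logged_points(10_000, 10)
multi_gpu = logged_points(10_000, 1)
print(single_gpu_old, multi_gpu)  # 1000 10000
```

Under that assumption, both runs cover the same iteration range on the x-axis; only the density of points differs, which is what made the curves look different.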

Changes

  • Remove LEARNER_LOG_INTERVAL from DoubleBufferOffPolicyRunner.
  • Call logger.log_step on every training iteration.
  • Update status message and trace metadata to reflect log_interval=1.
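The change can be sketched as a before/after learner loop. This is a minimal illustration, not the runner's actual code: RecordingLogger, fake_update, learner_loop_before, and learner_loop_after are hypothetical names, and the log_step(metrics, step=...) signature is an assumption, since the real logger API is not shown in this PR.

```python
LEARNER_LOG_INTERVAL = 10  # removed by this PR; kept here only for the "before" loop


class RecordingLogger:
    """Hypothetical stand-in for the runner's logger: records the logged steps."""

    def __init__(self) -> None:
        self.steps: list[int] = []

    def log_step(self, metrics: dict, step: int) -> None:
        self.steps.append(step)


def fake_update(iteration: int) -> dict:
    # Placeholder for one learner update; returns scalar metrics.
    return {"loss": 1.0 / (iteration + 1)}


def learner_loop_before(num_iterations: int, logger: RecordingLogger) -> None:
    for iteration in range(num_iterations):
        metrics = fake_update(iteration)
        if iteration % LEARNER_LOG_INTERVAL == 0:  # old gate: every 10th iteration
            logger.log_step(metrics, step=iteration)


def learner_loop_after(num_iterations: int, logger: RecordingLogger) -> None:
    for iteration in range(num_iterations):
        metrics = fake_update(iteration)
        logger.log_step(metrics, step=iteration)  # log_interval=1: every iteration


before, after = RecordingLogger(), RecordingLogger()
learner_loop_before(30, before)
learner_loop_after(30, after)
print(before.steps)      # [0, 10, 20]
print(len(after.steps))  # 30
```

Both loops use the iteration index as the step, so their curves share an x-axis; dropping the gate changes only how many points are written.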

Validation

  • uv run pytest tests/algos/test_offpolicy_double_buffer_runner.py -q passes (46 passed).
  • make test-all fails locally because of a native abort in the MuJoCo batch_env test tests/base/test_mujoco_batch_env_jacobian.py; the failure is unrelated to this change.

Make DoubleBufferOffPolicyRunner log training metrics every iteration
instead of every 10 iterations. This aligns its TensorBoard scalar
density with MultiGPUOffPolicyRunner so single-GPU and multi-GPU runs
are directly comparable.

TATP-233 requested a review from caozx1110 as a code owner on June 30, 2026 07:30.
TATP-233 merged commit def5f24 into main on Jun 30, 2026 (6 checks passed).
TATP-233 deleted the fix/unify-offpolicy-log-frequency branch on June 30, 2026 07:49.