fix(offpolicy): unify single/multi-GPU learner log interval by TATP-233 · Pull Request #656 · unilabsim/UniLab

TATP-233 · 2026-06-30T07:30:51Z

Summary

Make DoubleBufferOffPolicyRunner log training metrics every iteration instead of every 10 iterations, aligning its TensorBoard scalar density with MultiGPUOffPolicyRunner.

Motivation

Single-GPU FlashSAC runs logged 10x fewer scalar points than multi-GPU runs because DoubleBufferOffPolicyRunner set LEARNER_LOG_INTERVAL = 10 and therefore called logger.log_step only once every 10 iterations. This made direct reward-curve comparison across GPU counts misleading.
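To make the density gap concrete, here is a minimal sketch that counts how many iterations produce a logged point under each interval. The helper `logged_iterations` and the modulo gate are assumptions for illustration; the actual gating condition in DoubleBufferOffPolicyRunner is not shown in this PR description.

```python
def logged_iterations(num_iterations: int, log_interval: int) -> list[int]:
    """Return the iterations at which a learner gated by log_interval would log.

    Hypothetical helper: assumes a simple `iteration % log_interval == 0` gate.
    """
    return [it for it in range(num_iterations) if it % log_interval == 0]


# Old single-GPU behaviour (interval of 10) versus per-iteration logging.
sparse = logged_iterations(1000, log_interval=10)
dense = logged_iterations(1000, log_interval=1)

# Ratio of scalar points between the two configurations.
density_ratio = len(dense) // len(sparse)
print(len(sparse), len(dense), density_ratio)
```

Under these assumptions, a 1000-iteration run yields 100 points at an interval of 10 and 1000 points at an interval of 1, which matches the 10x difference described above.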

Changes

Remove LEARNER_LOG_INTERVAL from DoubleBufferOffPolicyRunner.
Call logger.log_step on every training iteration.
Update status message and trace metadata to reflect log_interval=1.
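The changes above could look roughly like the following sketch of a learner loop with the interval gate removed. Everything here is hypothetical: `RecordingLogger`, `run_learner`, the `log_step(iteration, metrics)` signature, and the placeholder metric are stand-ins, not the UniLab implementation.

```python
class RecordingLogger:
    """Stand-in for the real logger; it only records calls to log_step."""

    def __init__(self) -> None:
        self.steps: list[tuple[int, dict[str, float]]] = []

    def log_step(self, iteration: int, metrics: dict[str, float]) -> None:
        # Assumed signature; the real logger.log_step may differ.
        self.steps.append((iteration, dict(metrics)))


def run_learner(logger: RecordingLogger, num_iterations: int) -> None:
    """Hypothetical learner loop after the change: log on every iteration."""
    for iteration in range(num_iterations):
        # Placeholder metric; a real runner would report training losses etc.
        metrics = {"loss": 1.0 / (iteration + 1)}
        # No LEARNER_LOG_INTERVAL check here: every iteration is logged.
        logger.log_step(iteration, metrics)


logger = RecordingLogger()
run_learner(logger, num_iterations=25)
print(len(logger.steps))
```

With the gate removed, the number of recorded steps equals the number of training iterations, which is the property that lets single-GPU and multi-GPU curves be compared point for point.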

Validation

uv run pytest tests/algos/test_offpolicy_double_buffer_runner.py -q passes (46 passed).
make test-all fails locally due to an unrelated MuJoCo batch_env native abort in tests/base/test_mujoco_batch_env_jacobian.py, not caused by this change.

Make DoubleBufferOffPolicyRunner log training metrics every iteration instead of every 10 iterations. This aligns its TensorBoard scalar density with MultiGPUOffPolicyRunner so single-GPU and multi-GPU runs are directly comparable.

TATP-233 requested a review from caozx1110 as a code owner June 30, 2026 07:30

style(offpolicy): ruff format double_buffer_runner

d77b1a8

TATP-233 merged commit def5f24 into main Jun 30, 2026
6 checks passed

TATP-233 deleted the fix/unify-offpolicy-log-frequency branch June 30, 2026 07:49

TATP-233 merged 2 commits into main from fix/unify-offpolicy-log-frequency
