Motivation
Off-policy and replay-based RL algorithms — particularly those using recurrent policies (RNNs/LSTMs), Hindsight Experience Replay (HER), or episodic PPO variants — require storing and sampling full episodes rather than random transitions. Currently, even with the TrajectoryBatcher utility added in #[original PR], there is no straightforward path to:
Collect episodes asynchronously in the background via collector.start()
Store complete, padded trajectory batches into a ReplayBuffer
Sample contiguous trajectory slices using SliceSampler
Users who want this workflow must manually wire together the async collector API, the replay buffer extend() call, and the trajectory-aware sampling tooling — which is error-prone and reimplemented repeatedly.
Solution
Extend TrajectoryBatcher (and/or add a companion utility/example) to support:
Async collection via collector.start(), so trajectory assembly happens while training continues
ReplayBuffer.extend() compatibility with the zero-padded [N, max_len, ...] TensorDicts produced by TrajectoryBatcher, including correct handling of the ("collector", "mask") field
SliceSampler integration, using ("collector", "traj_ids") as the trajectory key so sampled slices respect episode boundaries
A reference example demonstrating the full async collect → store → sample → train loop would also be included.
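To make the padded layout concrete, here is a minimal plain-Python sketch (no torchrl dependency; `pad_episodes` is a hypothetical helper, not part of the library) of the `[N, max_len, ...]` zero-padded batch and the boolean mask that `ReplayBuffer.extend()` would need to handle so that padding is never mistaken for real transitions:

```python
# Conceptual sketch of the zero-padded [N, max_len] layout plus mask.
# pad_episodes is an illustrative helper, not a torchrl API: it stands in
# for what TrajectoryBatcher is described as producing, where the mask
# (cf. the ("collector", "mask") field) marks real steps vs. padding.

def pad_episodes(episodes):
    """Pad variable-length episodes to a [N, max_len] layout with a mask.

    episodes: list of per-episode step sequences (scalars here for brevity).
    Returns (padded, mask); mask[i][t] is True where step t of episode i
    is real data and False where it is zero-padding.
    """
    max_len = max(len(ep) for ep in episodes)
    padded, mask = [], []
    for ep in episodes:
        pad = max_len - len(ep)
        padded.append(list(ep) + [0] * pad)                  # zero-pad the tail
        mask.append([True] * len(ep) + [False] * pad)        # mark real steps
    return padded, mask

episodes = [[1, 2, 3], [4, 5], [6]]
padded, mask = pad_episodes(episodes)
# padded -> [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
# mask   -> [[True, True, True], [True, True, False], [True, False, False]]
```

A loss computed over such a batch would multiply per-step terms by the mask so padded steps contribute nothing to the gradient.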
Alternatives
Manual wiring: Users can already combine collector.start(), TrajectoryBatcher, and ReplayBuffer themselves, but this requires deep familiarity with each API and is not documented anywhere.
Transition-level replay: Standard ReplayBuffer + RandomSampler works for transition-based algorithms but loses episode structure, which breaks recurrent policies and trajectory-dependent methods.
Keeping it synchronous: The original PR covers the synchronous on-policy case. This is sufficient for REINFORCE/episodic PPO but not for off-policy or sample-efficient workflows.
Additional context
Follows up on #3584, which introduced TrajectoryBatcher for synchronous, on-policy episode batching. This PR targets the complementary off-policy / async use case.
Relevant existing APIs this builds on:
collector.start() / async collector interface
ReplayBuffer + LazyTensorStorage
SliceSampler (already traj-ID-aware)
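The boundary-respecting behavior expected from SliceSampler can be illustrated with a small stdlib-only sketch (`sample_slice` is a hypothetical stand-in written for this issue, not the library's implementation): given one trajectory ID per stored step, a valid slice is a contiguous window whose steps all share the same ID.

```python
import random

def sample_slice(traj_ids, slice_len, rng=random):
    """Sample a contiguous window of slice_len indices that stays inside
    a single trajectory.

    traj_ids: one trajectory ID per stored step, in storage order,
    mimicking the role of the ("collector", "traj_ids") key.
    Returns the sampled indices, never crossing an episode boundary.
    """
    # Valid starts: windows whose steps all carry the same trajectory ID.
    starts = [
        i for i in range(len(traj_ids) - slice_len + 1)
        if len(set(traj_ids[i:i + slice_len])) == 1
    ]
    start = rng.choice(starts)
    return list(range(start, start + slice_len))

ids = [0, 0, 0, 1, 1, 2, 2, 2, 2]
indices = sample_slice(ids, 3)   # e.g. [0, 1, 2] or [5, 6, 7], never [2, 3, 4]
```

The real sampler additionally handles storage wrap-around and batched sampling; the point here is only the invariant that a slice never mixes steps from two episodes.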
Checklist
I have checked that there is no similar issue in the repo