Motivation
Off-policy and replay-based RL algorithms — particularly those using recurrent policies (RNNs/LSTMs), Hindsight Experience Replay (HER), or episodic PPO variants — require storing and sampling full episodes rather than random transitions. Currently, even with the TrajectoryBatcher utility added in #[original PR], there is no straightforward path to:
Collect episodes asynchronously in the background via collector.start()
Store complete, padded trajectory batches into a ReplayBuffer
Sample contiguous trajectory slices using SliceSampler
Users who want this workflow must manually wire together the async collector API, the replay buffer extend() call, and the trajectory-aware sampling tooling — which is error-prone and reimplemented repeatedly.
Solution
Extend TrajectoryBatcher (and/or add a companion utility/example) to support:
Async collection via collector.start(), so trajectory assembly happens while training continues
ReplayBuffer.extend() compatibility with the zero-padded [N, max_len, ...] TensorDicts produced by TrajectoryBatcher, including correct handling of the ("collector", "mask") field
SliceSampler integration, using ("collector", "traj_ids") as the trajectory key so sampled slices respect episode boundaries
A reference example demonstrating the full async collect → store → sample → train loop would also be included.
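To make the padded layout concrete, here is a minimal plain-Python sketch (no torchrl dependency; `pad_episodes` is a hypothetical helper, not part of the library) of the `[N, max_len, ...]` zero-padded batch and the boolean mask that `ReplayBuffer.extend()` would need to handle so that padding is never mistaken for real transitions:

```python
# Conceptual sketch of the zero-padded [N, max_len] layout plus mask.
# pad_episodes is an illustrative helper, not a torchrl API: it stands in
# for what TrajectoryBatcher is described as producing, where the mask
# (cf. the ("collector", "mask") field) marks real steps vs. padding.

def pad_episodes(episodes):
    """Pad variable-length episodes to a [N, max_len] layout with a mask.

    episodes: list of per-episode step sequences (scalars here for brevity).
    Returns (padded, mask); mask[i][t] is True where step t of episode i
    is real data and False where it is zero-padding.
    """
    max_len = max(len(ep) for ep in episodes)
    padded, mask = [], []
    for ep in episodes:
        pad = max_len - len(ep)
        padded.append(list(ep) + [0] * pad)                  # zero-pad the tail
        mask.append([True] * len(ep) + [False] * pad)        # mark real steps
    return padded, mask

episodes = [[1, 2, 3], [4, 5], [6]]
padded, mask = pad_episodes(episodes)
# padded -> [[1, 2, 3], [4, 5, 0], [6, 0, 0]]
# mask   -> [[True, True, True], [True, True, False], [True, False, False]]
```

A loss computed over such a batch would multiply per-step terms by the mask so padded steps contribute nothing to the gradient.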
Alternatives
Manual wiring: Users can already combine collector.start(), TrajectoryBatcher, and ReplayBuffer themselves, but this requires deep familiarity with each API and is not documented anywhere.
Transition-level replay: Standard ReplayBuffer + RandomSampler works for transition-based algorithms but loses episode structure, which breaks recurrent policies and trajectory-dependent methods.
Keeping it synchronous: The original PR covers the synchronous on-policy case. This is sufficient for REINFORCE/episodic PPO but not for off-policy or sample-efficient workflows.
Additional context
Follows up on #3584, which introduced TrajectoryBatcher for synchronous, on-policy episode batching. This PR targets the complementary off-policy / async use case.
Relevant existing APIs this builds on:
collector.start() / async collector interface
ReplayBuffer + LazyTensorStorage
SliceSampler (already traj-ID-aware)
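The boundary-respecting behavior expected from SliceSampler can be illustrated with a small stdlib-only sketch (`sample_slice` is a hypothetical stand-in written for this issue, not the library's implementation): given one trajectory ID per stored step, a valid slice is a contiguous window whose steps all share the same ID.

```python
import random

def sample_slice(traj_ids, slice_len, rng=random):
    """Sample a contiguous window of slice_len indices that stays inside
    a single trajectory.

    traj_ids: one trajectory ID per stored step, in storage order,
    mimicking the role of the ("collector", "traj_ids") key.
    Returns the sampled indices, never crossing an episode boundary.
    """
    # Valid starts: windows whose steps all carry the same trajectory ID.
    starts = [
        i for i in range(len(traj_ids) - slice_len + 1)
        if len(set(traj_ids[i:i + slice_len])) == 1
    ]
    start = rng.choice(starts)
    return list(range(start, start + slice_len))

ids = [0, 0, 0, 1, 1, 2, 2, 2, 2]
indices = sample_slice(ids, 3)   # e.g. [0, 1, 2] or [5, 6, 7], never [2, 3, 4]
```

The real sampler additionally handles storage wrap-around and batched sampling; the point here is only the invariant that a slice never mixes steps from two episodes.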
Checklist
I have checked that there is no similar issue in the repo