Motivation
- Diffusion-based policies (e.g., Diffusion Policy, Diffusion-QL) have demonstrated strong empirical results in robotics and offline RL.
Solution (Plan & References)
We will implement Phase 1 by adapting the core ideas and minimal components from the Diffusion Policy work (Diffusion Policy: Visuomotor Policy Learning via Action Diffusion) and its official codebase, aligning them with TorchRL's API patterns (TensorDict, Modules, Objectives, Transforms).
Primary references
- Paper & project page: Diffusion Policy (RSS 2023 / IJRR), with public project materials, data, and logs. ([GitHub][1])
- Official code repository: real-stanford/diffusion_policy (training configs, scripts, and Colab demos for state/vision tasks). ([GitHub][1])
What we will port/adapt
Actor (DiffusionActor)
- A score-based policy that denoises latent actions conditioned on observations, implemented as `torchrl.modules.DiffusionActor`.
- Pluggable score network (e.g., a small MLP for low-dim control; a CNN encoder later for pixels), scheduler (DDPM-style first), and `num_steps`.
- Strict TensorDict contract: `in_keys=["observation"]` → `out_keys=["action"]`.
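A minimal, self-contained sketch of what such an actor could look like. This is a proposal, not existing TorchRL API: `DiffusionActor`, the score-network input layout (action, observation, timestep concatenated), and the linear beta schedule are all assumptions for illustration; only the DDPM reverse-step arithmetic follows the standard formulation.

```python
import torch
from torch import nn


class DiffusionActor(nn.Module):
    """Sketch: denoise a latent action conditioned on the observation.

    `score_net` predicts the noise eps added at step t (DDPM-style);
    `num_steps` is the number of reverse-diffusion iterations.
    """

    def __init__(self, score_net: nn.Module, action_dim: int, num_steps: int = 16):
        super().__init__()
        self.score_net = score_net
        self.action_dim = action_dim
        self.num_steps = num_steps
        # Linear beta schedule, as in vanilla DDPM (an assumption here).
        betas = torch.linspace(1e-4, 2e-2, num_steps)
        alphas = 1.0 - betas
        self.register_buffer("betas", betas)
        self.register_buffer("alphas", alphas)
        self.register_buffer("alpha_bars", torch.cumprod(alphas, dim=0))

    @torch.no_grad()
    def forward(self, observation: torch.Tensor) -> torch.Tensor:
        batch = observation.shape[0]
        # Start from pure Gaussian noise and iteratively denoise.
        action = torch.randn(batch, self.action_dim, device=observation.device)
        for t in reversed(range(self.num_steps)):
            t_vec = torch.full((batch, 1), float(t), device=observation.device)
            eps = self.score_net(torch.cat([action, observation, t_vec], dim=-1))
            alpha, alpha_bar = self.alphas[t], self.alpha_bars[t]
            # DDPM reverse-step mean: x_{t-1} = (x_t - (1-a)/sqrt(1-abar) * eps) / sqrt(a)
            action = (action - (1 - alpha) / (1 - alpha_bar).sqrt() * eps) / alpha.sqrt()
            if t > 0:
                action = action + self.betas[t].sqrt() * torch.randn_like(action)
        return action
```

To honor the TensorDict contract above, the module would be wrapped with `tensordict.nn.TensorDictModule(actor, in_keys=["observation"], out_keys=["action"])`.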
Objective (DiffusionBCLoss)
- Supervised denoising/ε-prediction loss and a score-matching variant for imitation learning, following the paper’s training target while fitting TorchRL’s Objective API.
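A sketch of the ε-prediction variant of this objective, under the same assumptions as the actor above (shared linear beta schedule, concatenated conditioning); `DiffusionBCLoss` and its signature are proposals, not existing TorchRL classes. The training target is the standard DDPM one: sample a step t, noise the expert action with the forward process, and regress the predicted noise against the true noise.

```python
import torch
from torch import nn


class DiffusionBCLoss(nn.Module):
    """Sketch of the imitation objective: supervised eps-prediction (DDPM target)."""

    def __init__(self, score_net: nn.Module, num_steps: int = 16):
        super().__init__()
        self.score_net = score_net
        self.num_steps = num_steps
        betas = torch.linspace(1e-4, 2e-2, num_steps)
        self.register_buffer("alpha_bars", torch.cumprod(1.0 - betas, dim=0))

    def forward(self, observation: torch.Tensor, expert_action: torch.Tensor) -> torch.Tensor:
        batch = expert_action.shape[0]
        # Sample a random diffusion step per sample.
        t = torch.randint(0, self.num_steps, (batch,), device=expert_action.device)
        ab = self.alpha_bars[t].unsqueeze(-1)
        noise = torch.randn_like(expert_action)
        # Forward process q(a_t | a_0): interpolate expert action toward noise.
        noisy = ab.sqrt() * expert_action + (1 - ab).sqrt() * noise
        eps_hat = self.score_net(torch.cat([noisy, observation, t.float().unsqueeze(-1)], dim=-1))
        return nn.functional.mse_loss(eps_hat, noise)
```

Fitting this into TorchRL's Objective API would mean reading `observation` and `action` from the input TensorDict and writing the scalar under a loss key, following the `LossModule` conventions.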
Example & Repro Path
- `examples/diffusion_bc_pendulum.py` (state-based control) to mirror the repo's low-dim examples first.
- Clear instructions to plug in public training data/config patterns analogous to the reference repo’s setup (e.g., single-seed + multi-seed runs). ([GitHub][1])
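The example script could reduce to a loop of this shape: a self-contained BC sketch with random stand-in data in place of recorded Pendulum transitions. The network layout, schedule, and hyperparameters are placeholder assumptions, not values from the reference repo.

```python
import torch
from torch import nn

obs_dim, act_dim, num_steps = 3, 1, 16
# Placeholder score net: MLP over [noisy_action, observation, t].
score_net = nn.Sequential(
    nn.Linear(act_dim + obs_dim + 1, 64), nn.ReLU(), nn.Linear(64, act_dim)
)
opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)
alpha_bars = torch.cumprod(1.0 - torch.linspace(1e-4, 2e-2, num_steps), dim=0)

# Stand-ins for a recorded (observation, expert_action) dataset.
obs = torch.randn(256, obs_dim)
act = torch.randn(256, act_dim)

for _ in range(50):
    t = torch.randint(0, num_steps, (obs.shape[0],))
    ab = alpha_bars[t].unsqueeze(-1)
    noise = torch.randn_like(act)
    # Forward process: noise the expert action to step t.
    noisy = ab.sqrt() * act + (1 - ab).sqrt() * noise
    eps_hat = score_net(torch.cat([noisy, obs, t.float().unsqueeze(-1)], dim=-1))
    loss = nn.functional.mse_loss(eps_hat, noise)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the actual example, the stand-in tensors would come from a replay buffer filled with expert Pendulum trajectories, and sampling would go through the actor's reverse process.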
Checklist