Add agent simulation learning workflow #30
Merged
yichao-liang merged 71 commits into master on May 5, 2026
Delegate option execution to option_model.get_next_state_and_num_actions instead of duplicating its termination logic (stuck detection, Wait atom-change checks) and directly accessing its simulator.
…inement Extract the duplicated backtracking loop from run_low_level_search (SeSamE) and _refine_sketch (agent bilevel) into a single run_backtracking_refinement function in planning.py. Both callers now delegate to it with their own sample_fn and validate_fn callbacks, eliminating ~80 lines of duplicated loop/backtracking logic.
Replace 60 lines of manual option-model execution with a call to run_backtracking_refinement using max_tries=[1] and a sample_fn that returns the pre-grounded options. Remove unused Any import.
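The two commits above fold two loops into one backtracking helper that takes caller-supplied sampling and validation callbacks. Here is a rough sketch of that shape. It is hypothetical: the signature, the `sample_fn(step, state, rng)` / `validate_fn(step, state, option)` conventions and the plain-Python states are assumptions, not the code in planning.py. The usage at the end shows how `max_tries=[1, ...]` turns the loop into a single-pass replay of pre-grounded options.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

StateT = TypeVar("StateT")
OptionT = TypeVar("OptionT")


def run_backtracking_refinement(
    init_state: StateT,
    num_steps: int,
    max_tries: Sequence[int],
    sample_fn: Callable[[int, StateT, random.Random], OptionT],
    validate_fn: Callable[[int, StateT, OptionT], Optional[StateT]],
    seed: int = 0,
) -> Optional[List[OptionT]]:
    """Hypothetical sketch of a shared backtracking refinement loop.

    sample_fn proposes an option for step i from the current state;
    validate_fn returns the next state, or None if step i failed.
    Returns the refined plan, or None once all budgets are exhausted.
    """
    rng = random.Random(seed)
    states: List[StateT] = [init_state]
    plan: List[OptionT] = []
    tries = [0] * num_steps
    i = 0
    while i < num_steps:
        if tries[i] >= max_tries[i]:
            # Budget for step i is spent: reset it and backtrack one step.
            tries[i] = 0
            if i == 0:
                return None
            i -= 1
            states.pop()
            plan.pop()
            continue
        tries[i] += 1
        option = sample_fn(i, states[i], rng)
        next_state = validate_fn(i, states[i], option)
        if next_state is None:
            continue  # resample at step i (or backtrack on the next pass)
        states.append(next_state)
        plan.append(option)
        i += 1
    return plan


# Replaying pre-grounded options: max_tries=[1, ...] tries each step once.
grounded = ["Pick", "Place"]
replayed = run_backtracking_refinement(
    0, len(grounded), [1] * len(grounded),
    sample_fn=lambda i, s, rng: grounded[i],
    validate_fn=lambda i, s, o: s + 1)
```

With a budget of 1 per step, any failure immediately exhausts every earlier step as well, so the replay either succeeds as-is or returns None without retrying.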
Move the _current_observation assignment into _reset_state so callers don't need to remember the two-step pattern. Clarify the relationship between _current_observation (backing field) and _current_state (typed read accessor) in docstrings and comments.
Adds agent_bilevel_plan_sketch_file setting that, when set to a file path, loads the plan sketch directly from that file, bypassing the foundation model query. Includes test data files and a unit test.
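The file-override behaviour amounts to a simple branch. Only `agent_bilevel_plan_sketch_file` comes from the commit. `get_plan_sketch`, `query_fn` and the sketch text format below are hypothetical stand-ins for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


def get_plan_sketch(prompt: str, query_fn: Callable[[str], str],
                    sketch_file: Optional[str] = None) -> str:
    """Return a plan sketch, preferring a file when a path is configured.

    sketch_file plays the role of agent_bilevel_plan_sketch_file; query_fn
    is a hypothetical stand-in for the foundation-model query.
    """
    if sketch_file:
        # Bypass the model entirely: the file contents are the sketch.
        return Path(sketch_file).read_text(encoding="utf-8")
    return query_fn(prompt)


calls: List[str] = []


def fake_query(prompt: str) -> str:
    calls.append(prompt)
    return "Pick(jug)\nPlace(jug, faucet)"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sketch.txt")
    Path(path).write_text("Pick(jug)\nWait()", encoding="utf-8")
    from_file = get_plan_sketch("solve task", fake_query, sketch_file=path)
from_model = get_plan_sketch("solve task", fake_query)
```

Loading the sketch from a file makes downstream refinement reproducible in tests, because the model is never called on that path.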
Extract repeated wait-termination check into _check_wait_termination helper and unify the three _terminal branches into a single definition with config checks inside the function body.
- Remove dead/commented-out code and stale self-question comments
- Add _VIRTUAL_OBJECT_TYPES constant to replace hardcoded type-name skip lists in _set_state and _get_state
- Move env-specific _get_robot_state_dict branches to subclass overrides in pybullet_cover and pybullet_blocks
- Extract _get_camera_matrices helper to deduplicate render methods
- Extract _get_object_state_dict from _get_state for per-object logic
- Move create_pybullet_block/sphere to pybullet_helpers/objects.py
- Merge _create_task_specific_objects into _set_domain_specific_state
- Rename: _reset_state -> _set_state, _reset_custom_env_state -> _set_domain_specific_state, _extract_feature -> _get_domain_specific_feature
- Add docstrings explaining where each method is called from
Reorganize methods into labeled sections (Setup, Public API, Core Loop, State Write/Read, Grasp Management, Action Helpers, Rendering, Utilities) so related functions are adjacent. Update module docstring to document the main public API and state synchronization methods.
Add _step_base() and _domain_specific_step() to PyBulletEnv base class. step() now calls _step_base (robot control, physics, grasp) then _domain_specific_step (water filling, heating, etc.), gated by _skip_domain_specific_dynamics flag for kinematics-only mode. Migrate all 15 domain envs to override _domain_specific_step() instead of step(). Envs with pre-step logic (coffee, switch, blocks, cover) still override step() for the pre-step part only.
Document the step_base → domain_specific_step → get_observation flow, _skip_domain_specific_dynamics flag, and _domain_specific_step as an optional override.
Replace direct access to private _skip_domain_specific_dynamics attribute with a public constructor parameter, so callers declare kinematics-only mode at creation time instead of mutating internal state after construction.
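The three commits above describe a template-method split of step(). Here is a minimal sketch with toy dict states. The class names, the constructor keyword and the "water" feature are hypothetical. Only the method names `_step_base` / `_domain_specific_step` and the skip flag come from the commits.

```python
from typing import Dict

State = Dict[str, float]  # hypothetical stand-in for the env state


class SketchPyBulletEnv:
    """Hypothetical sketch of the step() template described above."""

    def __init__(self, skip_domain_specific_dynamics: bool = False) -> None:
        # Declared once at construction time (kinematics-only when True).
        self._skip_domain_specific_dynamics = skip_domain_specific_dynamics
        self._state: State = {"robot_x": 0.0, "water": 0.0}

    def step(self, action: float) -> State:
        self._step_base(action)
        if not self._skip_domain_specific_dynamics:
            self._domain_specific_step()
        return self._get_observation()

    def _step_base(self, action: float) -> None:
        # Stand-in for robot control, physics stepping, and grasp handling.
        self._state["robot_x"] += action

    def _domain_specific_step(self) -> None:
        """Optional override; the base class adds no extra dynamics."""

    def _get_observation(self) -> State:
        return dict(self._state)


class SketchBoilEnv(SketchPyBulletEnv):

    def _domain_specific_step(self) -> None:
        # Stand-in for process dynamics such as water filling.
        self._state["water"] += 0.1


full_obs = SketchBoilEnv().step(1.0)
kin_obs = SketchBoilEnv(skip_domain_specific_dynamics=True).step(1.0)
```

Subclasses only override the hook, so the kinematics-only mode needs no per-env code: the flag simply skips the hook.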
…ging Both AgentSessionMixin and AgentExplorer had near-identical wrappers that ran session.query() synchronously via nest_asyncio or asyncio.run. Move that logic into a module-level run_query_sync helper in session_manager and have both callers delegate to it.
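The commit does not show the body of run_query_sync. This sketch keeps the name but swaps in a different technique: where the commit uses nest_asyncio when a loop is already running, this version runs the coroutine on a fresh event loop in a worker thread, so it needs no third-party dependency. `fake_query` is a hypothetical stand-in for `session.query()`.

```python
import asyncio
import concurrent.futures
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_query_sync(coro: Coroutine[Any, Any, T]) -> T:
    """Run a coroutine to completion from synchronous code (sketch)."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop in this thread: asyncio.run owns the whole lifecycle.
        return asyncio.run(coro)
    # A loop is already running here (e.g. in Jupyter), so asyncio.run
    # would raise; run the coroutine on its own loop in a worker thread.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()


async def fake_query(prompt: str) -> str:
    await asyncio.sleep(0)
    return f"answer to {prompt}"


result = run_query_sync(fake_query("plan"))


async def caller() -> str:
    # Called from inside a running loop: takes the worker-thread path.
    return run_query_sync(fake_query("nested"))


nested = asyncio.run(caller())
```

The thread-based path blocks the outer loop while it waits. That is acceptable for a synchronous wrapper, but it is a different trade-off from patching the running loop with nest_asyncio.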
…y and maintainability
Distinguishes the grounded-plan explorer from upcoming bilevel variants. AgentExplorer -> AgentPlanExplorer, get_name() 'agent' -> 'agent_plan', file moved to agent_plan_explorer.py, and all callers / docstrings / YAML config examples updated accordingly.
The mixin is pure agent-session plumbing (session creation, lifecycle, explorer factory) and has no approach-specific logic, so it belongs next to session_manager.py, tools.py, and the sandbox managers rather than in approaches/.
The explorer asks a Claude agent for a plan sketch, refines it against the approach's current (possibly learned) option model, and rolls the refined plan out in the real env. When the mental model disagrees with reality (e.g. the sketch expects JugFilled after a Wait, but the mental model's process dynamics can't produce it), the explorer truncates the plan at the deepest unsatisfiable subgoal (inclusive) so the real-env rollout ends exactly where the disagreement occurs, maximising signal per experiment.
Key pieces:
- predicators/agent_sdk/bilevel_sketch.py: extracted the sketch build / parse / refine helpers from AgentBilevelApproach as module-level functions so both the approach (solve path) and the new explorer (exploration path) can share them. refine_sketch gains truncate_on_subgoal_fail: the on_step_fail callback snapshots the deepest subgoal failure seen during backtracking, and on exhaustion the captured prefix is returned as the experiment plan.
- predicators/explorers/agent_bilevel_explorer.py: new explorer. Reads option_model from tool_context (synced by the approach), builds the sketch prompt via bilevel_sketch, runs refine_sketch with check_subgoals=True, check_final_goal=False, truncate_on_subgoal_fail=True, and wraps the result in an option_plan_to_policy that converts OptionExecutionFailure into RequestActPolicyFailure so the episode cleanly terminates at the point of real-env divergence. Stashes the sketch subgoals/options on ToolContext for downstream diffing by the learning approach.
- predicators/approaches/agent_bilevel_approach.py: shim methods over bilevel_sketch; behaviour unchanged.
- predicators/approaches/agent_planner_approach.py: _create_explorer dispatches both "agent_plan" and "agent_bilevel" through the agent factory path and forwards CFG.explorer as the name.
- predicators/explorers/__init__.py: factory branch merged for the two agent-session-backed explorers.
- predicators/agent_sdk/tools.py: ToolContext gains last_sketch_subgoals / last_sketch_options fields, populated by the explorer and marked TODO for the learning approach to consume.
- tests/explorers/test_agent_bilevel_explorer.py: happy-path, fallback, wait-memory-injection, and deepest-subgoal-failure truncation tests.
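The truncate_on_subgoal_fail behaviour described above can be sketched as a small tracker fed by an on_step_fail-style callback. The tracker keeps the deepest subgoal failure across backtracking and falls back to that prefix (inclusive) when refinement is exhausted. `DeepestSubgoalFailure`, `experiment_plan` and the string-valued options are hypothetical. The real callback signature in bilevel_sketch.py may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeepestSubgoalFailure:
    """Hypothetical tracker fed by an on_step_fail-style callback."""
    depth: int = -1
    prefix: List[str] = field(default_factory=list)

    def on_step_fail(self, step: int, plan_so_far: List[str],
                     failed_option: str, subgoal_failed: bool) -> None:
        # Keep only the deepest subgoal failure seen across backtracking.
        if subgoal_failed and step > self.depth:
            self.depth = step
            # Inclusive truncation: the failing step stays in the plan.
            self.prefix = list(plan_so_far) + [failed_option]


def experiment_plan(refined: Optional[List[str]],
                    tracker: DeepestSubgoalFailure) -> Optional[List[str]]:
    """Prefer a fully refined plan; otherwise fall back to the prefix."""
    if refined is not None:
        return refined
    return tracker.prefix if tracker.depth >= 0 else None


tracker = DeepestSubgoalFailure()
# Failures reported while backtracking; only subgoal failures count.
tracker.on_step_fail(0, [], "Pick", subgoal_failed=False)
tracker.on_step_fail(1, ["Pick"], "Place", subgoal_failed=True)
tracker.on_step_fail(2, ["Pick", "Place"], "Wait", subgoal_failed=True)
tracker.on_step_fail(1, ["Pick"], "Place", subgoal_failed=True)
truncated = experiment_plan(None, tracker)
```

Here, refinement never satisfies the subgoal after Wait, so the experiment plan ends with that Wait. The real-env rollout then shows whether the mental model or reality is wrong at that step.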
- New setting agent_bilevel_explorer_max_samples_per_step (default 50), separate from the solve-path budget, so the explorer's backtracking cost is independently tunable.
- Log the actual experiment plan (option names, objects, params) after refinement so the explorer's output is visible alongside the existing sketch/truncation log lines.
- Test config updated to set both budgets explicitly.
AgentSimLearningApproach extends AgentBilevelApproach to learn process dynamics online. Each cycle, the agent synthesizes parameterized process rules via Claude (using the run_python / evaluate_simulator / test_simulator MCP tools), the parameters are fitted via emcee MCMC, and the learned dynamics are composed with a kinematics-only PyBullet oracle into a combined option model for plan refinement.
Key pieces:
- predicators/approaches/agent_sim_learning_approach.py: the approach. Initialises with a kinematics-only option model (so AgentBilevelExplorer sees disagreements at process-dynamic subgoals like JugFilled/Boiled), and replaces it with the kin+learned model after each successful synthesis cycle.
- predicators/agent_sdk/tools.py: create_synthesis_tools() builds the three MCP tools the synthesis agent uses; the extra_mcp_tools field and get_allowed_tool_list(extra_names=) plumbing let the approach inject them into the session.
- predicators/code_sim_learning/: ParamSpec, fit_params (emcee MCMC), compute_mse, LearnedSimulator.
- predicators/ground_truth_models/boil/gt_simulator.py: ground-truth process-dynamics simulator for the boil environment.
- tests/: approach and param-fitting tests.
- agents.yaml: comment out the agent_bilevel preset; add agent_sim_learning with explorer=agent_bilevel and skip_test_until_last_ite_or_early_stopping.
- common.yaml: disable failure/test video recording; set num_online_learning_cycles=1 for faster iteration.
Simulation primitives (code_sim_learning/utils.py):
- apply_rules(state, rules, params) → ProcessUpdate
- merge_updates(base_state, updates, process_features) → State
- simulate_step(state, action, base_env, rules, params, features) → State
These replace _build_fitted_step_fn, merge_process_updates, _sim_fn_from_rules, and the body of _build_combined_simulator.
GT simulator factory (ground_truth_models):
- GroundTruthSimulatorFactory ABC + get_gt_simulator(env_name) discovery, following the existing get_gt_options / get_gt_nsrts pattern.
- PyBulletBoilGroundTruthSimulatorFactory registered in boil/.
- Replaces the hardcoded _load_oracle_simulator in the approach.
Oracle ablation flags (settings.py):
- agent_sim_learn_oracle_sim_program: load GT rules, skip synthesis.
- agent_sim_learn_oracle_sim_params: use GT param values, skip MCMC.
Also: kin_env → base_env rename throughout, redundant self._types assignment removed, process_features computed once in __init__.
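The three primitives compose as "base step, then learned process updates on top". This is a toy sketch of that composition using dict states keyed by (object, feature). The function names and argument order mirror the list above, but the types, the `base_step` callable (standing in for base_env), `fill_rule`, `move_robot`, and the choice that rules read the pre-step state are all assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a state maps (object, feature) -> value, and a
# rule maps (state, params) -> a partial update of process features.
Key = Tuple[str, str]
State = Dict[Key, float]
ProcessUpdate = Dict[Key, float]
Rule = Callable[[State, Dict[str, float]], ProcessUpdate]


def apply_rules(state: State, rules: List[Rule],
                params: Dict[str, float]) -> ProcessUpdate:
    """Collect every rule's update; later rules overwrite earlier keys."""
    update: ProcessUpdate = {}
    for rule in rules:
        update.update(rule(state, params))
    return update


def merge_updates(base_state: State, updates: ProcessUpdate,
                  process_features: List[Key]) -> State:
    """Overwrite only the declared process features in the base state."""
    merged = dict(base_state)
    for key, value in updates.items():
        if key in process_features:
            merged[key] = value
    return merged


def simulate_step(state: State, action: float,
                  base_step: Callable[[State, float], State],
                  rules: List[Rule], params: Dict[str, float],
                  process_features: List[Key]) -> State:
    """Base (kinematics-only) step, then learned process updates on top.

    Assumption: rules read the pre-step state.
    """
    base_next = base_step(state, action)
    return merge_updates(base_next, apply_rules(state, rules, params),
                         process_features)


def fill_rule(state: State, params: Dict[str, float]) -> ProcessUpdate:
    # Toy process rule: an open faucet adds fill_rate to the jug each step.
    if state[("faucet", "on")] > 0.5:
        water = state[("jug", "water")]
        return {("jug", "water"): water + params["fill_rate"]}
    return {}


def move_robot(state: State, action: float) -> State:
    # Toy kinematic base step: the action shifts the robot's x position.
    nxt = dict(state)
    nxt[("robot", "x")] += action
    return nxt


s0 = {("robot", "x"): 0.0, ("faucet", "on"): 1.0, ("jug", "water"): 0.0}
s1 = simulate_step(s0, 0.5, move_robot, [fill_rule], {"fill_rate": 0.2},
                   [("jug", "water")])
```

Restricting merge_updates to the declared process features keeps a buggy rule from overwriting kinematic features that the base simulator already predicts.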
- yapf + isort autoformatting applied to all touched files.
- pylint: fix logging-not-lazy in agent_bilevel_explorer; add broad-except and reimported disables in agent_sim_learning_approach.
- mypy: fix base/env variable name collision, add type: ignore on lambda inference, add return type annotations to GT factory methods.
Use utils.abstract to evaluate expected atoms in low-level search so that DerivedPredicates — which require a Set[GroundAtom] rather than a State — are handled correctly alongside regular predicates.
When sequential simulate calls differ only in process features (as in the combined kinematic+learned simulator), reapplying joint positions and tearing down/recreating grasp constraints causes visible arm jitter. Compare robot poses first and skip the kinematic reset path when they already match.
Factor simulator synthesis into a shared _learn_simulator helper so that both learn_from_offline_dataset and learn_from_interaction_results can trigger it on their respective trajectory sources. Also create a separate headless env for parameter fitting so MCMC's thousands of _set_state calls don't thrash the GUI env during training.
Converts _build_combined_simulator to an instance method so it can capture self, recreate the base env on pybullet.error, and retry once. Also catches pybullet.error in the oracle option model alongside OptionExecutionFailure. Updates agents.yaml config for testing.
… moves Switch the fitting loss from per-feature MSE to total SSE (drop the /count in compute_sse) so the Gaussian log-likelihood -0.5*SSE/sigma^2 is in its correct iid form. The previous MSE form silently rescaled per-observation noise by sqrt(count), making walker proposals indistinguishable from each other. Pair this with a wider walker initialization (0.5 * prior_sigma instead of 1% of init_value) so the swarm covers the prior support and emcee stretch moves can actually explore.
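The effect of dropping the /count can be checked with a small worked example. With n observations, dividing SSE by n scales every log-likelihood difference by 1/n, exactly as if sigma were inflated by sqrt(n). `total_sse`, `log_likelihood` and `init_walkers` are hypothetical helpers, not the repo's compute_sse or fit_params, and the walker initializer covers a single parameter only.

```python
import random
from typing import List


def total_sse(pred: List[float], obs: List[float]) -> float:
    """Total squared error (no division by the number of observations)."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs))


def log_likelihood(pred: List[float], obs: List[float], sigma: float,
                   divide_by_count: bool = False) -> float:
    """iid Gaussian log-likelihood up to a constant: -0.5 * SSE / sigma**2.

    divide_by_count=True mimics the old MSE loss, which is equivalent to
    inflating sigma by sqrt(count).
    """
    err = total_sse(pred, obs)
    if divide_by_count:
        err /= len(obs)
    return -0.5 * err / sigma**2


def init_walkers(init_value: float, prior_sigma: float, num_walkers: int,
                 seed: int = 0) -> List[float]:
    """Spread 1-D walkers over the prior with std 0.5 * prior_sigma."""
    rng = random.Random(seed)
    return [rng.gauss(init_value, 0.5 * prior_sigma)
            for _ in range(num_walkers)]


obs = [1.0] * 100
good, bad = [1.1] * 100, [1.2] * 100
gap_sse = log_likelihood(good, obs, 0.1) - log_likelihood(bad, obs, 0.1)
gap_mse = (log_likelihood(good, obs, 0.1, divide_by_count=True) -
           log_likelihood(bad, obs, 0.1, divide_by_count=True))
walkers = init_walkers(init_value=2.0, prior_sigma=1.0, num_walkers=32)
```

Here the correct form separates the two candidates by about 150 log-units, while the MSE form separates them by only about 1.5, a 100x (= count) shrinkage. Under the MSE form, stretch-move acceptance barely distinguishes good proposals from bad ones.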
Unifies oracle and agent-synthesized simulators behind one loader: read_simulator_components pulls PROCESS_RULES, PARAM_SPECS, and PROCESS_FEATURES out of any namespace (module dict for oracle, exec_ns for agent), and get_gt_simulator now returns the triple including features. merge_updates no longer takes process_features since the rule producer owns that scope.
Replaces hard ``dist < threshold`` indicators in the boil rules with sigmoid-smoothed gates of width ``_SOFT_EPS``. Without smoothing, the LM finite-difference Jacobian is ~zero almost everywhere, and the Hessian identifiability diagnostic is uninformative; emcee also gets a non-flat likelihood as a side effect. State-dependent gates (faucet on/off, jug held) stay hard since they don't enter the parameter likelihood.
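The Jacobian argument can be seen numerically. A hard indicator is piecewise constant in its threshold, so a central finite difference is exactly zero unless the step straddles the jump. A sigmoid gate of width eps has a nonzero slope nearby. `_SOFT_EPS = 0.01` and the helper names are hypothetical values chosen for illustration.

```python
import math
from typing import Callable

_SOFT_EPS = 0.01  # hypothetical gate width


def hard_gate(dist: float, threshold: float) -> float:
    """Indicator 1[dist < threshold]: piecewise constant in threshold."""
    return 1.0 if dist < threshold else 0.0


def soft_gate(dist: float, threshold: float, eps: float = _SOFT_EPS) -> float:
    """Sigmoid of (threshold - dist) / eps: ~1 well inside, ~0 outside."""
    z = (threshold - dist) / eps
    if z >= 0:  # numerically stable logistic for both signs of z
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def d_threshold(gate: Callable[[float, float], float], dist: float,
                threshold: float, h: float = 1e-4) -> float:
    """Central finite difference of the gate with respect to threshold."""
    return (gate(dist, threshold + h) - gate(dist, threshold - h)) / (2 * h)


hard_grad = d_threshold(hard_gate, dist=0.3, threshold=0.25)
soft_grad = d_threshold(soft_gate, dist=0.3, threshold=0.25)
```

With dist five widths outside the threshold, the soft gate still has a slope of about 0.66 per unit of threshold, while the hard gate reports 0. That zero is the flat direction LM and the Hessian diagnostic would otherwise see.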
Adds fit_map_lm (Levenberg-Marquardt MAP estimate via SciPy TRF) and log_hessian_identifiability (eigendecompose J^T J/sigma^2 + prior precision to flag sloppy parameter directions). Both run as a single LM pass before MCMC; fit_params now centers walkers on theta_map when code_sim_learning_warm_start_with_lm is set, and short-circuits to it directly when num_mcmc_steps == 0. Also adds compute_residuals (per-feature residual vector LM consumes) and log_sse_breakdown (per-(type, feature) SSE so we can see which features dominate the loss). Two CFG flags gate the new behavior: warm_start_with_lm (default True), log_hessian_identifiability (default False).
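The identifiability diagnostic can be illustrated for two parameters with a closed-form eigendecomposition of the posterior precision J^T J / sigma^2 + prior precision. The LM fit itself is not sketched. `posterior_precision_2x2`, `eigvals_sym_2x2` and the toy Jacobian are hypothetical, not the repo's log_hessian_identifiability.

```python
import math
from typing import List, Tuple


def posterior_precision_2x2(
        jac_rows: List[Tuple[float, float]], sigma: float,
        prior_sigmas: Tuple[float, float]) -> Tuple[float, float, float]:
    """Entries (a, b, c) of [[a, b], [b, c]] = J^T J / sigma^2 + prior prec."""
    a = sum(r0 * r0 for r0, _ in jac_rows) / sigma**2 + prior_sigmas[0]**-2
    b = sum(r0 * r1 for r0, r1 in jac_rows) / sigma**2
    c = sum(r1 * r1 for _, r1 in jac_rows) / sigma**2 + prior_sigmas[1]**-2
    return a, b, c


def eigvals_sym_2x2(a: float, b: float, c: float) -> Tuple[float, float]:
    """Closed-form (smallest, largest) eigenvalues of [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


# Each residual depends on theta_0 + theta_1 only, so dr/dtheta = (1, 1).
rows = [(1.0, 1.0)] * 10
lo, hi = eigvals_sym_2x2(*posterior_precision_2x2(rows, 0.1, (1.0, 1.0)))
```

The largest eigenvalue (about 2001) belongs to the direction theta_0 + theta_1, which the data pin down. The smallest (about 1) is just the prior precision along theta_0 - theta_1: a sloppy direction that the data do not constrain at all.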
The agent now declares its own PROCESS_FEATURES alongside PROCESS_RULES and PARAM_SPECS, and the loss is scoped to that declaration (instead of every feature on every type). Before synthesis, the approach runs the base sim on each transition and flags (type, feat) pairs whose prediction diverges from the observation on at least min_hits triples; this set is sent to the agent as a starting hint and used as the eval/test scope until the agent overrides it. The base-sim prediction is precomputed once into base_pred_triples so MCMC's inner loop only evaluates the cheap apply_rules step. create_synthesis_tools now takes the precomputed triples plus the inferred hint, drops the live base_env, and reads PROCESS_FEATURES from exec_ns each call (falling back to the hint when undeclared).
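The pre-synthesis hint can be sketched as a per-feature hit count over (base prediction, observation) pairs. `infer_process_features`, the pair format (a simplification of the commit's triples), and the `atol` value are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Key = Tuple[str, str]  # hypothetical (type_name, feature_name)
State = Dict[Key, float]


def infer_process_features(pairs: List[Tuple[State, State]],
                           min_hits: int = 2,
                           atol: float = 1e-3) -> Set[Key]:
    """Flag (type, feature) pairs the base sim mispredicts >= min_hits times.

    Each element is (base_sim_prediction, observed_next_state), i.e. the
    prediction has already been computed once per transition.
    """
    hits: Counter = Counter()
    for base_pred, observed in pairs:
        for key, value in observed.items():
            if abs(base_pred.get(key, value) - value) > atol:
                hits[key] += 1
    return {key for key, count in hits.items() if count >= min_hits}


step = ({("jug", "water"): 0.0, ("robot", "x"): 1.0},
        {("jug", "water"): 0.2, ("robot", "x"): 1.0})
hint = infer_process_features([step, step], min_hits=2)
```

Because the base prediction is stored with each pair, the inner fitting loop only has to re-run the cheap rule application, not the PyBullet base simulator.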
LM warm start alone matches the parameter fit for the current boil oracle program; emcee's MAP-of-walkers cannot improve on it in the time budgeted for 500 steps and routinely lands at higher SSE. Setting num_mcmc_steps to 0 and enabling warm_start_with_lm returns the LM theta_map directly.
Cleans up line-wrap and docstring drift across the sim-learning branch so the autoformat CI check is satisfied. Bundles the formatting-only changes for cogman, pybullet_boil, and utils that earlier branch commits left behind, plus minor wraps across the new sim-learning code.
``BaseEnv`` doesn't declare ``_physics_client_id`` (only PyBullet subclasses do), and ``_recreate_base_env`` reads it best-effort inside a try block. Bind to a local with type:ignore so mypy stops flagging the access without affecting runtime.
The simulator callback signature must match StepSimulatorFn's (state, action, params) shape even though apply_rules doesn't use the action. Renaming to _action signals intent and silences pylint's unused-argument check.
Replace the all-or-nothing kinematic-match gate with a per-component diff: robot pose, each object pose, and held-object identity are each compared against the live PyBullet world and only re-written when they actually differ. _robot_matches_state now compares at the joint level (the prior EE-quaternion path hard-coded roll=0, which spuriously mismatched whenever the wrist had any roll and forced a full reset on every simulate() call). reset_state honors caller-provided joint_positions only when they reconstruct the requested EE pose, falling back to IK otherwise.
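The per-component diff can be sketched as a function that lists only the components needing a write. `WorldSnapshot` and `components_to_write` are hypothetical stand-ins for the live PyBullet world and _set_state's logic. The default `atol=1e-3` mirrors the State.allclose tolerance adopted in the later tolerance fix.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


def _close(a: Sequence[float], b: Sequence[float], atol: float) -> bool:
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=atol) for x, y in zip(a, b))


@dataclass
class WorldSnapshot:
    """Hypothetical stand-in for the live PyBullet world or a target state."""
    joints: List[float]
    object_poses: Dict[str, List[float]] = field(default_factory=dict)
    held: Optional[str] = None


def components_to_write(live: WorldSnapshot, target: WorldSnapshot,
                        atol: float = 1e-3) -> List[str]:
    """List only the components whose value differs from the live world."""
    writes = []
    if not _close(live.joints, target.joints, atol):
        writes.append("robot")  # joint-level comparison, no EE roll issue
    for name, pose in target.object_poses.items():
        live_pose = live.object_poses.get(name)
        if live_pose is None or not _close(live_pose, pose, atol):
            writes.append(name)
    if live.held != target.held:
        writes.append("grasp")  # re-create the grasp constraint only here
    return writes


live = WorldSnapshot([0.0, 0.5], {"jug": [0.1, 0.2], "cup": [0.3, 0.3]})
moved_cup = WorldSnapshot([0.0, 0.5], {"jug": [0.1, 0.2], "cup": [0.4, 0.3]})
drifted = WorldSnapshot([0.005, 0.5], {"jug": [0.1, 0.2], "cup": [0.3, 0.3]})
```

A 5e-3 joint drift is flagged at 1e-3 but passes silently at 1e-2, which is the behaviour the later regression test locks in.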
_remake_cups creates fresh PyBullet bodies that need to be teleported to their state-specified poses; the per-component diff in _set_state now skips objects whose pose already matches PyBullet, so the explicit _reset_single_object calls ensure freshly-recreated bodies land in the right place. Same treatment for plugs when coffee_machine_has_plug.
The lambda used to capture predicates at __init__ time, which missed predicates invented later (grammar search) and broke subclasses whose _get_current_predicates depends on attributes not yet set during super().__init__().
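The difference between eager and late capture is easy to reproduce. `SketchApproach` and its attributes are hypothetical. Note that the eager pattern also calls `_get_current_predicates()` inside `__init__`, which is what breaks subclasses whose override depends on attributes set after `super().__init__()`.

```python
from typing import Callable, Set


class SketchApproach:
    """Hypothetical sketch contrasting eager and late predicate capture."""

    def __init__(self) -> None:
        self._predicates: Set[str] = {"Holding"}
        snapshot = self._get_current_predicates()
        # Buggy pattern: the lambda closes over the set seen in __init__.
        self.eager_fn: Callable[[], Set[str]] = lambda: snapshot
        # Fixed pattern: the lambda asks for the predicates on every call.
        self.late_fn: Callable[[], Set[str]] = (
            lambda: self._get_current_predicates())

    def _get_current_predicates(self) -> Set[str]:
        return set(self._predicates)


approach = SketchApproach()
approach._predicates.add("JugFilled")  # e.g. invented later by search
stale = approach.eager_fn()
fresh = approach.late_fn()
```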
Terminology cleanup to match how skip_process_dynamics is described elsewhere; the env wraps the full base sim, not just kinematics.
The fast-path joint-match check used atol=1e-2, which let a caller's initial_joint_positions hint be silently treated as "already there" when live joints were within 1e-2 of initial — leaving the EE pose up to ~3e-3 off the requested state. State.allclose compares features at 1e-3, so the test then failed reconstruction. Match the State.allclose tolerance. Also pick up trailing yapf reformatting in two approach files.
…st-split Both tests pass on master and in isolation but fail on shards 6/8 of CI on this branch. The branch's new tests shifted pytest-split's least_duration distribution, so existing tests landed in different shards than on master, exposing pre-existing fragility:
- test_glib_explorer[Holding]: score_fn returned 0 (not -inf) for non-target goals, so they weren't filtered. With cover's 7-atom dynamic universe and 10 babbles, ~3.5% of seeds sample no Holding goal, and the explorer falls through to a Covers goal, leaving the final state without Holding. Bumped glib_num_babbles to 100 and switched the test's score_fn to return -inf for non-target goals so the explorer never plans toward an off-target predicate.
- test_demo_dataset_loading[10-True-oracle-...]: _ensure_cover_demo_data_exists only checked file existence. test_demo_dataset's max_initial_demos block writes a 3-trajectory dataset under the cover__demo__oracle__7__... name; the [10-...] case then loaded 3 + generated 3 = 6 trajectories instead of the expected 10. Added a trajectory-count check so the helper regenerates partial files.
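The score_fn fix relies on the explorer discarding goals scored -inf. This sketch assumes such a filter and uses hypothetical names (`select_goal`, the two score functions, the goal strings). It also checks the ~3.5% figure. That figure is consistent with each of the 10 babbles independently hitting a Holding atom with probability 2/7, so (5/7)^10 ≈ 0.035. The 2/7 rate is our assumption; the commit does not state it.

```python
import math
from typing import Callable, List, Optional


def old_score_fn(goal: str) -> float:
    return 1.0 if goal.startswith("Holding") else 0.0


def new_score_fn(goal: str) -> float:
    return 1.0 if goal.startswith("Holding") else -math.inf


def select_goal(babbled: List[str],
                score_fn: Callable[[str], float]) -> Optional[str]:
    """Hypothetical selection: drop -inf goals, then take the best score."""
    viable = [g for g in babbled if score_fn(g) > -math.inf]
    return max(viable, key=score_fn) if viable else None


only_covers = ["Covers(block0, target0)", "Covers(block1, target1)"]
old_pick = select_goal(only_covers, old_score_fn)
new_pick = select_goal(only_covers, new_score_fn)

# Assumption: each babble hits a Holding atom with probability 2/7.
p_no_holding_10 = (1 - 2 / 7)**10
p_no_holding_100 = (1 - 2 / 7)**100
```

Under the same assumption, raising glib_num_babbles to 100 drives the miss probability to about 2e-15. The -inf score means even that residual case no longer plans toward a Covers goal.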
…ects
- test_robot_matches_state_atol_forces_reset_on_small_drift: locks in the 1e-3 atol regression. A ~5e-3 joint drift (within the previous 1e-2 tolerance, outside the new 1e-3) must NOT be treated as "already there" by the fast path; _set_state must move the robot back to the requested EE pose at State.allclose precision.
- tests/pybullet_helpers/test_objects.py (new): coverage for sample_collision_free_2d_positions, used by 3 PyBullet envs but previously without direct tests. Covers no-overlap (circles and rectangles), bounds, reproducibility across seeds, RuntimeError on impossible packing, and ValueError on unknown shape_type.
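The behaviours the new tests cover (no overlap, bounds, seed reproducibility, RuntimeError on impossible packing, ValueError on an unknown shape) can be met by plain rejection sampling. This is a hypothetical, circles-only stand-in named `sample_circle_positions`. The real sample_collision_free_2d_positions also handles rectangles, and its signature is not shown in the commit.

```python
import math
import random
from typing import List, Tuple


def sample_circle_positions(
        num: int, x_range: Tuple[float, float], y_range: Tuple[float, float],
        shape_type: str, radius: float, rng: random.Random,
        max_attempts: int = 10_000) -> List[Tuple[float, float]]:
    """Hypothetical rejection sampler for non-overlapping circles."""
    if shape_type != "circle":
        raise ValueError(f"Unknown shape_type: {shape_type!r}")
    positions: List[Tuple[float, float]] = []
    for _ in range(max_attempts):
        if len(positions) == num:
            break
        # Sample centres so the whole circle stays inside the bounds.
        x = rng.uniform(x_range[0] + radius, x_range[1] - radius)
        y = rng.uniform(y_range[0] + radius, y_range[1] - radius)
        if all(math.dist((x, y), p) >= 2 * radius for p in positions):
            positions.append((x, y))
    if len(positions) < num:
        raise RuntimeError(f"Could not place {num} circles of r={radius}")
    return positions


pts = sample_circle_positions(5, (0.0, 1.0), (0.0, 1.0), "circle", 0.05,
                              random.Random(0))
```

Passing an explicit `random.Random` instance is what makes the reproducibility check meaningful: the same seed yields the same positions, independent of global random state.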
Summary