Ensure tool calls and reserved names aren't overwritten + add logger metadata #109

Merged
alexzhang13 merged 2 commits into main from adapting_to_verifiers
Feb 16, 2026

Conversation

@alexzhang13

Owner

1. Scaffold / namespace protection. From #95

  • Problem: Executed code could overwrite context, llm_query, FINAL_VAR, etc., and break RLM.
  • Fix: Single set RESERVED_TOOL_NAMES in base_env (includes history). After each run:
    • LocalREPL: _restore_scaffold() restores globals and context/history from context_0/history_0.
    • Isolated envs (modal, prime, e2b, daytona, docker): in exec script, restore context and history from context_0/history_0 before save_state(_locals).
  • Tests: TestLocalREPLScaffoldRestoration in test_local_repl.py.
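
A minimal sketch of the restore-after-run idea described above, not the code in this PR. `TinyREPL`, its stub tools, and the contents of `RESERVED_TOOL_NAMES` shown here are hypothetical; only the names `RESERVED_TOOL_NAMES`, `_restore_scaffold`, `context_0`, and `history_0` come from the description.

```python
# Assumption: an illustrative subset; the real set lives in base_env.
RESERVED_TOOL_NAMES = frozenset({"context", "history", "llm_query", "FINAL_VAR"})


class TinyREPL:
    """Hypothetical stand-in for a local REPL environment."""

    def __init__(self, context, history=None):
        # Stub tools; the real scaffold functions are not shown in the PR text.
        self._scaffold = {
            "llm_query": lambda prompt: f"<stub reply to {prompt!r}>",
            "FINAL_VAR": lambda name: name,
        }
        # Pristine copies that user-visible names are restored from.
        self.namespace = {
            "context_0": context,
            "history_0": list(history or []),
        }
        self._restore_scaffold()

    def _restore_scaffold(self):
        # Re-bind every reserved name, whatever the executed code did to it.
        self.namespace.update(self._scaffold)
        self.namespace["context"] = self.namespace["context_0"]
        self.namespace["history"] = self.namespace["history_0"]

    def execute(self, code):
        try:
            exec(code, self.namespace)
        finally:
            # Runs even if the code raised, so the next run starts clean.
            self._restore_scaffold()


repl = TinyREPL("doc text")
repl.execute("context = 'clobbered'\nllm_query = None\nx = 1")
print(repl.namespace["context"])              # the original context again
print(callable(repl.namespace["llm_query"]))  # the tool is callable again
print(repl.namespace["x"])                    # ordinary variables survive
```

Rebinding only undoes reassignment of the reserved names. This sketch does not guard against in-place mutation of the objects or against code that overwrites `context_0` itself.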

2. Trajectory metadata on RLMChatCompletion

  • RLMChatCompletion.metadata: dict | None (default None) holding the full trajectory (run_metadata + iterations with code blocks and sub-calls) so runs can be reconstructed.
  • RLMLogger:
    • Always captures in memory. log_dir=None → only in-memory (for completion.metadata). log_dir="path" → same + write JSONL to disk.
    • clear_iterations() at start of each completion; get_trajectory() returns {run_metadata, iterations} for the current completion.
  • RLM: Calls logger.clear_iterations() at start of completion(); sets metadata=logger.get_trajectory() on returned RLMChatCompletion.
  • README: short “Trajectory metadata and logging” section (in-memory vs disk).
  • Tests: TestRLMChatCompletion in test_types.py (metadata default + roundtrip).
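
A rough sketch of the logger flow described above, under stated assumptions. `SketchLogger`, `SketchCompletion`, `complete`, `log_iteration`, and the `trajectory.jsonl` filename are hypothetical; only `log_dir`, `clear_iterations()`, `get_trajectory()`, and the `{run_metadata, iterations}` shape come from the description.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


class SketchLogger:
    """Hypothetical logger: always keeps iterations in memory, optionally writes JSONL."""

    def __init__(self, log_dir=None):
        self.run_metadata = {}
        self.iterations = []
        self.log_path = None
        if log_dir is not None:
            os.makedirs(log_dir, exist_ok=True)
            # Assumption: the filename is made up for this sketch.
            self.log_path = os.path.join(log_dir, "trajectory.jsonl")

    def clear_iterations(self):
        self.iterations = []

    def log_iteration(self, iteration):
        # In-memory capture happens regardless of log_dir.
        self.iterations.append(iteration)
        if self.log_path is not None:
            with open(self.log_path, "a", encoding="utf-8") as f:
                f.write(json.dumps(iteration) + "\n")

    def get_trajectory(self):
        return {
            "run_metadata": dict(self.run_metadata),
            "iterations": list(self.iterations),
        }


@dataclass
class SketchCompletion:
    response: str
    metadata: Optional[dict] = None  # None unless a trajectory is attached


def complete(prompt, logger):
    # Start each completion with an empty iteration list.
    logger.clear_iterations()
    logger.log_iteration({"prompt": prompt, "code_blocks": [], "sub_calls": []})
    return SketchCompletion(response="done", metadata=logger.get_trajectory())


memory_only = SketchLogger()  # log_dir=None: nothing is written to disk
first = complete("first", memory_only)
second = complete("second", memory_only)
print(len(second.metadata["iterations"]))  # only the second run's iterations

with tempfile.TemporaryDirectory() as tmp:
    on_disk = SketchLogger(log_dir=tmp)  # same capture, plus a JSONL file
    complete("third", on_disk)
    print(os.path.exists(on_disk.log_path))
```

Returning copies from `get_trajectory()` keeps a completion's metadata stable after the logger is cleared for the next run; whether the actual logger copies is not stated in the description.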

@alexzhang13 alexzhang13 merged commit bcb9b25 into main Feb 16, 2026
3 checks passed