Summary
Add a new example demonstrating how to use OpenSandbox for Reinforcement Learning (RL) scenarios. It would show how RL training environments can run safely in isolated sandboxes, which is especially useful for RL agents that need to interact with external systems or execute arbitrary code.
Motivation
Reinforcement Learning often involves:
- Executing potentially unsafe code generated by RL agents
- Running long-duration training sessions that need isolation
- Interacting with environments that may have side effects
- Needing reproducible and isolated environments for training
OpenSandbox is well suited to these use cases, but it currently lacks a dedicated RL example.
Proposed Example Features
The example should include:
- Basic RL Environment Setup
  - Integration with popular RL frameworks (e.g., Gymnasium, Stable-Baselines3)
  - A simple environment such as CartPole, or a custom Gymnasium environment
- Sandbox Integration
  - Create a sandboxed training environment
  - Execute training loops within the sandbox
  - Handle episode data and model checkpointing
- Code Execution Safety
  - Demonstrate executing RL agent code safely
  - Show how to isolate potentially unsafe policy code (e.g., agent-generated scripts) from the host
- Example Structure (following existing patterns):
    examples/rl-training/
    ├── README.md          # Setup and usage instructions
    ├── main.py            # Main example code
    ├── requirements.txt   # Dependencies (gymnasium, stable-baselines3, etc.)
    ├── Dockerfile         # Optional containerized version
    └── screenshot.png     # Example output/visualization
- Documentation
  - Setup instructions
  - How to run training
  - How to visualize results
  - Integration with TensorBoard or other monitoring tools
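As a rough illustration of the kind of training loop main.py could run inside the sandbox, here is a dependency-free sketch. It rests on several assumptions: `CorridorEnv` is a hypothetical toy environment standing in for CartPole that only mimics the shape of Gymnasium's `reset()`/`step()` return values; tabular Q-learning substitutes for DQN so the sketch runs without gymnasium or stable-baselines3 installed; and the checkpoint file name `q_table.json` is illustrative. No OpenSandbox API is used here.

```python
"""Dependency-free sketch of an RL training loop with checkpointing.

CorridorEnv is a hypothetical stand-in for CartPole; tabular Q-learning
stands in for DQN so that no third-party packages are required.
"""
import json
import random
from pathlib import Path


class CorridorEnv:
    """Toy environment mimicking the Gymnasium reset()/step() return shapes.

    The agent starts at cell 0 and must reach the last cell. Action 0 moves
    left, action 1 moves right; reaching the goal yields reward 1.0.
    """

    def __init__(self, length: int = 6, max_steps: int = 50):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self, *, seed=None, options=None):
        # seed/options are accepted for API compatibility but unused here.
        self.pos, self.steps = 0, 0
        return self.pos, {}  # (observation, info)

    def step(self, action: int):
        self.steps += 1
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))
        terminated = self.pos == self.length - 1  # reached the goal
        truncated = self.steps >= self.max_steps  # ran out of time
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def train(env, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns a Q-table as a list of [Q(s,0), Q(s,1)]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.length)]
    for _ in range(episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, terminated, truncated, _ = env.step(action)
            # Do not bootstrap from a terminal state.
            target = reward if terminated else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state, done = nxt, terminated or truncated
    return q


def save_checkpoint(q, path: Path) -> None:
    """Persist the Q-table so training can resume in a later run."""
    path.write_text(json.dumps({"q_table": q}))


def load_checkpoint(path: Path):
    return json.loads(path.read_text())["q_table"]


if __name__ == "__main__":
    env = CorridorEnv()
    q = train(env)
    ckpt = Path("q_table.json")
    save_checkpoint(q, ckpt)
    greedy = [0 if left > right else 1 for left, right in load_checkpoint(ckpt)]
    print("greedy action per cell:", greedy)
```

In the actual example, the environment would more likely come from `gymnasium.make("CartPole-v1")` and the agent from Stable-Baselines3; the JSON checkpoint here only plays the role of a saved model that persists between training runs in the sandbox.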
Example Use Cases to Demonstrate
- Training a simple RL agent (e.g., DQN on CartPole)
- Safe execution of learned policies
- Multi-episode training with state persistence
- Integration with code-interpreter for result analysis
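For the "safe execution of learned policies" use case, the sketch below shows the basic shape of the idea using only the standard library: agent-generated policy code runs in a separate Python interpreter with a timeout, and a crash or hang is reported as `None` instead of taking down the trainer. This is a stand-in under stated assumptions, not the OpenSandbox API. A local subprocess does not restrict filesystem, network, or memory access, which is what running the same step inside a sandbox would add. `run_policy_isolated`, `POLICY_SRC`, and the `act(obs)` convention are hypothetical.

```python
"""Sketch: evaluate untrusted policy code in a child process with a timeout.

NOTE: a local subprocess is only a stand-in for a sandbox; it does not
limit filesystem, network, or memory access.
"""
import json
import subprocess
import sys

# Hypothetical agent-generated policy: maps an observation to an action.
POLICY_SRC = """
def act(obs):
    return 1 if obs < 5 else 0
"""

# Runs inside the child: defines act() from argv, reads observations from stdin.
RUNNER = """
import json, sys
namespace = {}
exec(sys.argv[1], namespace)
observations = json.loads(sys.stdin.read())
print(json.dumps([namespace["act"](o) for o in observations]))
"""


def run_policy_isolated(policy_src, observations, timeout_s=5.0):
    """Return the policy's actions, or None if it crashes or times out."""
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignore PYTHON* env vars and user site-packages).
            [sys.executable, "-I", "-c", RUNNER, policy_src],
            input=json.dumps(observations),
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode != 0:
        return None
    return json.loads(proc.stdout)


if __name__ == "__main__":
    print(run_policy_isolated(POLICY_SRC, [0, 3, 5]))  # [1, 1, 0]
    hanging = "def act(obs):\n    while True: pass"
    print(run_policy_isolated(hanging, [0], timeout_s=1.0))  # None
```

Keeping the trainer and the policy in separate processes means a misbehaving policy can only fail its own evaluation; swapping the subprocess for a sandboxed execution would keep the same interface while adding real isolation.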
Acceptance Criteria
- Working example code in examples/rl-training/
- Clear README with setup and usage instructions
- Demonstrates sandbox creation and RL training loop
- Follows the structure and style of existing examples
- Includes dependencies in requirements.txt
- Successfully runs end-to-end training
Related
This would complement existing examples like:
- code-interpreter (for executing training code)
- claude-code (for AI-generated RL agent code)
- desktop (for visualizing training results)