
Add Reinforcement Learning (RL) Sandbox Example #29

@jwx0925

Description

Summary

Add a new example demonstrating how to use OpenSandbox for Reinforcement Learning (RL). It would show how RL training environments can run safely in isolated sandboxes, which is especially useful when RL agents need to interact with external systems or execute arbitrary code.

Motivation

Reinforcement Learning often involves:

  • Executing potentially unsafe code generated by RL agents
  • Running long-duration training sessions that need isolation
  • Interacting with environments that may have side effects
  • Needing reproducible and isolated environments for training

OpenSandbox is well suited to these use cases, but the repository currently has no dedicated RL example.

Proposed Example Features

The example should include:

  1. Basic RL Environment Setup

    • Integration with popular RL frameworks (e.g., Gymnasium, Stable-Baselines3)
    • A simple environment such as CartPole, or a custom Gymnasium environment (Gymnasium is the maintained successor to OpenAI Gym)
  2. Sandbox Integration

    • Create a sandboxed training environment
    • Execute training loops within the sandbox
    • Handle episode data and model checkpointing
  3. Code Execution Safety

    • Demonstrate executing RL agent code safely
    • Show how to isolate untrusted policy code and model files
  4. Example Structure (following existing patterns):

    examples/rl-training/
    ├── README.md          # Setup and usage instructions
    ├── main.py            # Main example code
    ├── requirements.txt   # Dependencies (gymnasium, stable-baselines3, etc.)
    ├── Dockerfile         # Optional containerized version
    └── screenshot.png     # Example output/visualization

  5. Documentation

    • Setup instructions
    • How to run training
    • How to visualize results
    • Integration with TensorBoard or other monitoring tools
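The sandbox-integration and code-execution-safety items above share one pattern: write the (possibly agent-generated) training script to a scratch directory, run it in a separate interpreter with a time limit, and read back a structured result. The following is a minimal sketch that assumes a local child process as a stand-in for the sandbox. `run_isolated` and `TRAINING_SCRIPT` are hypothetical names, and the OpenSandbox SDK calls that would replace `subprocess.run` are deliberately left out rather than guessed.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for agent-generated training code. By convention here,
# its last non-empty stdout line must be a JSON object with the results.
TRAINING_SCRIPT = """
import json, random
random.seed(0)
returns = [sum(random.random() for _ in range(10)) for _ in range(5)]
print(json.dumps({"episodes": len(returns), "mean_return": sum(returns) / len(returns)}))
"""


def run_isolated(code: str, timeout_s: float = 30.0) -> dict:
    """Run `code` in a fresh Python interpreter and return its JSON result.

    Local stand-in only: a child process bounds wall-clock time and keeps
    crashes out of the host, but it is not a security boundary. This is the
    call the real example would route through an OpenSandbox sandbox.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "train.py"
        script.write_text(code)
        proc = subprocess.run(
            # -I (isolated mode) ignores PYTHON* env vars and user site-packages.
            [sys.executable, "-I", str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # raises subprocess.TimeoutExpired for runaway code
        )
    if proc.returncode != 0:
        raise RuntimeError(f"training script failed:\n{proc.stderr}")
    lines = [line for line in proc.stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("training script printed no result")
    return json.loads(lines[-1])


if __name__ == "__main__":
    print(run_isolated(TRAINING_SCRIPT))
```

A child process only bounds run time and keeps crashes out of the host; it does not restrict file-system or network access, which is the gap the sandbox is meant to close.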

Example Use Cases to Demonstrate

  • Training a simple RL agent (e.g., DQN on CartPole)
  • Safe execution of learned policies
  • Multi-episode training with state persistence
  • Integration with code-interpreter for result analysis
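For multi-episode training with state persistence, the key piece is a training loop that resumes from a checkpoint and rewrites it as it goes, so a sandbox session can be stopped and restarted. The sketch below swaps DQN on CartPole for tabular Q-learning on a hypothetical `Corridor` environment so that it runs with the standard library alone. `Corridor`, `train`, and the JSON checkpoint layout are illustrative assumptions. `Corridor.reset()` and `Corridor.step()` return the same tuple shapes as Gymnasium (`(obs, info)` and `(obs, reward, terminated, truncated, info)`), but the class does not subclass `gymnasium.Env`.

```python
import json
import random
import tempfile
from pathlib import Path


class Corridor:
    """Hypothetical toy environment: walk from cell 0 to the goal cell on the right.

    reset()/step() return Gymnasium-shaped tuples, but the class does not depend
    on gymnasium. Actions: 0 = left, 1 = right.
    """

    def __init__(self, length: int = 6, max_steps: int = 50):
        self.length, self.max_steps = length, max_steps

    def reset(self):
        self.pos, self.steps = 0, 0
        return self.pos, {}  # (observation, info)

    def step(self, action: int):
        self.steps += 1
        move = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + move))  # walls clamp position
        terminated = self.pos == self.length - 1  # reached the goal
        truncated = self.steps >= self.max_steps  # hit the step limit
        reward = 1.0 if terminated else 0.0
        return self.pos, reward, terminated, truncated, {}


def train(env, episodes, ckpt: Path, alpha=0.5, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning that resumes from, and keeps updating, a JSON checkpoint."""
    rng = random.Random(seed)
    if ckpt.exists():
        state = json.loads(ckpt.read_text())  # resume a previous session
    else:
        state = {"q": [[0.0, 0.0] for _ in range(env.length)], "episodes_done": 0, "returns": []}
    q = state["q"]
    for _ in range(episodes):
        obs, _ = env.reset()
        done, ep_return = False, 0.0
        while not done:
            # Epsilon-greedy; ties go to action 1 (right).
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 1 if q[obs][1] >= q[obs][0] else 0
            next_obs, reward, terminated, truncated, _ = env.step(action)
            # Bootstrap from the next state unless the episode truly ended;
            # truncation (step limit) is not a terminal state.
            target = reward if terminated else reward + gamma * max(q[next_obs])
            q[obs][action] += alpha * (target - q[obs][action])
            obs, ep_return, done = next_obs, ep_return + reward, terminated or truncated
        state["episodes_done"] += 1
        state["returns"].append(ep_return)
        # Checkpoint after every episode so an interrupted session loses at most one.
        ckpt.write_text(json.dumps(state))
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "q_checkpoint.json"
        train(Corridor(), episodes=20, ckpt=ckpt)          # first session
        state = train(Corridor(), episodes=20, ckpt=ckpt)  # second session resumes
        print(state["episodes_done"])  # → 40
```

With Stable-Baselines3, the same resume-then-save shape would use `DQN.load(path, env=env)` on resume and `model.save(path)` after each `model.learn(...)` call. Where the checkpoint should live so that it outlives a sandbox is an OpenSandbox detail this sketch does not assume.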

Acceptance Criteria

  • Working example code in examples/rl-training/
  • Clear README with setup and usage instructions
  • Demonstrates sandbox creation and RL training loop
  • Follows the structure and style of existing examples
  • Includes dependencies in requirements.txt
  • Successfully runs end-to-end training

Related

This would complement existing examples like:

  • code-interpreter (for executing training code)
  • claude-code (for AI-generated RL agent code)
  • desktop (for visualizing training results)

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
