Add Reinforcement Learning (RL) Sandbox Example #29

### Summary
Add a new example that demonstrates how to use OpenSandbox for Reinforcement Learning (RL) workloads. The example would show how RL training environments can run safely in isolated sandboxes, which is especially useful for agents that interact with external systems or execute arbitrary code.

### Motivation
Reinforcement Learning workloads often involve:
- Executing potentially unsafe code generated by RL agents
- Running long training sessions that need isolation
- Interacting with environments that may have side effects
- Relying on reproducible, isolated environments for training

OpenSandbox is well suited to these use cases, but it currently has no dedicated RL example.

### Proposed Example Features

The example should include:

1. **Basic RL Environment Setup**
   - Integration with popular RL frameworks (e.g., Gymnasium, Stable-Baselines3)
   - A simple environment such as CartPole, or a custom Gymnasium environment
   
2. **Sandbox Integration**
   - Create a sandboxed training environment
   - Execute training loops within the sandbox
   - Handle episode data and model checkpointing
   
3. **Code Execution Safety**
   - Demonstrate executing RL agent code safely
   - Show how to isolate potentially unsafe policy code, such as untrusted policy networks loaded from checkpoints
   
4. **Example Structure** (following existing patterns):
   ```
   examples/rl-training/
   ├── README.md          # Setup and usage instructions
   ├── main.py            # Main example code
   ├── requirements.txt   # Dependencies (gymnasium, stable-baselines3, etc.)
   ├── Dockerfile         # Optional containerized version
   └── screenshot.png     # Example output/visualization
   ```

5. **Documentation**
   - Setup instructions
   - How to run training
   - How to visualize results
   - Integration with TensorBoard or other monitoring tools
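
To make the "Sandbox Integration" item more concrete, here is a minimal sketch of the kind of training script that could run inside the sandbox. It is only an illustration: `CorridorEnv` is a hypothetical toy environment that imitates the Gymnasium `reset`/`step` interface so the snippet runs with the standard library alone, tabular Q-learning stands in for whatever algorithm the real example picks (e.g., DQN from Stable-Baselines3), and the JSON checkpoint format is an assumption. The OpenSandbox SDK calls for creating the sandbox and uploading the script are left out because this issue does not specify them.

```python
import json
import random
import tempfile
from pathlib import Path


class CorridorEnv:
    """Hypothetical toy env with a Gymnasium-style API: walk right from cell 0 to the last cell."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps

    def reset(self, seed=None):
        self.pos, self.steps = 0, 0
        return self.pos, {}

    def step(self, action):
        # Action 0 moves left (clamped at cell 0), action 1 moves right.
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        self.steps += 1
        terminated = self.pos == self.length - 1
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else -0.01
        return self.pos, reward, terminated, truncated, {}


def train(checkpoint: Path, episodes: int, seed: int = 0) -> dict:
    """Run tabular Q-learning, resuming from `checkpoint` if it already exists."""
    env = CorridorEnv()
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    else:
        state = {
            "episodes_done": 0,
            "q": [[0.0, 0.0] for _ in range(env.length)],
            "returns": [],
        }
    # Offset the seed so a resumed session does not replay the same random choices.
    rng = random.Random(seed + state["episodes_done"])
    q = state["q"]
    alpha, gamma, epsilon = 0.5, 0.95, 0.2

    for _ in range(episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                # Greedy action, with ties broken at random.
                action = max(range(2), key=lambda a: (q[obs][a], rng.random()))
            nxt, reward, terminated, truncated, _ = env.step(action)
            # Bootstrap from the next state unless the episode truly terminated.
            target = reward + (0.0 if terminated else gamma * max(q[nxt]))
            q[obs][action] += alpha * (target - q[obs][action])
            obs, total = nxt, total + reward
            done = terminated or truncated
        state["episodes_done"] += 1
        state["returns"].append(total)

    # Persist the Q-table and episode data so a later session can resume.
    checkpoint.write_text(json.dumps(state))
    return state


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ckpt = Path(tmp) / "checkpoint.json"
        train(ckpt, episodes=200)          # first session
        state = train(ckpt, episodes=200)  # second session resumes from the checkpoint
        print("episodes:", state["episodes_done"])
        print("mean return, last 20 episodes:", sum(state["returns"][-20:]) / 20)
```

Writing the checkpoint to a file rather than keeping it in memory is deliberate: the host could download it from the sandbox between sessions, or mount it into a fresh sandbox to continue training.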

### Example Use Cases to Demonstrate

- Training a simple RL agent (e.g., DQN on CartPole)
- Safe execution of learned policies
- Multi-episode training with state persistence
- Integration with code-interpreter for result analysis
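
For the "safe execution of learned policies" use case, the host-side flow could look roughly like the sketch below. It is a local stand-in, not the OpenSandbox API: a child Python process replaces the sandbox's command execution, and a child process is not a security boundary on its own. `POLICY_SCRIPT`, `run_untrusted`, and the JSON request/reply format are all hypothetical names and conventions chosen for illustration.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical agent-produced policy script. In the real example this would be
# uploaded to and executed inside the sandbox rather than on the host.
POLICY_SCRIPT = '''
import json, sys
obs = json.loads(sys.argv[1])
# Trivial CartPole-style rule: push in the direction the pole is leaning.
action = 1 if obs["pole_angle"] > 0 else 0
print(json.dumps({"action": action}))
'''


def run_untrusted(script: str, observation: dict, timeout_s: float = 5.0) -> dict:
    """Run `script` in a separate interpreter with a wall-clock limit and parse its JSON reply."""
    with tempfile.TemporaryDirectory() as workdir:
        path = Path(workdir) / "policy.py"
        path.write_text(script)
        try:
            proc = subprocess.run(
                # -I runs Python in isolated mode (ignores env vars and user site-packages).
                [sys.executable, "-I", str(path), json.dumps(observation)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising, so nothing is left running.
            return {"ok": False, "error": "timeout"}

    if proc.returncode != 0:
        return {"ok": False, "error": f"exit code {proc.returncode}"}
    try:
        reply = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return {"ok": False, "error": "invalid JSON output"}
    if not isinstance(reply, dict):
        return {"ok": False, "error": "reply is not a JSON object"}
    return {"ok": True, **reply}


if __name__ == "__main__":
    print(run_untrusted(POLICY_SCRIPT, {"pole_angle": 0.1}))
    # A runaway policy is stopped by the timeout instead of hanging the host.
    print(run_untrusted("while True:\n    pass\n", {}, timeout_s=1.0))
```

The host only ever sees a small structured reply or an error, never the policy's internals, which is the property the sandboxed version of this flow should preserve.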

### Acceptance Criteria

- [ ] Working example code in `examples/rl-training/`
- [ ] Clear README with setup and usage instructions
- [ ] Demonstrates sandbox creation and RL training loop
- [ ] Follows the structure and style of existing examples
- [ ] Includes dependencies in `requirements.txt`
- [ ] Successfully runs end-to-end training

### Related

This would complement existing examples like:
- code-interpreter (for executing training code)
- claude-code (for AI-generated RL agent code)
- desktop (for visualizing training results)
