An OpenEnv benchmark testing the ability of AI agents to act as Site Reliability Engineers (SREs) by diagnosing and filtering raw production failure logs.
Updated Apr 8, 2026 (Python)
A highly realistic customer support environment conforming to the OpenEnv specification. Designed to rigorously test LLM instruction following, safety boundaries, and tool-use logic in a real-world SaaS setting.
SafetyGuard Arena v3.0 — OpenEnv RL Safety Gym for adversarial stress-testing LLMs. Features Basilisk Adaptive Red-Teamer, PPO training pipeline, one-click HF dataset export, and **Flagship Multi-Format Encoded Query System** (binary, hex, base64 + De-obfuscation Engine). Built for Meta, Hugging Face, and AI safety teams.
MindFlayer — Reinforcement Learning Environment for Emergent Deceptive Behavior in LLM Agents
OpenEnv-framework-based multi-agent planning RL environment for the Meta PyTorch OpenEnv Hackathon.
Top 100, Meta PyTorch OpenEnv Hackathon 2026: An OpenEnv-based reinforcement learning environment for training AI agents in Zero Trust cybersecurity workflows. Built using OpenEnv, LLM-based agents, and CMDP constraints to enforce enterprise policy compliance.
NeoVentEnv: An OpenEnv neonatal ventilator management simulator for training and evaluating RL/LLM agents on realistic NICU tasks.
A real-world RL environment where AI agents learn to maintain and update test suites when code changes. Includes tasks for unit testing, bug detection, and regression auditing with structured reward signals.
An RL project using OpenEnv to train agents in a long-horizon orbital control environment, focusing on fuel-efficient decision-making, anomaly recovery, and multi-phase mission planning. Made for the OpenEnv Hackathon.
Government Scheme Eligibility Matching - OpenEnv Environment
OpenEnv code review environment for AI agents.
Fault-injecting OpenEnv training environment for vibe-coded SaaS incidents. 30 scenarios grounded in 2025-26 production failures. Drop-in OpenClaw-RL pool server. Claude Code skill included.
RL model for citations | Scaler x Meta OpenEnv Hackathon
TokenOptEnv is an OpenEnv environment for CAMRE: a cost-aware meta-reasoning benchmark for code and log tasks. Agents do not just solve tasks. They also learn how to manage context, retrieval, memory, checkpoints, compression, and model-routing strategy under explicit token and cost budgets.
An OpenEnv environment where AI agents triage satellite intelligence reports, classify threats, and make real-time defense decisions.
Production MLOps operations environment for RL agent training. 3 tasks: data quality triage, deployment decisions, incident cascade. Dense rewards, causal state transitions, deterministic graders. Built for OpenEnv Hackathon (Meta × HuggingFace × Scaler).