[ENVIRONMENT] textarena wrapper env #99

burtenshaw · 2025-10-26T19:08:32Z

This environment wraps the textarena library with an OpenEnv server. This allows users to do RL on games like Wordle, and many more.

The PR contains:

a server implementation
a Dockerfile
basic docs in readme
a hello world example script
an inference example using gpt-oss-120b

Open to feedback if it doesn't conform to any library norms.

The environment is deployed here
And I used it to train a model with GRPO in TRL here

Here's an example from a wordle inference:

🎯 Turn 5: model replied with -> [train]
   Parsed guess: [train]
   Feedback messages:
     [MESSAGE] [train]
     [MESSAGE] Player 0 submitted [train].
Feedback:
A R I S E
Y G Y X X

G R A I N
X G G G G

A R G U E
Y G X X X

B R A I N
X G G G G

T R A I N
X G G G G

You have 1 guesses left.

🎯 Turn 6: model replied with -> [drain]
   Parsed guess: [drain]
   Feedback messages:
     [MESSAGE] [drain]
     [MESSAGE] Player 0 won the game. Reason: Player 0 guessed the word correctly!
     [MESSAGE] The game ended in a draw. Reason: Turn limit reached.

✅ Game finished
   Reward: 0.0
   Done: True
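
For anyone reading the log above programmatically, here is a minimal sketch of how the feedback block could be turned into (guess, marks) pairs. It assumes the layout shown in this example: rows of five space-separated capital letters that alternate between the guess and its G/Y/X marks. `parse_feedback` is a hypothetical helper written for illustration, not part of textarena or OpenEnv.

```python
import re
from typing import List, Tuple

# Hypothetical helper for illustration; not part of textarena or OpenEnv.
# Matches a row of exactly five space-separated capital letters,
# e.g. "A R I S E" or "Y G Y X X".
ROW = re.compile(r"^(?:[A-Z] ){4}[A-Z]$")


def parse_feedback(text: str) -> List[Tuple[str, str]]:
    """Return (guess, marks) pairs from a Wordle feedback block.

    Assumes matching rows alternate: guess letters first, then G/Y/X marks.
    """
    rows = [line.strip() for line in text.splitlines() if ROW.match(line.strip())]
    pairs = []
    for letters, marks in zip(rows[0::2], rows[1::2]):
        pairs.append((letters.replace(" ", "").lower(), marks.replace(" ", "")))
    return pairs


feedback = """Feedback:
A R I S E
Y G Y X X

G R A I N
X G G G G

You have 1 guesses left."""

history = parse_feedback(feedback)
```

With the two guesses above, `history` holds one pair per guess, with the guess lowercased and the marks joined into a five-character string.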

pankit-eng

If you would like this env to be built as part of the repo build pipeline, then please update the .github/workflows/docker_build file to ensure it gets built the same way as other envs.

examples/textarena_simple.py

burtenshaw · 2025-10-27T18:13:19Z

Thanks for the review, @pankit-eng. I updated the env to use the docker image, and I hear you about consistency.

rycerzes · 2025-10-30T09:52:09Z

Hi @burtenshaw, I've been browsing through the PR and I had a quick question about the Wordle-specific reward signals you added. Since the wrapper is designed to work with any TextArena game, I'm wondering if those auxiliary rewards (greens/yellows/etc.) are better kept in the environment for convenience, or if they should be handled in training code to keep things more modular. I think the wrapper should remain fully generic, like other environments, providing only the main reward value without any auxiliary/shaped reward signals computed in the environment.
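
To make the "handled in training code" option concrete, here is a rough sketch of how the green/yellow signals could be computed on the training side from the G/Y/X marks the game already returns, leaving the environment to report only its main reward. The function names and the weights (0.2 and 0.1) are assumptions made up for this illustration; they are not taken from the PR or from TRL.

```python
from typing import Dict


def auxiliary_rewards(marks: str) -> Dict[str, float]:
    """Fraction of green and yellow marks in a row such as "X G G G G"."""
    cells = marks.replace(" ", "")
    if not cells:
        return {"greens": 0.0, "yellows": 0.0}
    return {
        "greens": cells.count("G") / len(cells),
        "yellows": cells.count("Y") / len(cells),
    }


def shaped_reward(
    env_reward: float,
    marks: str,
    green_weight: float = 0.2,   # illustrative weight, an assumption
    yellow_weight: float = 0.1,  # illustrative weight, an assumption
) -> float:
    """Combine the environment's main reward with training-side shaping."""
    aux = auxiliary_rewards(marks)
    return env_reward + green_weight * aux["greens"] + yellow_weight * aux["yellows"]


aux = auxiliary_rewards("X G G G G")
total = shaped_reward(0.0, "X G G G G")
```

Because the shaping lives outside the wrapper, a different TextArena game can reuse the same environment and swap in its own reward function, or none at all.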

burtenshaw · 2025-11-03T12:57:54Z

@rycerzes your feedback makes total sense. when I implemented this I wasn't sure about the consensus on sharing rewards for convenience vs many envs. in the last week, it seems like we're going for the latter. I'll open a new PR.

burtenshaw added 9 commits October 23, 2025 09:55

add simple try except to subprocess

a8fa54d

Merge branch 'expose-container-logs' into textarena-env

0807289

implement basic textarena wrapper server

bcecb3b

implement basic text arena client

952173e

add text arena examples and docs

bcc072d

logical failed prsing word

21f7ed3

first draft grpo script

87c17cc

Merge branch 'textarena-env' of https://github.com/burtenshaw/OpenEnv …

3085f42

…into textarena-env

rename inference example

b52e3c8

meta-cla bot added the CLA Signed label Oct 26, 2025

burtenshaw marked this pull request as ready for review October 26, 2025 19:09

HamidShojanazeri requested review from Darktex and pankit-eng October 27, 2025 04:09

burtenshaw added 2 commits October 27, 2025 06:51

fix typo in file name

3b72bce

add inference example with hf and gpt oss

09ad324

jspisak added the New Environment and enhancement labels Oct 27, 2025

pankit-eng approved these changes Oct 27, 2025

Update examples/textarena_simple.py

67eec2c

burtenshaw added 3 commits October 27, 2025 20:46

add env to docker build

d3b10e8

delete extra grpo example

bc8204d

add wordle specific rewards to the environment

941da1a

pankit-eng merged commit dcbc4af into meta-pytorch:main Oct 31, 2025
1 check passed
