@burtenshaw
Collaborator

@burtenshaw burtenshaw commented Oct 26, 2025

This environment wraps the TextArena library with an OpenEnv server, which lets users do RL on games like Wordle and many more.

The PR contains:

  • a server implementation
  • a Dockerfile
  • basic docs in readme
  • a hello world example script
  • an inference example using gpt-oss-120b
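To make the architecture concrete, here is a minimal in-process sketch of the general pattern: a game sits behind a generic `reset`/`step` interface, and the wrapper passes through only the game's own reward. It leaves out the HTTP server and Docker layers, and every name in it (`TextGame`, `TextGameEnv`, `GuessTheWord`, `play`) is hypothetical, not the TextArena or OpenEnv API.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


class TextGame(Protocol):
    """Hypothetical minimal contract a text game would satisfy."""

    def reset(self) -> str: ...

    def play(self, action: str) -> Tuple[str, float, bool]: ...


@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool


class TextGameEnv:
    """Game-agnostic wrapper: forwards actions, reports the game's own reward unchanged."""

    def __init__(self, game: TextGame) -> None:
        self.game = game
        self.done = True  # no episode in progress until reset() is called

    def reset(self) -> StepResult:
        self.done = False
        return StepResult(self.game.reset(), 0.0, False)

    def step(self, action: str) -> StepResult:
        if self.done:
            raise RuntimeError("call reset() before step()")
        observation, reward, done = self.game.play(action)
        self.done = done
        return StepResult(observation, reward, done)


class GuessTheWord:
    """Toy stand-in game (hypothetical), used only to exercise the wrapper."""

    def __init__(self, secret: str, max_turns: int = 6) -> None:
        self.secret, self.max_turns, self.turn = secret.lower(), max_turns, 0

    def reset(self) -> str:
        self.turn = 0
        return f"Guess the {len(self.secret)}-letter word."

    def play(self, action: str) -> Tuple[str, float, bool]:
        self.turn += 1
        if action.lower() == self.secret:
            return "Correct!", 1.0, True
        # Wrong guess: no reward; the episode ends once the turn limit is hit.
        return "Wrong.", 0.0, self.turn >= self.max_turns


if __name__ == "__main__":
    env = TextGameEnv(GuessTheWord("drain"))
    print(env.reset())
    print(env.step("train"))
    print(env.step("drain"))
```

Because the wrapper never inspects the game's text, the same class would serve any game that satisfies the protocol; game-specific logic stays in the game object.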

Open to feedback if it doesn't conform to any library norms.

The environment is deployed here, and I used it to train a model with GRPO in TRL here.

Here's an example from a Wordle inference run:

🎯 Turn 5: model replied with -> [train]
   Parsed guess: [train]
   Feedback messages:
     [MESSAGE] [train]
     [MESSAGE] Player 0 submitted [train].
Feedback:
A R I S E
Y G Y X X

G R A I N
X G G G G

A R G U E
Y G X X X

B R A I N
X G G G G

T R A I N
X G G G G

You have 1 guesses left.

🎯 Turn 6: model replied with -> [drain]
   Parsed guess: [drain]
   Feedback messages:
     [MESSAGE] [drain]
     [MESSAGE] Player 0 won the game. Reason: Player 0 guessed the word correctly!
     [MESSAGE] The game ended in a draw. Reason: Turn limit reached.

✅ Game finished
   Reward: 0.0
   Done: True
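The log implies two small pieces of logic: pulling the bracketed guess out of the model's reply, and scoring a guess as G (right letter, right spot), Y (letter elsewhere in the word) or X (absent). Below is a minimal sketch of both, assuming the bracket convention shown in the log; `parse_guess` and `wordle_feedback` are hypothetical helpers, not TextArena's implementation. Scoring the guesses from the log against *drain*, the word the model guessed correctly on turn 6, reproduces the mark rows shown above.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical convention, matching the log: the guess is the first [word] in the reply.
GUESS_RE = re.compile(r"\[([A-Za-z]+)\]")


def parse_guess(reply: str) -> Optional[str]:
    """Return the first bracketed word in lowercase, or None if there is none."""
    match = GUESS_RE.search(reply)
    return match.group(1).lower() if match else None


def wordle_feedback(guess: str, target: str) -> str:
    """Mark each letter G (right spot), Y (in word, wrong spot) or X (absent)."""
    guess, target = guess.lower(), target.lower()
    marks = ["X"] * len(guess)
    unmatched = Counter()  # target letters not consumed by a green match
    # First pass: exact matches become G; remember the leftover target letters.
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            marks[i] = "G"
        else:
            unmatched[t] += 1
    # Second pass: a misplaced letter is Y only while an unused copy remains,
    # so repeated letters in the guess are not over-credited.
    for i, g in enumerate(guess):
        if marks[i] == "X" and unmatched[g] > 0:
            marks[i] = "Y"
            unmatched[g] -= 1
    return "".join(marks)


if __name__ == "__main__":
    print(parse_guess("My next guess is [train]."))  # train
    for word in ["arise", "grain", "argue", "brain", "train"]:
        print(" ".join(word.upper()), "->", " ".join(wordle_feedback(word, "drain")))
```

The two-pass structure matters for repeated letters: guessing *speed* against *abide*, for example, marks only the first of the two Es yellow.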

@meta-cla meta-cla bot added the CLA Signed label Oct 26, 2025
@burtenshaw burtenshaw marked this pull request as ready for review October 26, 2025 19:09
@jspisak jspisak added the New Environment and enhancement labels Oct 27, 2025
Contributor

@pankit-eng pankit-eng left a comment

If you would like this env to be built as part of the repo build pipeline, then please update the .github/workflows/docker_build file to ensure it gets built the same way as other envs.

@burtenshaw
Collaborator Author

Thanks for the review, @pankit-eng. I updated the env to use the Docker image and took your point about consistency with the other envs.

@rycerzes

Hi @burtenshaw, I've been browsing through the PR and had a quick question about the Wordle-specific reward signals you added. Since the wrapper is designed to work with any TextArena game, should those auxiliary rewards (greens, yellows, etc.) stay in the environment for convenience, or be handled in training code to keep things more modular? I think the wrapper should remain fully generic, like the other environments, providing only the main reward value without any auxiliary or shaped reward signals computed in the environment.
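Following this suggestion, the Wordle-specific signals can be recomputed on the training side from the text observation, leaving the environment's reward untouched. A minimal sketch, assuming the observation contains the letter/mark board in the format shown in the inference log; `last_marks`, `shaped_reward`, and the default weights and scale are hypothetical choices for illustration, not values from the PR.

```python
from typing import List, Optional


def last_marks(feedback: str) -> Optional[List[str]]:
    """Return the most recent row of G/Y/X marks from a text board, if any."""
    # Keep only rows made entirely of single-character tokens, e.g. "A R I S E".
    # Assumption: message lines like "Feedback:" or "[MESSAGE] ..." never qualify.
    rows = [line.split() for line in feedback.splitlines()]
    board = [row for row in rows if row and all(len(tok) == 1 for tok in row)]
    # Rows alternate letters, then marks, so mark rows sit at odd indexes.
    mark_rows = board[1::2]
    return mark_rows[-1] if mark_rows else None


def shaped_reward(env_reward: float, feedback: str,
                  green_w: float = 1.0, yellow_w: float = 0.5,
                  scale: float = 0.1) -> float:
    """Add a small partial-credit bonus for the latest guess to the env's main reward."""
    marks = last_marks(feedback)
    if not marks:
        return env_reward  # nothing to parse: fall back to the env's reward alone
    partial = (green_w * marks.count("G") + yellow_w * marks.count("Y")) / len(marks)
    return env_reward + scale * partial


if __name__ == "__main__":
    board = "Feedback:\nA R I S E\nY G Y X X\n\nG R A I N\nX G G G G\n"
    print(shaped_reward(0.0, board))  # approximately 0.08
```

A small `scale` is meant to keep the bonus well below whatever the environment pays for a win; the right value would depend on that main reward.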

@pankit-eng pankit-eng merged commit dcbc4af into meta-pytorch:main Oct 31, 2025
1 check passed
@burtenshaw
Collaborator Author

@rycerzes, your feedback makes total sense. When I implemented this, I wasn't sure about the consensus on bundling rewards into the environment for convenience versus having many envs. Over the last week, it seems like we're going for the latter, so I'll open a new PR.

4 participants