[ENVIRONMENT] textarena wrapper env #99
…into textarena-env
pankit-eng
left a comment
If you would like this env to be built as part of the repo build pipeline, then please update the .github/workflows/docker_build file to ensure it gets built the same way as other envs.
Thanks for the review @pankit-eng. I updated the env to use the Docker image, and noted the point about consistency.
Hi @burtenshaw, I've been browsing through the PR and I had a quick question about the Wordle-specific reward signals you added: since the wrapper is designed to work with any
@rycerzes your feedback makes total sense. When I implemented this, I wasn't sure about the consensus on sharing rewards for convenience vs. many envs. In the last week, it seems like we're going for the latter. I'll open a new PR.
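For readers following the reward discussion, here is a minimal sketch of one way to keep game-specific rewards out of a generic wrapper: a lookup table of reward functions keyed by environment id. It is not the code from this PR. The names `REWARD_PROVIDERS`, `register`, `wordle_rewards`, and `compute_rewards` are hypothetical, and the `"Wordle-v0"` key is used only as an illustrative id.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical registry: maps an env id to a function that scores one guess.
RewardFn = Callable[[str, str], Dict[str, float]]
REWARD_PROVIDERS: Dict[str, RewardFn] = {}


def register(env_id: str) -> Callable[[RewardFn], RewardFn]:
    """Decorator that stores a reward function under the given env id."""
    def decorator(fn: RewardFn) -> RewardFn:
        REWARD_PROVIDERS[env_id] = fn
        return fn
    return decorator


@register("Wordle-v0")  # illustrative id, an assumption for this sketch
def wordle_rewards(guess: str, secret: str) -> Dict[str, float]:
    """Score a Wordle guess as fractions of green and yellow letters."""
    greens = sum(g == s for g, s in zip(guess, secret))
    # Letters shared by both words, counted with multiplicity.
    overlap = sum((Counter(guess) & Counter(secret)).values())
    yellows = overlap - greens
    n = len(secret)
    return {
        "greens": greens / n,
        "yellows": yellows / n,
        "correct": float(guess == secret),
    }


def compute_rewards(env_id: str, guess: str, secret: str) -> Dict[str, float]:
    """Return game-specific signals if a provider exists, else nothing."""
    provider = REWARD_PROVIDERS.get(env_id)
    return provider(guess, secret) if provider else {}
```

With this layout the wrapper itself stays game-agnostic: games without a registered provider simply get an empty dictionary of extra signals.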
This environment wraps the textarena library with an OpenEnv server. This allows users to do RL on games like Wordle, and many more.
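As a rough illustration of the wrapping idea, the sketch below puts a reset/step interface around a toy word-guessing game. It does not use the textarena or OpenEnv APIs: `ToyWordGame`, `TextGameWrapper`, and `StepResult` are hypothetical stand-ins, and the single win/lose reward is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class StepResult:
    """What the wrapper hands back to an RL loop after each call."""
    observation: str
    reward: float
    done: bool


class ToyWordGame:
    """Hypothetical stand-in for a text game; not the textarena API."""

    def __init__(self, secret: str, max_turns: int = 6) -> None:
        self.secret = secret.lower()
        self.max_turns = max_turns
        self.turns = 0

    def reset(self) -> str:
        self.turns = 0
        return f"Guess the {len(self.secret)}-letter word."

    def play(self, guess: str) -> Tuple[str, bool, bool]:
        """Apply one guess and return (message, won, game_over)."""
        self.turns += 1
        won = guess == self.secret
        over = won or self.turns >= self.max_turns
        text = "Correct!" if won else f"'{guess}' is not the word."
        return text, won, over


class TextGameWrapper:
    """Exposes reset/step on top of a text game."""

    def __init__(self, game: ToyWordGame) -> None:
        self.game = game

    def reset(self) -> StepResult:
        return StepResult(self.game.reset(), 0.0, False)

    def step(self, message: str) -> StepResult:
        # Normalise the model's text before passing it to the game.
        guess = message.strip().lower()
        text, won, over = self.game.play(guess)
        return StepResult(text, 1.0 if won else 0.0, over)
```

A training loop would call `reset()` once per episode and then `step()` with each model completion until `done` is true.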
The PR contains:
gpt-oss-120b
Open to feedback if it doesn't conform to any library norms.
The environment is deployed here
And I used it to train a model with GRPO in TRL here
Here's an example from a wordle inference: