This commit adds a "wikirace" task. The idea is this:
You are currently on English Wikipedia page X and are given the list of pages you can reach from page X. Your goal is to reach page Y by the shortest path. (Note that there may be multiple shortest paths.)
This is a multi-turn game, but I implemented it as single-turn: we give the LLM the first part of a shortest path and ask it for the next step [1].
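To make the single-turn framing concrete, here is a minimal sketch, not the code in this PR. It assumes a toy link graph (`LINKS`) and hypothetical helper names (`shortest_path`, `make_question`). It finds one shortest path with breadth-first search, then reveals the first `prefix_len` hops and asks for the next one.

```python
from collections import deque

# Hypothetical toy link graph; the real task uses Wikipedia page links.
LINKS = {
    "Cat": ["Mammal", "Pet"],
    "Mammal": ["Animal", "Whale"],
    "Pet": ["Dog", "Animal"],
    "Dog": ["Mammal"],
    "Whale": ["Ocean"],
    "Animal": ["Biology"],
    "Ocean": [],
    "Biology": [],
}


def shortest_path(links, start, goal):
    """Return one shortest path from start to goal (BFS), or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            # Walk the parent pointers back to the start page.
            path = []
            while page is not None:
                path.append(page)
                page = parents[page]
            return path[::-1]
        for nxt in links.get(page, []):
            if nxt not in parents:
                parents[nxt] = page
                queue.append(nxt)
    return None


def make_question(links, start, goal, prefix_len):
    """Build a single-turn question after revealing prefix_len hops of a shortest path."""
    path = shortest_path(links, start, goal)
    if path is None:
        return None
    # Clamp so that at least one hop is left for the model to choose.
    prefix_len = max(0, min(prefix_len, len(path) - 2))
    current = path[prefix_len]
    return {
        "visited": path[:prefix_len],
        "current": current,
        "choices": links[current],
        "goal": goal,
        "remaining_hops": len(path) - 1 - prefix_len,
    }


question = make_question(LINKS, "Cat", "Ocean", prefix_len=1)
```

Because several shortest paths may exist, a scorer built on this sketch would accept any choice from which the goal is reachable in `remaining_hops - 1` hops, not only the page on the path BFS happened to return.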
I tried running pre-commit, but I think it was looking for python3.11, which I don't currently have. I'll look into running pre-commit anyway, so you may consider this PR a work in progress.
I think the biggest issue with this PR is the dependency on the datasets Python library. The dataset is 62MB.
What's the recommendation there? Should I commit this database as a SQLite file somewhere?
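For reference, the SQLite option could look roughly like the sketch below. It uses only the standard-library `sqlite3` module; the `links(src, dst)` schema and the helper names are assumptions for illustration, not something this PR contains, and it does not settle where such a file should be hosted.

```python
import sqlite3


def build_link_db(edges, path=":memory:"):
    """Create (or open) a SQLite database holding page-to-page links.

    The schema is a hypothetical one: a single table of (src, dst) pairs.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "src TEXT NOT NULL, dst TEXT NOT NULL, PRIMARY KEY (src, dst))"
    )
    # INSERT OR IGNORE silently skips duplicate (src, dst) pairs.
    conn.executemany("INSERT OR IGNORE INTO links (src, dst) VALUES (?, ?)", edges)
    conn.commit()
    return conn


def outgoing_links(conn, page):
    """Return the pages reachable in one hop from page, sorted by title."""
    rows = conn.execute(
        "SELECT dst FROM links WHERE src = ? ORDER BY dst", (page,)
    )
    return [dst for (dst,) in rows]


conn = build_link_db([("Cat", "Mammal"), ("Cat", "Pet"), ("Cat", "Pet")])
cat_links = outgoing_links(conn, "Cat")
```

The composite primary key also serves as an index on `src`, so looking up the outgoing links of one page does not scan the whole table.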
I've chosen rewards as:
I have no idea what the "levels" parameter of RangeAttributeDefinition is supposed to mean, so it just has the values I copy/pasted from somewhere else.
[1] In this PR, the length of that prefix path isn't configurable, even though it has a big influence on the difficulty.
On Gemini 2.5 Pro (free instance available on OpenRouter), I get 44% with size=10, seed=45.