@phhusson
This commit adds a "wikirace" task. The idea is this:
You are currently on English Wikipedia page X. You are given the list of pages you can reach from page X. Your goal is to reach page Y via the shortest path. (Note that there might be multiple shortest paths.)
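
As an illustration only, the shortest-path notion above can be sketched as a breadth-first search over a link graph. The toy graph and the name `shortest_distance` are hypothetical and not taken from this PR, which works on a real Wikipedia link dataset.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical toy link graph: page title -> titles it links to.
TOY_LINKS: Dict[str, List[str]] = {
    "Cat": ["Mammal", "Pet"],
    "Mammal": ["Animal"],
    "Pet": ["Dog"],
    "Dog": ["Mammal"],
    "Animal": ["Biology"],
}


def shortest_distance(links: Dict[str, List[str]], start: str, goal: str) -> Optional[int]:
    """Return the minimum number of clicks from start to goal, or None if unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


if __name__ == "__main__":
    print(shortest_distance(TOY_LINKS, "Cat", "Biology"))
```

Because links are directed, a page can be reachable in one direction only; in the toy graph nothing links back from "Biology", so the reverse query returns `None`.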

This is a multi-turn game, but I implemented it as single-turn: we provide a part of the shortest path to the LLM [1].

I tried running pre-commit, but I guess it was looking for python3.11, which I don't currently have. I'll look at running pre-commit anyway, so you might consider this PR a work in progress.

I guess the biggest issue with this PR is the dependency on the `datasets` Python library. The dataset is 62MB.
What's the recommendation there? Should I commit this database as a SQLite file somewhere?

I've chosen rewards as:

  • 1.0 means that the LLM chose a move that lies on a shortest path
  • 0.5 means that the chosen move isn't on a shortest path, but the best path through it is only 1 longer than the shortest
  • 0.1 means that the best path through the chosen move is 2 longer than the shortest
  • 0.01 means that the output is invalid
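
A minimal sketch of how these tiers could map to a scoring function, for illustration only. The name `score_move` and its parameters are hypothetical, and the list above does not say what a valid move costing more than +2 receives; the sketch assumes it falls back to the same 0.01 floor.

```python
from typing import Optional


def score_move(shortest_len: int, len_via_choice: Optional[int]) -> float:
    """Score one move.

    shortest_len: length of a shortest path from the current page to the target.
    len_via_choice: length of the best path that goes through the chosen link,
        or None when the model's output is invalid (hypothetical convention).
    """
    if len_via_choice is None:
        return 0.01  # invalid output
    extra = len_via_choice - shortest_len
    if extra <= 0:
        return 1.0  # the move lies on a shortest path
    if extra == 1:
        return 0.5  # one hop longer than the shortest
    if extra == 2:
        return 0.1  # two hops longer than the shortest
    return 0.01  # assumption: anything worse gets the floor value


if __name__ == "__main__":
    for via in (3, 4, 5, 6, None):
        print(via, score_move(3, via))
```

Keeping the floor above zero means even a bad answer yields a non-zero reward; whether that is the intent here is not stated in the description.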

I have no idea what the "levels" parameter of RangeAttributeDefinition is supposed to mean, so it just has the values I copy/pasted from somewhere else.

[1] In the current PR, the length of that prefix path isn't configurable, even though it has a big influence on the difficulty.

On Gemini 2.5 Pro (free instance available on OpenRouter), I get 44% with size=10, seed=45.

Collaborator

@joesharratt1229 joesharratt1229 left a comment

Looks like it will be a really useful environment, just a few corrections.

@olliestanley olliestanley requested a review from zafstojano June 25, 2025 12:28
@@ -0,0 +1 @@
datasets>=3.6.0
Member

Rather than using a requirements-optional.txt, we can add a section in pyproject.toml containing this dep, as we already do for some other optional requirements.


4 participants