How is the WebLINX raw query converted to the Reranking dataset format? #47

Hi, thank you for releasing the WebLINX dataset!

I noticed that the reranking subset on HuggingFace uses a different textual format for the query field compared to the raw data in the main WebLINX dataset or the candidates/*.jsonl files. In particular, the reranking queries are simplified “User / Agent” dialogue-style texts.

Could you clarify how the original WebLINX turn (or the query field in candidates/*.jsonl) is transformed into the query used in the HuggingFace reranking split? Is there a specific preprocessing script or template that was used to generate these reranking-style queries?

Thanks!
