Hi, thank you for releasing the WebLINX dataset!
I noticed that the reranking subset on HuggingFace formats the query field differently from the raw data in the main WebLINX dataset and from the candidates/*.jsonl files. In particular, the reranking queries are simplified “User / Agent” dialogue-style texts.
Could you clarify how the original WebLINX turn (or the query field in candidates/*.jsonl) is transformed into the query used in the HuggingFace reranking split? Is there a specific preprocessing script or template that was used to generate these reranking-style queries?
Thanks!