Hi,
I recently downloaded liveothello (11k games) and wthor (132k games) and noticed that all wthor transcripts start with the move f5. Once symmetries are taken into account (the Othello starting position has 4 symmetries), the overlap between the two datasets is 8k games (72% of liveothello is in wthor). Without symmetries, the overlap is 3k games (27%).
The paper mentions:
"They [wthor and liveothello games] are combined and split randomly by 8 : 2 into training and validation sets"
Hence I think there is a small data leakage between the training and validation sets (x4 larger if you take symmetries into account).
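For reference, here is a minimal sketch of how a symmetry-aware overlap like the one above could be computed. It is not the exact script behind the numbers in this issue. It assumes each game is a lowercase string of concatenated moves such as "f5d6c3" with no pass markers, and the names `transform`, `normalize_to_f5` and `overlap` are hypothetical helpers, not part of the repository.

```python
# The four board symmetries that leave the Othello starting position unchanged,
# written as maps on zero-based (column, row) coordinates.
SYMMETRIES = (
    lambda c, r: (c, r),          # identity
    lambda c, r: (r, c),          # reflection across the a1-h8 diagonal
    lambda c, r: (7 - c, 7 - r),  # 180-degree rotation
    lambda c, r: (7 - r, 7 - c),  # reflection across the a8-h1 diagonal
)


def transform(transcript, sym):
    """Apply one symmetry to every move of a transcript like 'f5d6c3'."""
    out = []
    for i in range(0, len(transcript), 2):
        move = transcript[i:i + 2]
        c, r = ord(move[0]) - ord("a"), int(move[1]) - 1
        c2, r2 = sym(c, r)
        out.append(chr(ord("a") + c2) + str(r2 + 1))
    return "".join(out)


def normalize_to_f5(transcript):
    """Return the symmetric variant of the game whose first move is f5."""
    for sym in SYMMETRIES:
        candidate = transform(transcript, sym)
        if candidate.startswith("f5"):
            return candidate
    raise ValueError("first move is not a legal Othello opening: " + transcript[:2])


def overlap(games_a, games_b, use_symmetries=True):
    """Count the games of games_a that also appear in games_b."""
    norm = normalize_to_f5 if use_symmetries else (lambda g: g)
    seen = {norm(g) for g in games_b}
    return sum(1 for g in games_a if norm(g) in seen)


# Toy data (made up for illustration, not taken from either dataset).
live = ["d3c5", "f5f6"]
wthor = ["f5d6"]
print(overlap(live, wthor, use_symmetries=False))
print(overlap(live, wthor, use_symmetries=True))
```

Normalizing every game so that it starts with f5 matches the convention already used by the wthor transcripts, so only the liveothello games actually change. Deduplicating on the normalized transcripts before the random split would be one way to avoid the same game landing in both sets.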
jfpuget