This week we will discuss Cosmopedia, a 35-billion-token dataset of synthetic textbooks, blog posts, stories, social posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1.
- Cosmopedia Blog Post
- Cosmopedia Datasets: Full and 100K sample (a loading sketch follows this list)
- Cosmopedia GitHub repository
- cosmo-1B: a Cosmopedia pretrained model
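
To get a feel for the data and the model before reading the blog post, here is a minimal sketch using the Hugging Face `datasets` and `transformers` libraries. The repository IDs (`HuggingFaceTB/cosmopedia-100k`, `HuggingFaceTB/cosmo-1b`) and the `prompt`/`text` column names are assumptions based on the usual Hub naming; check the dataset and model cards linked above.

```python
# A minimal sketch, assuming the 100K sample and cosmo-1B live at the Hub IDs below;
# verify against the dataset/model cards linked in the list above.
from datasets import load_dataset
from transformers import pipeline

# Peek at the 100K sample: each record is assumed to pair a seed prompt
# with the Mixtral-generated text.
sample = load_dataset("HuggingFaceTB/cosmopedia-100k", split="train")
print(sample[0]["prompt"][:200])
print(sample[0]["text"][:500])

# Generate a short continuation with cosmo-1B, the model pretrained on Cosmopedia.
generator = pipeline("text-generation", model="HuggingFaceTB/cosmo-1b")
print(generator("Photosynthesis is", max_new_tokens=64)[0]["generated_text"])
```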