We gathered a corpus suitable for data mining and language modeling.
Most of the dataset has been pushed to https://huggingface.co/datasets/malaysia-ai/pretrain-text-dataset, and the source code for the crawlers is at ../crawl.