
Croissant tag missing in some Croissant supported datasets #3135

@fylux


For example, the following dataset:

https://huggingface.co/datasets/allenai/c4

lacks a Croissant tag, not just in the UI but also when filtering by "library:mlcroissant" through the API. However, the Croissant file is available through the API:

https://huggingface.co/api/datasets/allenai/c4/croissant

When looking at the 15k most downloaded HF datasets, around 4k lacked this tag. Sometimes this might be justified by a faulty DatasetInfo, but that is not always the case, as we have seen with allenai/c4.
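
To make the mismatch easier to reproduce, here is a rough sketch of the check for a single dataset. It assumes that `https://huggingface.co/api/datasets/<repo_id>` returns JSON with a `tags` list, and that the `/croissant` endpoint answers with an HTTP error when no Croissant file is served. The helper names (`has_croissant_tag`, `fetch_json`, `croissant_status`) are made up for illustration and are not part of any library.

```python
import json
import urllib.error
import urllib.request

# Assumed base URL of the Hub dataset API (the /croissant path is the one quoted above).
API_ROOT = "https://huggingface.co/api/datasets"
CROISSANT_TAG = "library:mlcroissant"


def has_croissant_tag(tags):
    """Return True if the dataset's tag list contains the Croissant library tag."""
    return CROISSANT_TAG in tags


def fetch_json(url, timeout=30):
    """Fetch a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def croissant_status(repo_id, fetch=fetch_json):
    """Report whether a dataset is tagged and whether a Croissant file is served.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    # Assumption: the dataset info payload exposes its tags under the "tags" key.
    info = fetch(f"{API_ROOT}/{repo_id}")
    tagged = has_croissant_tag(info.get("tags", []))
    try:
        fetch(f"{API_ROOT}/{repo_id}/croissant")
        served = True
    except urllib.error.HTTPError:
        # Assumption: a missing Croissant file shows up as an HTTP error status.
        served = False
    return {"tagged": tagged, "croissant_served": served}
```

A dataset is affected by this issue when `croissant_status("allenai/c4")` reports `croissant_served` as true and `tagged` as false. Running the same check over a list of repo ids would give a count comparable to the 4k figure above.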

fyi @lhoestq
