Capture the schema of the underlying Dataset #4

Open
AlasdairGray opened this issue Apr 27, 2023 · 1 comment

@AlasdairGray
Contributor

The schema of a Dataset helps a technical acquirer to understand and assess the data.

Extend the exchange model to capture the schema of the Dataset.

DCAT recommends using the dct:conformsTo property to capture the schema of the Dataset; see §6.4.2 of DCAT v3.
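As a minimal sketch of what this could look like in the exchange model, assuming a JSON-LD serialisation, a Dataset can reference its schema through dct:conformsTo, whose range in DCMI Metadata Terms is dct:Standard. The function name and the dataset and schema IRIs below are hypothetical placeholders.

```python
import json

def dataset_with_schema(dataset_iri: str, title: str, schema_iri: str) -> dict:
    """Build a minimal JSON-LD description of a dcat:Dataset whose schema is
    referenced via dct:conformsTo. The IRIs passed in are placeholders."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        # dct:conformsTo links to the schema, typed as a dct:Standard
        "dct:conformsTo": {"@id": schema_iri, "@type": "dct:Standard"},
    }

# Hypothetical example dataset and schema locations
doc = dataset_with_schema(
    "https://example.org/dataset/road-traffic",
    "Road traffic counts",
    "https://example.org/schema/road-traffic.json",
)
print(json.dumps(doc, indent=2))
```

The schema itself lives at its own IRI, so the Dataset description stays small regardless of how complex the schema is.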

@AlasdairGray
Contributor Author

Beyond recommending dct:conformsTo, DCAT does not give any guidance on how to capture the schema of the underlying dataset. We will need to support a wide variety of dataset formats, including CSV, JSON, GeoJSON, and XML.

To enable applications such as the Data Marketplace to exploit the schema-level information, it would be beneficial to have an agreed approach, but this is likely to differ depending on the dataset media type.
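One way to make the media-type dependence explicit is a lookup from a distribution's media type to the schema language expected for it. The sketch below is hypothetical: the table and function names are illustrative, the entries simply encode the approaches proposed in this thread, and treating GeoJSON as plain JSON (described with JSON Schema) is an assumption rather than an agreed rule.

```python
from typing import Optional

# Hypothetical mapping from media type to the schema language proposed in
# this thread. GeoJSON -> JSON Schema is an assumption, not an agreed rule.
SCHEMA_LANGUAGE_BY_MEDIA_TYPE = {
    "text/csv": "CSVW",
    "application/json": "JSON Schema",
    "application/geo+json": "JSON Schema",
    "application/xml": "XML Schema",
    "text/xml": "XML Schema",
}

def schema_language_for(media_type: str) -> Optional[str]:
    """Return the expected schema language, or None if no approach is agreed."""
    # Media types are case-insensitive; drop parameters such as "; charset=utf-8"
    base = media_type.split(";", 1)[0].strip().lower()
    return SCHEMA_LANGUAGE_BY_MEDIA_TYPE.get(base)

print(schema_language_for("text/csv; charset=utf-8"))  # CSVW
print(schema_language_for("image/png"))  # None
```

Returning None for an unknown media type makes the gap visible instead of silently guessing a schema language.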

For tabular data, there is a government recommendation to share the data as CSV and to use CSVW (CSV on the Web) to capture its metadata. CSVW is a W3C Recommendation for describing CSV files and can model the column headings and the relationships between them. This would allow CSVW processing tools to be used to manipulate the metadata.
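To illustrate, here is a minimal sketch of a CSVW metadata document for a single CSV file, built in Python. The file name, column names, sample data and helper functions are hypothetical. The composite primaryKey shows one of the relationships CSVW can model, and the header check is a simplified stand-in for what a full CSVW validator would do.

```python
import csv
import io
import json

# Hypothetical sample data; in practice this would be the published CSV file
SAMPLE_CSV = "Site ID,Count date,Vehicle count\nS001,2023-04-01,1520\n"

def build_csvw_metadata(csv_url, columns, primary_key):
    """Return a CSVW metadata document (as a dict) describing one CSV table.
    `columns` is a list of (name, title, datatype) tuples."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": csv_url,
        "tableSchema": {
            "columns": [
                {"name": name, "titles": title, "datatype": datatype}
                for name, title, datatype in columns
            ],
            # A site is counted on many dates, so the key spans two columns
            "primaryKey": primary_key,
        },
    }

def header_matches(metadata, csv_text):
    """Simplified check: the CSV header row equals the column titles, in order."""
    header = next(csv.reader(io.StringIO(csv_text)))
    titles = [c["titles"] for c in metadata["tableSchema"]["columns"]]
    return header == titles

metadata = build_csvw_metadata(
    "road-traffic.csv",
    [("site_id", "Site ID", "string"),
     ("count_date", "Count date", "date"),
     ("vehicle_count", "Vehicle count", "integer")],
    primary_key=["site_id", "count_date"],
)
print(json.dumps(metadata, indent=2))
print(header_matches(metadata, SAMPLE_CSV))  # True
```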

For XML and JSON, there exist XML Schema and JSON Schema respectively. These schemas can be published on the web, and the dct:conformsTo property could link to the schema file (or we could investigate embedding it within the metadata). The schema information can then be processed using standard tooling available in multiple languages. This approach also means that the metadata publisher will not need to do additional modelling for their schema-level metadata.
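As a sketch of the JSON case, assuming JSON Schema draft 2020-12 and a hypothetical record layout, the schema's $id can be the same IRI that the Dataset's dct:conformsTo points at, so that standard JSON Schema tooling can fetch and apply it. SCHEMA_IRI, the property names and the variable names are illustrative only.

```python
import json

# Hypothetical location where the publisher would host the schema
SCHEMA_IRI = "https://example.org/schema/road-traffic-record.json"

# A JSON Schema (draft 2020-12) describing one record of the dataset
record_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": SCHEMA_IRI,
    "title": "Road traffic count record",
    "type": "object",
    "properties": {
        "siteId": {"type": "string"},
        "countDate": {"type": "string", "format": "date"},
        "vehicleCount": {"type": "integer", "minimum": 0},
    },
    "required": ["siteId", "countDate", "vehicleCount"],
}

# The dataset metadata only needs to link to the published schema
conforms_to = {"@id": record_schema["$id"], "@type": "dct:Standard"}

print(json.dumps(record_schema, indent=2))
print(conforms_to["@id"])
```

Validating actual records would then be left to an existing JSON Schema validator rather than custom code, so the publisher does no extra modelling beyond writing the schema.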

@RobNicholsGDS added the "new requirement" and "discussion" (General discussion points) labels on Nov 8, 2023

2 participants