Capture the geospatial coverage of the data resource #1
Comments
We could consider whether it is sufficient to reuse the spatial geographies defined by ONS. For example, to say that something has full UK coverage you could use the URI http://statistics.data.gov.uk/id/statistical-geography/K02000001, and for Greater Manchester you could use http://statistics.data.gov.uk/id/statistical-geography/E11000001.
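As a minimal sketch of this idea, the snippet below builds the ONS statistical-geography URI from a GSS code and emits a single statement attaching it to a dataset via dcterms:spatial. The function names, the example dataset URI and the "one letter plus eight digits" check are illustrative assumptions; only the URI pattern and the two codes come from the comment above.

```python
import re

# URI prefix taken from the two example URIs quoted in the comment.
ONS_BASE = "http://statistics.data.gov.uk/id/statistical-geography/"
DCTERMS_SPATIAL = "http://purl.org/dc/terms/spatial"


def gss_uri(code: str) -> str:
    """Return the statistical-geography URI for a GSS code (hypothetical helper)."""
    code = code.strip().upper()
    # Assumption: GSS codes are one letter followed by eight digits.
    if not re.fullmatch(r"[A-Z]\d{8}", code):
        raise ValueError(f"not a recognised GSS code: {code!r}")
    return ONS_BASE + code


def spatial_statement(dataset_uri: str, code: str) -> str:
    """Return one N-Triples line (also valid Turtle) for dcterms:spatial."""
    return f"<{dataset_uri}> <{DCTERMS_SPATIAL}> <{gss_uri(code)}> ."


if __name__ == "__main__":
    # Placeholder dataset URI, used only for illustration.
    print(spatial_statement("https://data.gov.uk/datasets/example", "K02000001"))
```

This only asserts that the dataset has that location as its spatial coverage; it says nothing about whether the data covers the whole area or merely lies within it.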
We can look at the way that the ONS GSS-Cogs have captured geospatial coverage; see their guidance.
dcterms:spatial takes zero or more values, each a dcterms:Location, defined as "A spatial region or named place." (https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/Location/)

The EU's GeoDCAT Application Profile is in the process of being "standardised"/adopted by the Open Geospatial Consortium. I expect it will continue to suggest that a Location is populated with one of the options listed at https://semiceu.github.io/GeoDCAT-AP/drafts/latest/#properties-for-location

At present, most "geospatial" metadata records (e.g. in data.gov.uk) use a bounding box, in spite of its known weaknesses for 'locating' where data is about.

You can see a GEMINI - DCAT mapping at agiorguk/gemini#41; it was created in a Geospatial Commission funded project (although largely based on a W3C one, given that GEMINI is based on an ISO standard).
Regarding using ONS GSS to specify "where" data is about, it rather depends on whether that would be taken to assert that it's "about the whole GSS" rather than just "located in the GSS". A simple example: a list of all the trees in New Milton parish vs a list of the trees in my garden (which happens to lie within New Milton parish). This isn't purely an ONS GSS question; it's an ambiguity that arises whenever you give the "location" of data, but it could be amplified if that location is expressed in terms of a formally defined "place" (whether statistical or administrative geography).
Good point @PeterParslow! At ONS we make use of GSS geography codes wherever we can, and I imagine if a local authority were publishing datasets about their administrative area that they would be well served by using the GSS codes too. I guess the spirit of the

So for a dataset of trees in New Milton - I'd probably use the GSS identifier for New Milton, but for a dataset of trees in my garden, I'd draw a geometry of my garden and provide that.

The GSS codes have essentially been translated into linked data with a similar structure to what DCAT is recommending (but making use of the

<https://data.gov.uk/datasets/example> a dcat:Dataset ;
dcterms:spatial <http://statistics.data.gov.uk/id/statistical-geography/K02000001> ;
.
<http://statistics.data.gov.uk/id/statistical-geography/K02000001> a dcterms:Location ;
geosparql:hasGeometry <http://statistics.data.gov.uk/id/statistical-geography/K02000001/geometry> ;
.
<http://statistics.data.gov.uk/id/statistical-geography/K02000001/geometry> a geosparql:Geometry ;
geosparql:asWKT """MULTIPOLYGON (((...)))"""^^geosparql:wktLiteral ;
.

DCAT has a good section on the use of

So we end up with some examples like this involving geometries, bboxes and centroids.

<https://data.gov.uk/datasets/example> a dcat:Dataset ;
dcterms:spatial [
a dcterms:Location ;
locn:geometry """POLYGON ((
4.8842353 52.375108 , 4.884276 52.375153 ,
4.8842567 52.375159 , 4.883981 52.375254 ,
4.8838502 52.375109 , 4.883819 52.375075 ,
4.8841037 52.374979 , 4.884143 52.374965 ,
4.8842069 52.375035 , 4.884263 52.375016 ,
4.8843200 52.374996 , 4.884255 52.374926 ,
4.8843289 52.374901 , 4.884451 52.375034 ,
4.8842353 52.375108
))"""^^geosparql:wktLiteral ;
] .

<https://data.gov.uk/datasets/example> a dcat:Dataset ;
dcterms:spatial [
a dcterms:Location ;
dcat:bbox """POLYGON((
3.053 47.975 , 7.24 47.975 ,
7.24 53.504 , 3.053 53.504 ,
3.053 47.975
))"""^^geosparql:wktLiteral ;
] .

<https://data.gov.uk/datasets/example> a dcat:Dataset ;
dcterms:spatial [
a dcterms:Location ;
dcat:centroid "POINT(4.88412 52.37509)"^^geosparql:wktLiteral ;
] .
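To show how the three forms relate, here is a rough sketch that takes a single-ring WKT polygon (like the locn:geometry value above) and derives a bounding-box polygon and a centroid from it, in the same shape as the dcat:bbox and dcat:centroid values. The function names are hypothetical, the parser handles only a simple POLYGON without holes, and the centroid treats longitude/latitude as planar coordinates, which is an approximation.

```python
import re


def parse_polygon(wkt: str) -> list[tuple[float, float]]:
    """Return the ring of a single-ring WKT POLYGON as (x, y) pairs."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt, flags=re.S)
    if match is None:
        raise ValueError("expected a single-ring WKT POLYGON")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring


def bbox_wkt(ring: list[tuple[float, float]]) -> str:
    """Return the bounding box of the ring as a closed WKT POLYGON."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    minx, maxx, miny, maxy = min(xs), max(xs), min(ys), max(ys)
    return (
        f"POLYGON(({minx} {miny}, {maxx} {miny}, "
        f"{maxx} {maxy}, {minx} {maxy}, {minx} {miny}))"
    )


def centroid(ring: list[tuple[float, float]]) -> tuple[float, float]:
    """Area-weighted centroid of a closed ring (shoelace formula, planar)."""
    ox, oy = ring[0]  # shift to the first vertex to limit rounding error
    pts = [(x - ox, y - oy) for x, y in ring]
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross          # accumulates twice the signed area
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate ring with zero area")
    return ox + cx / (3 * area2), oy + cy / (3 * area2)


if __name__ == "__main__":
    ring = parse_polygon("POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))")
    print(bbox_wkt(ring))
    x, y = centroid(ring)
    print(f"POINT({x} {y})")
```

A bounding box derived this way will usually cover more than the data does, which is the weakness mentioned earlier in the thread.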
Thanks for that @rossbowen; I think it pretty much answers my action to provide examples of the three approaches! Note: in my experience, geo data people don't use "location by centroid" in metadata.

I also like your explanation of when to use a controlled identifier. I think it may need to go a bit further, with GSS identifiers being appropriate for statistical areas, and other 'controlled lists' better for administrative areas (e.g. national parks).

GeoDCAT adds a "location as a geographic name" example, given as: a dct:Location. I'm sure you could provide a more "UK" example (e.g. using a GSS).
In the meeting I took an action to provide examples. The software we use for the OS Data Catalogue (also used at Defra, EA, BGS and the Scottish Government) provides its RDF output in RDF/XML. These examples are from there, so may be more useful to some readers and less useful to others. I am also not in a position to verify that it is "good RDF"; I do notice it doesn't include 'location by keyword' in the RDF, and I don't have any example other than 'by bounding box'.
dcat:Dataset |
Are you able to provide the link to the full RDF representation? That snippet is not valid RDF/XML (I've had similar problems in the past with OGC generated RDF). |
I gave the link to the Data Catalogue from which I downloaded a file and snipped out that dct:spatial bit. I think it was the file you can get from https://osmetadata.astuntechnology.com/geonetwork/srv/eng/catalog.search#/metadata/eaaad50e-0fa9-40be-84b5-d11740297320

It's generated by a widely used piece of open source software (Geonetwork Open Source), so if you can clearly state what makes it invalid we can raise a request to fix it (although we will move to a newer version soon, so it may have been fixed already).
Thanks for the link. I downloaded the whole RDF representation and ran it through the validator. Unfortunately it is not valid RDF/XML. I've seen this before with the Geonetwork output and reported it at geonetwork/core-geonetwork#7332. Although the issue was closed, it was not fixed.
My XML validator (Oxygen) reports the same. It took me a few minutes to find how Oxygen decides to validate the file: it has "built in knowledge" of a schema for the http://www.w3.org/ns/dcat# namespace, a RELAX NG Compact Schema "based on one originally written by James Clark in # http://lists.w3.org/Archives/Public/www-rdf-comments/2001JulSep/0248.html". I've never opened a RELAX NG file before.

What it is complaining about is a dct:license which has both a link ("rdf:resource") and content. This is a shame because it is a very common XML pattern that is widely used in geospatial metadata. Other examples include keywords that both link to the authoritative register entry for the keyword and include the keyword (perhaps in a different language) locally for ease of use.

Regarding the issue you raised on core GeoNetwork, the comments suggest it has been moved because it relates to a specific GeoNetwork plug-in. I'm not qualified to know if that's true, but you can see the issue still open at AstunTechnology/iso19139.gemini23#146

You can see a related discussion at https://www.w3.org/2011/gld/track/issues/60, highlighting the desire to supplement the "license you link to" with some literal text. But the "solution" at Dublin Core appears to be to use "rights" with a sub-item of "license" for the link? That doesn't seem to capture the idea of a "short name + link" as in the example I provided. In HTML that would be an anchor with a title.

Could you suggest (to Jo at the Astun GEMINI plugin issue) how this use case could be handled in RDF? I have only a surface knowledge of RDF.
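The pattern described here (a property element carrying both an rdf:resource link and content) can be spotted with a few lines of standard-library code. This is only a sketch of that one check, not an RDF/XML validator; the function name and the sample document are hypothetical and are not the actual GeoNetwork output.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical sample reproducing the "link plus content" pattern.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dct="http://purl.org/dc/terms/"
         xmlns:dcat="http://www.w3.org/ns/dcat#">
  <dcat:Dataset rdf:about="https://example.org/dataset/1">
    <dct:license rdf:resource="https://example.org/licence">Licence text</dct:license>
  </dcat:Dataset>
</rdf:RDF>"""


def mixed_property_elements(rdf_xml: str) -> list[str]:
    """Return tags of elements that have rdf:resource and also text or children."""
    root = ET.fromstring(rdf_xml)
    offenders = []
    for element in root.iter():
        has_resource = f"{{{RDF_NS}}}resource" in element.attrib
        has_content = bool((element.text or "").strip()) or len(element) > 0
        if has_resource and has_content:
            offenders.append(element.tag)
    return offenders


if __name__ == "__main__":
    for tag in mixed_property_elements(SAMPLE):
        print("rdf:resource with content on", tag)
```

Running it over a full export would list every element of this kind, which may help when deciding how much of the output needs reworking.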
RDF/XML places additional constraints on the XML document. So something can be a valid XML document using only the terms defined in the RDF namespace but not be a valid RDF model. Note that the rest of the RDF/XML document may contain further errors.

RDF does not support labels on edges, or a single edge pointing to a literal and a resource at the same time. The way to add a label to the resource would be to add an edge from the object resource.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
<https://osmetadata.astuntechnology.com/geonetwork/srv/resources/datasets/OS+1:50+000+Scale+Colour+Raster>
a dcat:Dataset ;
dct:license "Use limitation dependent upon licence" ;
dct:license <http://www.ordnancesurvey.co.uk/oswebsite/business/licences/index.html> .
<http://www.ordnancesurvey.co.uk/oswebsite/business/licences/index.html>
rdfs:comment "Licences and agreements explained" .
Thanks Alasdair. The problem I see with your proposal is that it appears to say that the Dataset has two licenses, rather than making two statements about the same external object (which in this case may actually fail the DCT criteria to be called a licence, but let's put that to one side for now!).
It was unclear to me from the XML modelling what was meant, since there were two statements regarding the license. If the two text sentences are meant to be about the same license then that can also be stated in the RDF.

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
<https://osmetadata.astuntechnology.com/geonetwork/srv/resources/datasets/OS+1:50+000+Scale+Colour+Raster>
a dcat:Dataset ;
dct:license <http://www.ordnancesurvey.co.uk/oswebsite/business/licences/index.html> .
<http://www.ordnancesurvey.co.uk/oswebsite/business/licences/index.html>
rdfs:label "Use limitation dependent upon licence" ;
rdfs:comment "Licences and agreements explained" .

The predicates
Thanks Alasdair. That seems closer to the original intent, where the "two statements" were in the same XML element (which as you pointed out is not allowable in RDF/XML). All this has rather diverted from trying to show how the three ways to state geographical coverage would look. |
I have manually adjusted the RDF/XML file that I linked to above. I hope my adjustments are in line with Alasdair's input. The file now validates at https://www.w3.org/RDF/Validator/rdfval. I have also changed the extension from .rdf to .txt in order to attach it in GitHub. (Personally, I would put the namespace declarations at the top, but I have tried to minimise |
Turns out I was using an old version of Geonetwork. The current version provides DCAT in RDF that looks much cleaner to me, and validates at W3C |
Extend the metadata model to enable the specification of the geospatial coverage and resolution of the data asset. The extension must be compliant with GEMINI (GitHub).
DCAT includes the following properties for capturing geospatial coverage and resolution:
dcterms:spatial
dcat:spatialResolutionInMeters
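As a rough illustration of how the two properties could sit together on one dataset, the sketch below emits a small Turtle description. The function name and example URIs are assumptions made for illustration; the resolution is typed as xsd:decimal, which is the range DCAT gives for dcat:spatialResolutionInMeters.

```python
from decimal import Decimal


def coverage_turtle(dataset_uri: str, location_uri: str, resolution_m) -> str:
    """Return Turtle stating a dataset's spatial coverage and resolution."""
    resolution = Decimal(str(resolution_m))
    if resolution <= 0:
        raise ValueError("spatial resolution must be a positive number of metres")
    return "\n".join([
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
        f"<{dataset_uri}> a dcat:Dataset ;",
        f"    dcterms:spatial <{location_uri}> ;",
        f'    dcat:spatialResolutionInMeters "{resolution}"^^xsd:decimal .',
    ])


if __name__ == "__main__":
    # Placeholder dataset URI and an illustrative 25 m resolution.
    print(coverage_turtle(
        "https://data.gov.uk/datasets/example",
        "http://statistics.data.gov.uk/id/statistical-geography/K02000001",
        25,
    ))
```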