DD3 R7. Where possible, add Schema.org “corresponding element” entries to GEMINI elements #41

Open
PeterParslow opened this issue Apr 7, 2021 · 27 comments
Assignees
Labels: DD3 (Recommendation from the Geospatial Commission Data Discoverability 3 project); Elements (Issue that primarily affects the GEMINI elements); enhancement (New feature or request)

Comments

@PeterParslow
Contributor

We have used W3C’s recommendations for mapping from ISO 19115 to Schema.org. This table summarises the Schema.org equivalence statements given for each element below.
Whilst there is no specific DD2 recommendation concerning DCAT, we believe a DCAT2 “equivalent element” for each GEMINI element would be useful, by supporting those whose web publication of GEMINI records uses DCAT as opposed to Schema.org. Where this is easily available from the same W3C source, we have included this below. You will see that the two vocabularies are very similar, but note that:
• some of the DCAT elements sit in the DCAT “distribution” section, not their “dataset”;
• many DCAT properties have structured content, so this is not a complete list of how to implement it; and
• there are many other DCAT properties that should also be used, beyond those that exist in Schema.org (e.g. conformsTo, creator, spatialResolutionInMeters, format).

| GEMINI element | Condition | Schema.org | DCAT/DCAT2[1] | Notes |
|---|---|---|---|---|
| Title | | name | dct:title | |
| Dataset language | | inLanguage | dct:language | |
| Abstract | | description | dct:description | |
| Topic category | | keywords | dct:subject | |
| Keyword | INSPIRE theme | keywords | dcat:theme / dct:subject | |
| Keyword | free text | keywords | dcat:keyword | Schema.org puts all the ‘free text’ keywords in one value |
| Keyword | Controlled list, URL | keywords.DefinedTerm.name; use .description for the textual content of the Anchor or CodeList; use .url for the target of the Anchor | dcat:keyword.DefinedTerm | |
| Temporal extent | | temporalCoverage[2] | dct:temporal | |
| Dataset reference date | 19115 dateType = publication | datePublished | dct:issued | release date / issued |
| Dataset reference date | 19115 dateType = revised | dateModified | dct:modified | update date |
| Lineage | | | dct:provenance | |
| Extent | | spatialCoverage.Place.name | dct:spatial | |
| Resource locator.linkage | 19115 function = download | contentUrl (inside “distribution”) | dcat:downloadURL | |
| Resource locator.linkage | 19115 function = “information”, where the page links on to download | | dcat:accessURL | |
| Resource locator.linkage | 19115 function = “information” | url | dcat:landingPage | |
| Data format | | encodingFormat | dct:format; possibly also dcat:mediaType | |
| Responsible organisation | 19115 role = publisher | publisher.Organization (with at least name, email, url) | dct:publisher | |
| Responsible organisation | 19115 role = pointOfContact | contactPoint (probably Organization, with at least name, email, url) | dcat:contactPoint | |
| Use constraints | Use constraints is being used to indicate a licence | license | dct:license | |
| Use constraints | Where GEMINI has an Anchor URL to the licence | license.CreativeWork, with .abstract (the free text) and .url (the Anchor target URL) | | |
| Use constraints | Other circumstances | | dct:accessRights | |
| Bounding box | | spatialCoverage.geo.GeoShape.box | dct:spatial | Note: needs translating from four edges to two corners |
| Resource identifier | | identifier | dct:identifier | |
| Resource type | | | rdf:type | Note: DCAT-AP does not distinguish between datasets and dataset series |
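To make the Schema.org column concrete, here is a minimal Python sketch that turns a handful of GEMINI element values into a Schema.org Dataset, including the "four edges to two corners" translation for the bounding box. The input keys (`title`, `bounding_box`, etc.) are hypothetical names invented for this illustration, and the corner ordering ("south west north east", latitude first) is an assumption based on common Schema.org GeoShape usage rather than anything agreed in this issue.

```python
import json


def bbox_to_box(west, east, south, north):
    """Translate four bounding-box edges into a two-corner GeoShape box string.

    Assumption: corners are written latitude first, lower corner then upper
    corner, i.e. "south west north east".
    """
    return f"{south} {west} {north} {east}"


def gemini_to_schema_org(record):
    """Build a Schema.org Dataset from a dict keyed by hypothetical field names."""
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["abstract"],
    }
    # Optional elements are copied only when present in the input record.
    simple = {
        "language": "inLanguage",
        "publication_date": "datePublished",
        "revision_date": "dateModified",
        "resource_identifier": "identifier",
    }
    for source_key, target_key in simple.items():
        if source_key in record:
            dataset[target_key] = record[source_key]
    if record.get("keywords"):
        # The table notes that Schema.org puts all free-text keywords in one value.
        dataset["keywords"] = ", ".join(record["keywords"])
    if "bounding_box" in record:
        b = record["bounding_box"]
        dataset["spatialCoverage"] = {
            "@type": "Place",
            "geo": {
                "@type": "GeoShape",
                "box": bbox_to_box(b["west"], b["east"], b["south"], b["north"]),
            },
        }
    return dataset


example = {
    "title": "Example dataset",
    "abstract": "An illustrative record.",
    "language": "eng",
    "keywords": ["trees", "planning"],
    "publication_date": "2021-04-07",
    "bounding_box": {"west": -8.65, "east": 1.77, "south": 49.86, "north": 60.86},
}
print(json.dumps(gemini_to_schema_org(example), indent=2))
```

The sketch deliberately leaves out the conditional rows (keywords from controlled lists, responsible organisations, use constraints), since those need decisions this thread has not yet settled.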
@PeterParslow PeterParslow added enhancement New feature or request Elements Issue that primarily affects the GEMINI elements DD3 Recommendation from the Geospatial Commission Data Discoverability 3 project labels Apr 7, 2021
@PeterParslow
Contributor Author

The W3C mapping, on which this is largely based, is at https://www.w3.org/2015/spatial/wiki/ISO_19115_-_DCAT_-_Schema.org_mapping

@PeterParslow
Contributor Author

Andrea Perego’s ISO 19139 - DCAT mapping in GitHub (James’ link) provides more detail, e.g. the range of each element, and also maps some things outside the DCAT namespace(s).

https://github.com/GeoCat/iso-19139-to-dcat-ap/blob/master/documentation/Mappings.md

(Thanks to James Reid)

@PeterParslow
Contributor Author

PeterParslow commented Aug 30, 2023

Just been contacted by the CDDO data standards team looking to state how to describe "where" in DCAT metadata to be used in the UK government data marketplace. This will include updating the mapping above for DCAT v3.

See co-cddo/ukgov-metadata-exchange-model#1

@nmtoken
Contributor

nmtoken commented Aug 30, 2023

Should we also adapt the GEMINI mapping to DCAT 3 as this now includes better description of dataset series?

@PeterParslow
Contributor Author

That will be a necessary part of the CDDO work; I'll make sure it is available as an update to this GEMINI change request. It's also being discussed (& likely to happen) in the OGC GeoDCAT SWG.

@PeterParslow
Contributor Author

Need to annotate this to show how it aligns (or not!) with the UK Cross-Government Metadata Exchange Model, which may be re-branded as a UK Application Profile of DCAT.

@archaeogeek
Member

archaeogeek commented May 1, 2024

@PeterParslow to update table, then @archaeogeek to update elements with equivalent mappings, also publish this table as guidance

@PeterParslow
Contributor Author

We'll also need to include guidance or at least comment on converting GEMINI to DCAT covering how many dcat distributions to create (depending on e.g. GEMINI Use constraints & Resource locators).

Revised table, with extra columns for DCAT v3 & UK government metadata exchange model. Note, the UK Gov work is supposed to consider adding spatial & some other things; they also plan to convert it to a full AP of DCAT v3.

| GEMINI element | Condition | Schema.org | DCAT/DCAT2[1] | Notes | DCAT3 | UK Gov MXM |
|---|---|---|---|---|---|---|
| Title | | name | dct:title | | Y | Y |
| Dataset language | | inLanguage | dct:language | | Y | N |
| Abstract | | description | dct:description | | Y | Y |
| Topic category | | keywords | dct:subject | | Y | N |
| Keyword | INSPIRE theme | keywords | dcat:theme / dct:subject | DCAT3 expects theme to be used when the target is a SKOS concept; subject in the more general case, whether or not the term is from a controlled vocab | Y | dcat:theme |
| Keyword | free text | keywords | dcat:keyword | Schema.org puts all the ‘free text’ keywords in one value; DCAT / MXM keywords are 'uncontrolled' literals | Y | Y |
| Keyword | Controlled list, URL | keywords.DefinedTerm.name; use .description for the textual content of the Anchor or CodeList; use .url for the target of the Anchor | dcat:keyword.DefinedTerm | | dcat:theme? | dcat:theme |
| Temporal extent | | temporalCoverage[2] | dct:temporal | | Y | N proposed |
| Dataset reference date | 19115 dateType = publication | datePublished | dct:issued | release date / issued | Y | Y |
| Dataset reference date | 19115 dateType = revised | dateModified | dct:modified | update date | Y | Y |
| Lineage | | | dct:provenance | | Uses PROV | N |
| Extent | | spatialCoverage.Place.name | dct:spatial | if available as a link | Y | N proposed |
| Resource locator.linkage | 19115 function = download | contentUrl (inside “distribution”) | dcat:downloadURL | | Y | Y |
| Resource locator.linkage | 19115 function = “information”, where the page links on to download | | dcat:accessURL of a dcat:Distribution? | | Y | N |
| Resource locator.linkage | 19115 function = “information” | url | dcat:landingPage | | Y | N |
| Data format | | encodingFormat | dct:format of a dcat:Distribution | Possibly also dcat:mediaType | Y | N |
| Responsible organisation | 19115 role = publisher | publisher.Organization (with at least name, email, url) | dct:publisher | | Y | Y |
| Responsible organisation | 19115 role = pointOfContact | contactPoint (probably Organization, with at least name, email, url) | dcat:contactPoint | dcat:contactPoint is a vCard | Y | must contain email & contactName (organisation) |
| Use constraints | Use constraints is being used to indicate a licence | license | dct:license | license is a property of a distribution | Y | Y licence |
| Use constraints | Where GEMINI has an Anchor URL to the licence | license.CreativeWork, with .abstract (the free text) and .url (the Anchor target URL) | | | Y | Y |
| Use constraints | Other circumstances | | dct:accessRights | accessRights is a property of the dataset | Y | Y |
| Bounding box | | spatialCoverage.geo.GeoShape.box | dct:spatial.dct:Location.dcat:bbox | Note: needs translating from four edges to two corners | Y | N |
| Resource identifier | | identifier | dct:identifier | | Y | Y |
| Resource type | | | rdf:type | cataloguedResource is either Dataset or DataService; Note: DCAT-AP does not distinguish between datasets and dataset series; DCATv3 does | The CataloguedResource can be either Dataset, DatasetSeries, or DataService | Y |
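As a starting point for the "how many distributions" question, the Resource locator rows above could be read along the lines of the Python sketch below. It is only one possible interpretation: the one-distribution-per-locator rule, the `links_to_download` flag, the dictionary layout and the example.org URLs are all assumptions made for illustration, not agreed guidance.

```python
def locator_target(function, links_to_download=False):
    """Pick the DCAT property for a Resource locator and say where it attaches.

    Returns (property, owner), where owner is "distribution" or "dataset".
    """
    if function == "download":
        return "dcat:downloadURL", "distribution"
    if function == "information" and links_to_download:
        # An information page that links on to the download.
        return "dcat:accessURL", "distribution"
    if function == "information":
        return "dcat:landingPage", "dataset"
    return None, None  # Other function codes are not covered by the table.


def locators_to_dcat(locators):
    """Group locators into dataset-level landing pages and distributions.

    Assumption: each download or access locator becomes its own distribution.
    """
    dataset = {"dcat:landingPage": [], "dcat:distribution": []}
    for locator in locators:
        prop, owner = locator_target(
            locator["function"], locator.get("links_to_download", False)
        )
        if owner == "distribution":
            dataset["dcat:distribution"].append({prop: locator["linkage"]})
        elif owner == "dataset":
            dataset[prop].append(locator["linkage"])
    return dataset


example_locators = [
    {"linkage": "https://example.org/data.zip", "function": "download"},
    {"linkage": "https://example.org/about", "function": "information"},
    {
        "linkage": "https://example.org/get-data",
        "function": "information",
        "links_to_download": True,
    },
]
print(locators_to_dcat(example_locators))
```

A fuller version would also need to attach dct:license and dct:format to each distribution, which is where the Use constraints dependency comes in.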

@archaeogeek
Member

@PeterParslow what do I need to do next? I can't remember...

@PeterParslow
Contributor Author

> @PeterParslow what do I need to do next? I can't remember...

See if what I've come up with in a desk exercise matches what you'd expect from the GeoNetwork implementation of DCAT?

@nmtoken
Contributor

nmtoken commented Jun 18, 2024

@archaeogeek do you have a link to where this transformation is mapped in GeoNetwork 4? It is available (in theory) through the OGC API - Records interface, though the links aren't working for us.

@archaeogeek
Member

@nmtoken it's not the mapping. We have it working here: https://spatialdata.gov.scot/geonetwork/api/collections/main/items/fa510351-8e30-4147-b984-862be84a6f90. You need to check the log files; I suspect you're missing the relevant xsl files in https://github.com/geonetwork/geonetwork-microservices/tree/main/modules/services/ogc-api-records/src/main/resources/xslt/ogcapir/formats/copy (which is completely undocumented). Basically, you need a gemini one that matches the iso19139 one.

@nmtoken
Contributor

nmtoken commented Jun 19, 2024

Not the headers then (geonetwork/geonetwork-microservices#114) ?

@archaeogeek
Member

@nmtoken the above is all I had to do to get it working, YMMV.

@nmtoken
Contributor

nmtoken commented Jul 2, 2024

@archaeogeek Just checking we are not talking at cross purposes: you seem to be saying that, in your Tree Preservation Orders - Argyll and Bute example, the schema.org, dcat, dcat_turtle, and geojson tabs link to content because you have a gemini XSL file and we don't.

For us (for example https://metadata.bgs.ac.uk/geonetwork/api/collections/main/items/a2b1143b-5c5d-23d6-e054-002128a47908) and the EEA geospatial data catalogue (for example https://sdi.eea.europa.eu/catalogue/api/collections/main/items/71c47f78-27b6-4080-acd5-47b306b273d8) these tabs don't give any content (only errors).

@PeterParslow PeterParslow removed their assignment Jul 8, 2024
@archaeogeek
Member

archaeogeek commented Jul 12, 2024

@nmtoken to be precise, what I'm saying is that the only change we ever needed to make to get a full set of working links in the ogc-api service is to add the gemini xsl record, e.g. adding an iso19139.gemini23 equivalent of the files here: https://github.com/geonetwork/geonetwork-microservices/tree/main/modules/services/ogc-api-records/src/main/resources/xslt/ogcapir/formats/copy, which is identical to the ones already included. This does trigger an error in the ogc-api service logs if you dig deep enough.

@PeterParslow
Contributor Author

I no longer know what I meant by dcat:keyword.DefinedTerm! It's not something that seems to exist in DCAT (v2 or v3). Conveniently, that makes it clearer (to me) that dcat:keyword is for uncontrolled keywords and dcat:theme is for controlled lists (on the assumption they are published as SKOS).
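That reading of the keyword / theme split could be sketched in Python as follows. The input shape (a `text` value plus an optional `url` holding the Anchor target) is a hypothetical structure chosen for illustration, and the INSPIRE theme URI is used purely as an example value.

```python
def split_keywords(keywords):
    """Sort GEMINI keywords into dcat:keyword literals and dcat:theme URIs.

    Assumption: a keyword with an Anchor URL comes from a controlled list
    published as SKOS, so its URL is used as the theme.
    """
    result = {"dcat:keyword": [], "dcat:theme": []}
    for keyword in keywords:
        if keyword.get("url"):
            result["dcat:theme"].append(keyword["url"])
        else:
            result["dcat:keyword"].append(keyword["text"])
    return result


example_keywords = [
    {"text": "Land cover", "url": "http://inspire.ec.europa.eu/theme/lc"},
    {"text": "tree preservation orders"},
]
print(split_keywords(example_keywords))
```

This ignores the case where a controlled list is not published as SKOS; dct:subject may be the better target there, as the DCAT3 note in the table suggests.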

@archaeogeek
Member

Finally coming back to this- I have dug into the iso19139.gemini23 code and extracted the mapping of Gemini to schema.org (from https://github.com/AstunTechnology/iso19139.gemini23/blob/4.2.x/src/main/plugin/iso19139.gemini23/formatter/jsonld/iso19139.gemini23-to-jsonld.xsl). I've created this as a google sheet for the moment, which anyone can access and comment on. https://docs.google.com/spreadsheets/d/1uHiH-huQ9VuNeAJ7vRNUlpYtMGZrhGZkVLVS4GaGYs0/edit?usp=sharing. I think this does match the table above (#41 (comment)) mostly!

I see two actions:

  1. decide if we're happy with that implementation (and if not, submit issues to iso19139.gemini23 to ask for changes)
  2. decide how we want to present this information on the Gemini website

@archaeogeek
Member

Noting also that there's #42 which refers to a similar thing- just making sure we don't forget it!

@PeterParslow
Contributor Author

Thoughts on https://docs.google.com/spreadsheets/d/1uHiH-huQ9VuNeAJ7vRNUlpYtMGZrhGZkVLVS4GaGYs0/edit?usp=sharing:

  • the mapping above concentrates on DCAT; the one Jo shared concentrates on Schema.org
  • I then started a 'line by line' comment, but realised there's some nested structure of Schema.org use going on here which isn't clear in the mapping spreadsheet. I think I've worked that out correctly (the rows below each row with a value in column B are sub-attributes of that row?)
  • none of the GEMINI Dataset reference dates are mapped
  • GEMINI Topic category is not mapped (but is not that useful IMHO!)
  • GEMINI Extent is not mapped
  • I found (above) a "cleverer" way to map keywords that are from controlled lists to Schema.org
  • all GEMINI Resource locators are mapped to "DataDownload"
  • all GEMINI Responsible organisations (gmd:pointOfContact) are mapped to "maintainers" whilst the GEMINI Metadata points of contact are seen as "publishers"
  • the "license" mapping is different
  • the table above misses any mapping of GEMINI Limitations on public access, or of anything to Schema.org conditionsOfAccess, which makes me wonder whether the table above is wrong to map some GEMINI Use constraints to dcat:accessRights.

Could someone check my observations? Then we need to discuss what to do about them? e.g. should we (GEMINI WG) concentrate on mapping to DCAT or to Schema.org? Which differences do we see as issues to be fixed in GeoNetwork & where is the GeoNetwork mapping better than the one above?

@PeterParslow
Contributor Author

Many of my comments above do not apply to the (more recent) XSLTs now in GeoNetwork core: https://github.com/SPW-DIG/schemas/iso19115-3.2018/src/main/plugin/iso19115-3.2018/formatter/dcat

  • both dates are mapped
  • Extent is mapped
  • Resource locators seem to be better mapped
  • role gets mapped
  • they seem to map all access & use constraints to DCAT distribution rights. They quote a statement which I'd missed (because it lies in the text of DCAT Dataset) as to why this is preferred (over quoting them for the dataset - but that remains an option in DCAT that may be useful if the terms are the same across all distributions)

I haven't yet found where/whether Topic category is mapped.

I'll see if the authors of those XSLTs have a 'conceptual mapping' spreadsheet/spec that they were working from.

@nmtoken
Contributor

nmtoken commented Nov 18, 2024

https://github.com/SPW-DIG/schemas/iso19115-3.2018/src/main/plugin/iso19115-3.2018/formatter/dcat doesn't exist

@PeterParslow
Contributor Author

Odd; I'm sure I was reviewing it yesterday!

Anyway, meanwhile, here is a spreadsheet mapping that was used to create the XSLTs & might be more useful for us to review: https://docs.google.com/spreadsheets/d/1pJkKgGa655Dv06_UFwzeYoje9hEwzifE/edit?usp=sharing&ouid=115738132131001964515&rtpof=true&sd=true (created by GeoCat staff working on GeoNetwork; shared just now in the OGC Metadata Code Sprint).

@nmtoken
Contributor

nmtoken commented Nov 22, 2024

Useful table; are they also looking at any additional mappings related to dcat3 or dcat-AP 3 and other profiles?

@PeterParslow
Contributor Author

> Useful table; are they also looking at any additional mappings related to dcat3 or dcat-AP 3 and other profiles?

I guess that depends who you meant by "they".

  1. I don't suppose the GeoNetwork team are looking at any additional mappings. One of them did say he'd like an authoritative TC 211 mapping to stop his customers each doing their own mapping.
  2. Assuming the ISO/TC 211 project gets the go-ahead, then I guess it/we will be open to looking at other mappings for a while. The intention will be to document good practice in mapping from ISO 19115 to DCAT3 (preferably OGC GeoDCAT - but that may need our input to make it happen).

@PeterParslow
Contributor Author

Looking at this from our OS GeoNetwork instance (now we have 4.2), I'm surprised by the mapping given to GEMINI "Conformity" (ISO DQ_DataQuality > DQ_Element.result > DQ_ConformanceResult) - it comes out in GeoNetwork API DCAT as prov:wasUsedBy/prov:Activity/prov:qualifiedAssociation/prov:hadPlan/prov:wasDerivedFrom, with the 'result' ("gmd:pass") as prov:generated with the correct value from the INSPIRE register*

I guess that makes sense, saying that the dataset was used in an activity involving the INSPIRE spec, resulting in "not conformant".

But I did expect it to map to dcat's dcterms:conformsTo as described at https://www.w3.org/TR/vocab-dcat-3/#quality-conformance-statement - but now I see that GN's choice kind of follows Example 47 in DCAT, in order to allow for "not compliant". Maybe that's why I didn't include that mapping in the tables above!

*clever: we should probably reference these in the recommended GEMINI encoding: https://inspire.ec.europa.eu/metadata-codelist/DegreeOfConformity/notConformant
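If we did reference the register in the recommended encoding, the lookup from gmd:pass to a register entry might look like the Python sketch below. Only the notConformant URI is quoted in this thread; the `conformant` and `notEvaluated` code values, and treating a missing gmd:pass as "not evaluated", are assumptions that should be checked against the INSPIRE register before use.

```python
# Base URI taken from the notConformant link quoted in this comment.
DEGREE_OF_CONFORMITY = (
    "https://inspire.ec.europa.eu/metadata-codelist/DegreeOfConformity/"
)


def degree_of_conformity_uri(passed):
    """Map a gmd:pass value to a DegreeOfConformity register URI.

    passed is True, False, or None (gmd:pass absent or nil).
    Assumption: the register codes are conformant, notConformant, notEvaluated.
    """
    if passed is None:
        return DEGREE_OF_CONFORMITY + "notEvaluated"
    code = "conformant" if passed else "notConformant"
    return DEGREE_OF_CONFORMITY + code


print(degree_of_conformity_uri(False))
```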

@PeterParslow
Contributor Author

Further reading on (my) GeoNetwork instance:
In spite of GeoNetwork 4.2 saying that it doesn't have a DCAT API, elsewhere it does admit to having one returning partial/incomplete DCAT (for example, not populating dct:license, comment "TO DO"). GeoNetwork 4.4 returns a very different looking DCAT, using a different XSLT.

XSLTs in 4.2.x branch: https://github.com/geonetwork/core-geonetwork/blob/f4890d56b0a61427ea5c9731cc8315ba7e247125/schemas/iso19139/src/main/plugin/iso19139/convert/dcat.xsl, and/or https://github.com/geonetwork/core-geonetwork/blob/f4890d56b0a61427ea5c9731cc8315ba7e247125/web/src/main/webapp/xslt/services/dcat/rdf.xsl

GN 4.4 says it uses iso-19139-to-dcat-ap/iso-19139-to-dcat-ap.xsl at main · SEMICeu/iso-19139-to-dcat-ap

They both have the same prov:wasUsedBy approach to Conformity, for example. The various occurrences of dct:conformsTo in the 4.4 output are about the metadata record (conforms to http://data.europa.eu/930/) and its source (conforms to <dct:Standard rdf:about="http://vocab.nerc.ac.uk/collection/M25/current/GEMINI/"/>, from the input record).

The 4.4 one does populate dct:license, dct:rights, dct:accessRights, all from the input record

Why/how does this matter to GEMINI (as opposed to GeoNetwork users!)?
We will need to (be expected to) select a DCAT mapping & (perhaps) provide some encoding information as well...

In particular, I am working with CDDO to get DCAT harvested (from a GEMINI-populated GeoNetwork) into the Data Marketplace - this GEMINI issue therefore becomes more urgent (at least for me & other government publishers).

Projects
Status: In progress
Development

No branches or pull requests

3 participants