Metadata Element for Long-term Preservation #149
Replies: 2 comments 1 reply
@kpletsch Thank you for this suggestion, and I'm sorry for the delay responding! Based on your description, I understand it's important to be able to link to the institution or repository that archives the data, although there won't necessarily be a direct link to the archived copy of the data itself (e.g., in a dark archive). Therefore, I would suggest using Contributor with contributorType "HostingInstitution" for this use case. This fits with the definition, which has a note that it can be used for an organization storing data offline:
Would this potentially work for your use case?
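As a rough illustration of the suggestion above, the sketch below builds a contributor entry with contributorType "HostingInstitution". It assumes the attribute names used in DataCite's JSON representation (`name`, `nameType`, `nameIdentifiers`); the helper function is hypothetical, and using a re3data ID as the name identifier is an assumption borrowed from the example in the original post, not an established practice.

```python
import json


def hosting_institution_contributor(name, identifier, scheme, scheme_uri):
    """Build one contributor entry (hypothetical helper, DataCite-JSON-style keys assumed)."""
    return {
        "name": name,
        "nameType": "Organizational",
        "contributorType": "HostingInstitution",
        "nameIdentifiers": [
            {
                "nameIdentifier": identifier,
                "nameIdentifierScheme": scheme,  # assumption: scheme label chosen for illustration
                "schemeUri": scheme_uri,
            }
        ],
    }


# Example values taken from the re3data record mentioned in the original post.
contributor = hosting_institution_contributor(
    name="World Data Center for Solid Earth Physics",
    identifier="r3d100010447",
    scheme="re3data",
    scheme_uri="https://www.re3data.org/",
)

print(json.dumps({"contributors": [contributor]}, indent=2))
```

A structured identifier on the contributor is what would make the archiving institution queryable, even when no URL to the archived copy exists.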
1+
What is the problem that your suggestion solves?
A lot of research data is currently not preserved (DPC Bitlist, [http://doi.org/10.7207/dpcbitlist-23, see published and semi-published research data]), even when the data is published, citable and part of research communication and should therefore remain accessible and reusable in the long term. There is no central database where funders, researchers or institutions can query which research data publications have been archived and preserved, and by which responsible party. Still, funders increasingly ask for publication of research output in “trusted data repositories”, indicating the importance of long-term accessibility and reusability beyond the lifetime of personnel, projects, software, format standards and hardware. Crossref, Keepers Registry and library catalogues do allow such documentation, but these three solutions don’t cover research data and so don’t provide a solution in this case.
Such a metadata element would
Information about the validity of the certification ensures that the respective institution still exists and still preserves and curates the data it is responsible for in a responsible manner. It ensures that preservation solutions are actually trusted repositories, as recommended by funders.
What solution might meet your needs?
Addition of an optional metadata element isArchivedBy for research data publications, i.e. connected to a research data publication DOI. The value should be a machine-readable identifier of a data repository, a data provider, a service provider or an institution which has an active digital long-term preservation certification (CoreTrustSeal, nestor seal or ISO 16363) and therefore conforms to the long-term preservation values specified by ISO 14721.
Specifically the value should be/have:
An example of such a registry is re3data: the re3data record already contains information about the certification of the institution, and re3data also has an API [https://www.re3data.org/api/doc]. An API request may use the re3data ID of, e.g., World Data Center for Solid Earth Physics (r3d100010447) in a request URL [https://www.re3data.org/api/v1/repository/r3d100010447], and the response contains details about the service, including its certification.
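A minimal sketch of such a lookup follows. It assumes the API returns an XML record in which certification is carried by elements whose local name is `certificate`; that element name, the function names and the namespace-agnostic matching are assumptions made for illustration and should be checked against the API documentation linked above.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

RE3DATA_API = "https://www.re3data.org/api/v1/repository/"


def repository_url(re3data_id):
    """Build the request URL for one re3data record."""
    return RE3DATA_API + re3data_id


def local_name(tag):
    """Strip an XML namespace such as '{uri}certificate' down to 'certificate'."""
    return tag.rsplit("}", 1)[-1]


def extract_certificates(xml_data):
    """Collect the text of every element named 'certificate' (assumed element name)."""
    root = ET.fromstring(xml_data)
    return [
        element.text.strip()
        for element in root.iter()
        if local_name(element.tag) == "certificate"
        and element.text
        and element.text.strip()
    ]


def fetch_certificates(re3data_id, timeout=10.0):
    """Download one record and return its certificate values (needs network access)."""
    with urlopen(repository_url(re3data_id), timeout=timeout) as response:
        return extract_certificates(response.read())
```

Calling `fetch_certificates("r3d100010447")` would query the record named above. An empty list would mean that no certificate element was found in the response, which is not by itself proof that the institution is uncertified.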
The specific metadata element can have a slightly different label, e.g. isArchivedAt, isPreservedBy or similar. Schema.org supports the following markup properties: archivedAt [https://schema.org/archivedAt], conditionsOfAccess [https://schema.org/conditionsOfAccess], holdingArchive [https://schema.org/holdingArchive]. Of these, only archivedAt and conditionsOfAccess are part of the dataset markup of schema.org [https://schema.org/Dataset]. These properties have different intentions from the one proposed for DataCite: they allow documentation of links to archived versions or documentation of access methods, both in unstructured free text. They don’t allow querying responsibility in a structured manner and don’t cover dark archives. Dark archives don’t allow access to archived versions and work in concert with access platforms, meaning they don’t provide a URL to the archived version.
Additionally, maybe as a future implementation, the addition of the type of certification of the institution as a sub-element in DataCite would be beneficial. Finally, the addition of another sub-element covering the validity of the certification would be valuable; since re3data does not contain this information at the moment, this may only be possible in the future.
Your name
Katja Pletsch
Your organization
ZB MED – Information Centre for Life Sciences
What alternatives have you tried or considered?
There is currently no workaround. If a German regional library catalogue supports research data cataloguing (few officially and structurally do), if an institution catalogues its research data, and if the institution adds preservation information to the library catalogue, then the preservation status of some research data may be queried there. This is insufficient because of the low level of implementation and the limited reach and coverage, since research is international.
Another workaround would be to look up each certified institution individually on re3data and on coretrustseal.org, analyse its web presence for APIs like OAI-PMH, and analyse its content for research data. As this involves several steps, may depend on institutional infrastructure, and does not allow structured querying across institutions, this workaround is not a realistic alternative.
Is there anything else you would like to share?
No response
What group(s) would benefit from your suggestion?
If other group(s), please describe.
Researchers, Funders