relationType to link Dataset processed/transformed by Software #198
Replies: 6 comments 11 replies
-
|
I could use this kind of relationType whenever I add a link to software that was used to process, transform, or analyze the data, and conversely, in Zenodo or our institutional repository, when authors want to link from code to a corresponding dataset. This would seem to be aligned with reproducibility goals too. Is a more granular representation needed to differentiate between data analysis and computational pipelines that process or transform data? |
-
|
Speaking from my role in the NASA Heliophysics science data repositories, I also support this addition as a critically needed option to connect data and software. To echo the relationship in Schema.org that is used to connect data and software (see the last paragraph of this section), I suggest the relation type be called "IsGeneratedBy" / "Generated", with the change in tense (from 'was' to 'is') to match existing relation types in DataCite. This would make it possible to connect data with software, notebooks, and similar resource types.

We are beginning to be blocked by the lack of such a relation type when indicating that a dataset was produced by a certain version of modeling, analysis, or mission software, which is critical for the transparency of our research publications and for a proper impact analysis of the data and software funded by NASA Heliophysics.

Here are two simple examples representative of the most common ways we intend to use this.

Example 1: Software is used to generate a dataset without any input data, as with modeling software or the original production of a dataset.

Example 2: Software is used to generate a dataset from at least one input dataset, as in the combined analysis of multiple datasets to produce a new dataset. Let's call the input datasets 'dataset inA' and 'dataset inB', the output 'dataset outA', and the software 'software C'. Then, the provenance between the four resources would be represented with:

@dgarijo What is the conversation in CodeMeta / software for representing these provenance relationships in DataCite? |
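For readers following along, Example 2 might be sketched as DataCite-style `relatedIdentifiers` entries. This is only an illustration under stated assumptions: "IsGeneratedBy" / "Generated" is the pair proposed in this thread and is not part of the published schema, the DOIs are hypothetical placeholders, and linking the output to its inputs with the existing IsDerivedFrom type is an assumption rather than something stated above. How software C itself relates to the input datasets is left out because the thread does not say.

```python
# Sketch only. "IsGeneratedBy" / "Generated" is the pair proposed in this
# thread; it is NOT part of the published DataCite relationType vocabulary.
# All DOIs below are hypothetical placeholders.
DATASET_IN_A = "10.1234/dataset-inA"
DATASET_IN_B = "10.1234/dataset-inB"
DATASET_OUT_A = "10.1234/dataset-outA"
SOFTWARE_C = "10.1234/software-C"


def related(identifier, relation_type, resource_type):
    """Build one relatedIdentifiers entry in DataCite JSON attribute style."""
    return {
        "relatedIdentifier": identifier,
        "relatedIdentifierType": "DOI",
        "relationType": relation_type,
        "resourceTypeGeneral": resource_type,
    }


def output_dataset_relations(software, inputs):
    """Relations that would sit on the output dataset's record."""
    # Proposed type: the output dataset is generated by the software.
    relations = [related(software, "IsGeneratedBy", "Software")]
    # Assumption: inputs are linked with the existing IsDerivedFrom type.
    relations += [related(doi, "IsDerivedFrom", "Dataset") for doi in inputs]
    return relations


def software_relations(outputs):
    """Reciprocal relations that would sit on the software's record."""
    return [related(doi, "Generated", "Dataset") for doi in outputs]


out_a_relations = output_dataset_relations(SOFTWARE_C, [DATASET_IN_A, DATASET_IN_B])
software_c_relations = software_relations([DATASET_OUT_A])

for entry in out_a_relations + software_c_relations:
    print(entry["relationType"], "->", entry["relatedIdentifier"])
```

Example 1 would be the same call with an empty list of inputs, leaving only the single "IsGeneratedBy" entry on the output dataset.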
-
|
@rebeccaringuette 👍 I would love it if the DataCite extensions for these provenance relationships followed the well-established W3C PROV-O vocabulary, as described in the SOSO provenance section that you linked. I wrote that section and am happy to chat with anyone thinking of implementing this. We had many community conversations in SOSO and ESIP around it before it made it into the recommendation. Plus, it's just nice to not start over when others (e.g., the W3C PROV working group) established such a clear solution 12 years ago (see PROV-O). |
-
|
+1 to using PROV for these cases. It is the intended use.
|
-
|
@KellyStathis Is there any way to get these two relation types into the next version (4.7) of DataCite? The drafts for the needed pieces are below: "Generated" |
-
|
Looking at this more closely, the right PROV relationship to map to is
prov:wasAttributedTo. prov:wasGeneratedBy links an output to an activity
(such as the execution of a tool), while attribution links the output
to the agent (or tool) that, in this case, generated it.
If what you aim to represent is something like "this tool generates this
type of dataset", then you may want to have a look at P-Plan:
http://purl.org/net/p-plan#isOutputVarOf
prov:wasGeneratedBy is quite broad in scope.
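To make the distinction concrete, here is a minimal sketch using plain Python tuples as triples. The PROV property names are from PROV-O; the `ex:` identifiers, the explicit "run" activity, and the helper functions are hypothetical and only for illustration.

```python
# PROV-O namespace (real); everything prefixed "ex:" is a hypothetical name.
PROV = "http://www.w3.org/ns/prov#"

DATASET = "ex:dataset-outA"    # a prov:Entity
RUN = "ex:run-of-software-C"   # a prov:Activity (one execution of the tool)
SOFTWARE = "ex:software-C"     # a prov:Agent (e.g. a prov:SoftwareAgent)

triples = {
    # Entity -> Activity: the dataset was generated by a particular run.
    (DATASET, PROV + "wasGeneratedBy", RUN),
    # Activity -> Agent: the run was carried out by the software.
    (RUN, PROV + "wasAssociatedWith", SOFTWARE),
    # Entity -> Agent: the direct link from the dataset to the software.
    (DATASET, PROV + "wasAttributedTo", SOFTWARE),
}


def objects(subject, prov_property):
    """Return all objects of (subject, prov:<prov_property>, ?) in sorted order."""
    predicate = PROV + prov_property
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def agents_via_activity(entity):
    """Follow entity -> activity -> agent, the two-step path through the run."""
    agents = set()
    for activity in objects(entity, "wasGeneratedBy"):
        agents.update(objects(activity, "wasAssociatedWith"))
    return sorted(agents)


print(objects(DATASET, "wasAttributedTo"))
print(agents_via_activity(DATASET))
```

In this sketch both paths arrive at the same software, but only `wasAttributedTo` connects the dataset to it directly. A relationType between two DOIs (dataset and software) has no place for the intermediate activity, which is why attribution is the closer fit.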
On Sat, Nov 15, 2025, 3:33 a.m., Kelly Stathis ***@***.***>
wrote:
… Thanks @rebeccaringuette <https://github.com/rebeccaringuette> and
@mbjones <https://github.com/mbjones>! We discussed IsOutputOf further
this week, and the consensus is that it's less specific than
wasGeneratedBy. Therefore, we need some more time to review the use cases
for software relationships and explore how we might incorporate the PROV
terms into the DataCite relationType vocabulary. I'll keep you updated as
we move forward!
|
-
What is the problem that your suggestion solves?
From an email exchange:
What solution might meet your needs?
A new relationType pair, potentially one of:
Your name
Kelly Stathis
Your organization
DataCite
What alternatives have you tried or considered?
The relationType IsDerivedFrom would link original data and transformed data, but doesn't work to link to the software.
Compiles is closer, but may have a more specific meaning in software (compilation).
The scenario is similar to that between an Instrument and a Dataset, where we have the relationType pair IsCollectedBy/Collects, but this doesn't imply any processing, just collection of (raw?) data.
Is there anything else you would like to share?
Front conversation (link for DataCite staff)
What group(s) would benefit from your suggestion?
If other group(s), please describe.
No response