Define schema for project update posts #82
Comments
Hey team! Please add your planning poker estimate with Zenhub @aaronc @blushi |
I discussed this briefly with @blushi today. Here are my thoughts:
|
cc/ @paul121 |
+1 to WKT and fewer linked-lists 👍 The choice for the standard/schema is interesting. Generally I've been thinking it would make sense to use schema.org for these project updates but I really haven't given it much thought up until now. I'm not very familiar with Dublin Core and just doing some research now, but realizing I have seen the DC prefix used in various places ( Some of my thoughts:
I'm starting to wonder... are project updates meant to be "web content" in their native form? Or are they really meant to be (semi) scientific observations, claims, datasets, etc? I may need a refresher on the scope/requirements for the Registry Web App. But in a general sense I think I'm leaning towards structuring or conceptualizing these as more "scientific" in their native form, and thus DC and DWC are interesting, but I would like to learn more/see more examples. I also may be associating schema.org too closely with only "web content" use-cases. |
Chatting today:
|
Examples in JSON-LD Playground:
Both examples should be roughly equivalent. In general I tried to model as follows:
Some initial thoughts:
|
I think the post would generally be the top level element, and then the file would be some collection that is associated with it. The access rights I believe would be stored outside of the post in the database so we probably don't want to include that here. Likely ditto for the author. I think it would be helpful to narrow this down to the existing JSON elements that we already have. @blushi do you have a sample JSON blob of what a post would look like (without any special RDF schema) given what we have already defined? |
Re: post as top level, yes I agree. I think I was getting a little hung up on how to use collections. The collection could be a simple sub-element on the post that then references files. But unless we have additional properties to assign to the collection (like a location or access rights), it might just be easier to reference files directly from the post. Re: author, I see why this wouldn't need to be included, especially if only used for access control. I'm just holding some thought to how this same post schema could be used elsewhere (we would like to reuse for SeaTrees) where the author could be a more useful property. But easy enough for others to add an More generally re: access rights, I agree this should be stored outside the post. Although this makes me wonder how parts of the access logic will be implemented and how it impacts the schema design. Specifically how we ensure private data is not returned via API. Has this been decided?:
It seems there could be some elegance in creating separate documents and maintaining a single, relatively simple implementation for access logic where each IRI has its access logic/owner/etc stored in the database. This could be reused for future use-cases of anchored data too and seems to be in line with the larger vision of a use-case for data resolvers to implement access control. But it could also make the schema a little more complex, e.g. requiring two documents for a public file with a private location. A simplified structure could be:
|
Yes see current implementation of that: https://github.com/regen-network/regen-server/blob/4f12a5b25b1593ffb5dadd36b2005ad76428d0eb/server/routes/posts.ts#L315 Author and privacy settings are indeed currently stored as separate database columns, see https://github.com/regen-network/regen-server/blob/4f12a5b25b1593ffb5dadd36b2005ad76428d0eb/migrations/committed/000047.sql
Yes this is what I was thinking about. We don't need to store a location for a post itself, only for the individual files.
I had something like this in mind for the post json
|
Here is a simple JSON. Includes a file for each type that is listed in the figma design: "Supported file types include text, spreadsheets, images and video files"
{
"title": "Post Title",
"comment": "Short comment about the post",
"files": [
{
"iri": "regen:1111.png",
"name": "herding.png",
"description": "Image description",
"location": "POINT(1 2)",
"credit": "Photographer name"
},
{
"iri": "regen:2222.mp4",
"name": "herding.mp4",
"description": "Video description",
"location": "POINT(1 2)",
"credit": "Photographer name"
},
{
"iri": "regen:3333.txt",
"name": "textfile.txt",
"description": "Text description"
},
{
"iri": "regen:4444.csv",
"name": "spreadsheet.csv",
"description": "Spreadsheet description"
}
]
} |
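A minimal sketch of how the simple JSON above could be loaded and sanity-checked, assuming the field names stay as shown. `PostFile`, `Post`, `load_post` and `parse_wkt_point` are hypothetical names used only for illustration, and the WKT check deliberately covers only simple 2D points like `POINT(1 2)`.

```python
import json
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical helper: matches only simple 2D WKT points such as "POINT(1 2)".
_WKT_POINT = re.compile(
    r"^POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$"
)


def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Return (x, y) from a WKT POINT string, or raise ValueError."""
    match = _WKT_POINT.match(wkt.strip())
    if match is None:
        raise ValueError(f"not a simple WKT POINT: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


@dataclass
class PostFile:
    iri: str
    name: str
    description: Optional[str] = None
    location: Optional[str] = None  # WKT string, e.g. "POINT(1 2)"
    credit: Optional[str] = None


@dataclass
class Post:
    title: str
    comment: str
    files: List[PostFile]


def load_post(raw: str) -> Post:
    """Parse a post JSON string and check each file's optional location."""
    data = json.loads(raw)
    files = [PostFile(**f) for f in data.get("files", [])]
    for f in files:
        if f.location is not None:
            parse_wkt_point(f.location)  # raises ValueError if malformed
    return Post(title=data["title"], comment=data["comment"], files=files)


example = """
{
  "title": "Post Title",
  "comment": "Short comment about the post",
  "files": [
    {"iri": "regen:1111.png", "name": "herding.png",
     "description": "Image description", "location": "POINT(1 2)",
     "credit": "Photographer name"},
    {"iri": "regen:3333.txt", "name": "textfile.txt",
     "description": "Text description"}
  ]
}
"""

post = load_post(example)
print(post.title, len(post.files))
```

Keeping `location` and `credit` optional reflects that the text and spreadsheet entries in the sample carry neither.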
Yeah this is interesting. It could be retrieved programmatically, but storing it on the post would make future indexing with the location much easier. And only require the location to be extracted from the image once when creating the post/file. Seeing the above json, a couple ideas:
These things might not be as necessary for this initial implementation of project updates backed by regen-server, but considering this could be a standard for project updates more generally, these are small things that would go a long way towards making project updates more standardized. |
So if we used Dublin Core, we could do the following mappings:
Seems like schema.org also has a pretty similar set of items. I still feel like I'm lacking a good understanding of what either of these frameworks would really get us, to the point where I'm almost inclined to just define our own properties in the regen schema namespace. |
It looks like
Above I used |
So GeoSPARQL suggests that ontologies specifically import the Interestingly, they also include an annex providing alignments of GeoSPARQL to other ontologies. This includes an alignment to schema.org and dublin core. I think the TLDR is that wherever we want to include a "location" we should use a They also provide an example query to find features with a |
Should we do a vote on Dublin Core vs schema.org vs neither? |
Also what will our strategy be for ordered lists? An order property or an actual RDF list? |
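To make the two options concrete, here is a rough sketch of both shapes as JSON-LD built in Python. The `ex:` vocabulary namespace and the `order` property name are placeholders, not decided terms. `"@container": "@list"` is the standard JSON-LD way to get an RDF list; the alternative keeps plain references and sorts on a numeric property.

```python
# Option A: an actual RDF list. "@container": "@list" tells JSON-LD processors
# that array order is significant, at the cost of list nodes in the RDF.
post_with_list = {
    "@context": {
        "ex": "https://example.org/vocab#",  # placeholder namespace (assumption)
        "files": {"@id": "ex:files", "@container": "@list"},
    },
    "files": [{"@id": "regen:1111.png"}, {"@id": "regen:2222.mp4"}],
}

# Option B: unordered references plus an explicit order property.
post_with_order = {
    "@context": {
        "ex": "https://example.org/vocab#",
        "files": {"@id": "ex:files"},
        "order": {"@id": "ex:order"},  # hypothetical property name
    },
    "files": [
        {"@id": "regen:2222.mp4", "order": 2},
        {"@id": "regen:1111.png", "order": 1},
    ],
}


def ordered_file_iris(post: dict) -> list:
    """Return file IRIs in display order for either representation."""
    files = post["files"]
    # With an order property, array position carries no meaning, so sort.
    if files and all("order" in f for f in files):
        files = sorted(files, key=lambda f: f["order"])
    return [f["@id"] for f in files]


print(ordered_file_iris(post_with_list))
print(ordered_file_iris(post_with_order))
```

Option B survives a resolver that reorders the array, while Option A avoids an extra property but introduces list nodes when converted to RDF.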
Here is a pass at using LinkML to model the schema for project posts w/ some explanation of the approach I took: https://gist.github.com/paul121/1d83c0d4dcdf06c3bcff44a4c42cffd7
I would vote for DC, primarily because I continue seeing it used in various places (semantic OGC standards, FAIR data), and it allows us to leverage a standard without the scope-creep and additional meaning that may come with schema.org. This project post use-case is so simple it's hard to argue that any vocabulary will "give us much" right now. But eventually when we do have Regen/ecological domain specific concepts it will likely be better to create our own terms for those specific things rather than try to make schema.org fit. Ideally DC can be a framework to help build out these domain specific concepts.
I'm curious how important the order is for semantics. Can we depend on the data resolver to return the JSON-LD document exactly as it was anchored, or is that too fragile? As I describe in the gist, it's quite elegant just referencing Regen IRIs as subjects + objects without the need for additional blank/list nodes. But we could add a simple |
Thanks @paul121 looks great!
Agreed
I believe having some |
We need to define the schema for project posts content which should include (TBC):
Privacy settings:
The entire post content can be private.
The files can be private.
The file locations can be private.
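As a rough illustration of how these three privacy settings could be applied before a post is returned to a non-owner, here is a small sketch. The `Privacy` flags and `public_view` function are hypothetical names, and it assumes the settings are read from the database separately from the post content rather than stored in the post itself.

```python
import copy
from dataclasses import dataclass
from typing import Optional


@dataclass
class Privacy:
    # Hypothetical flags mirroring the three privacy settings listed above.
    post_private: bool = False
    files_private: bool = False
    locations_private: bool = False


def public_view(post: dict, privacy: Privacy) -> Optional[dict]:
    """Return the part of the post visible to non-owners, or None if hidden."""
    if privacy.post_private:
        return None
    visible = copy.deepcopy(post)  # never mutate the stored post
    if privacy.files_private:
        visible["files"] = []
    elif privacy.locations_private:
        for f in visible.get("files", []):
            f.pop("location", None)
    return visible


sample = {
    "title": "Post Title",
    "comment": "Short comment about the post",
    "files": [
        {"iri": "regen:1111.png", "name": "herding.png", "location": "POINT(1 2)"}
    ],
}

print(public_view(sample, Privacy(locations_private=True)))
```

Filtering a deep copy keeps the stored post intact, so the same record can still be served in full to its owner.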