Questions and Answers

Question: Archival Resource Key (ARK) or Digital Object Identifier (DOI)?

Answer: ARK. It supports suffix passthrough and has less constrained metadata requirements.
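To make "suffix passthrough" concrete: suppose a resolver only knows about a base id like ark:/456543/2123abc45. It can still handle a longer id such as ark:/456543/2123abc45/model.js. It matches the longest registered prefix and tacks the leftover suffix onto that prefix's target URL. Here's a minimal Python sketch of the idea. The target table and the example suffix are made up, and we're assuming qualifiers split at `/` or `.` boundaries. This isn't EZID or N2T code.

```python
from __future__ import annotations

# Hypothetical registry: base ARK -> target URL registered with the NMA.
TARGETS = {
    "ark:/456543/2123abc45": "http://kgrid.med.umich.edu/456543/2123abc45",
}


def resolve_with_passthrough(ark: str, targets: dict[str, str]) -> str | None:
    """Resolve an ARK, passing any unregistered suffix through to the target.

    Tries the full ARK first, then trims one qualifier at a time (at the
    last '/' or '.') until a registered base ARK is found. The trimmed-off
    remainder is appended unchanged to that base ARK's target URL.
    """
    candidate = ark
    while True:
        if candidate in targets:
            return targets[candidate] + ark[len(candidate):]
        cut = max(candidate.rfind("/"), candidate.rfind("."))
        if cut <= len("ark:/"):
            return None  # nothing left to trim but the NAAN itself
        candidate = candidate[:cut]


print(resolve_with_passthrough("ark:/456543/2123abc45/model.js", TARGETS))
# prints http://kgrid.med.umich.edu/456543/2123abc45/model.js
```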

Question: Where are we going to get these super awesome `ark:` ids?

Answer: We could mint them ourselves, just like money. Oh, no, wait...not just like money. It's not too hard to set up a local "minter" and expose it as a service endpoint. There's a second part, the name mapping service, which is needed to turn an ark: id into an actual object (more specifically, the URL of an actual object). It's also a good idea to provide a service that returns a little metadata about the resource and the organization that's hosting it.

These last two services could easily be internal K-grid library responsibilities. If we were only doing standalone libraries, we'd probably just build those services (and the minter) into the library. But once you start to distribute the objects across libraries, it gets trickier. An object may live in more than one library, and each library may have a different collection. The ark: ids created by different libraries need to be unique. The metadata about the objects and the organization (library) can vary independently. And users can reasonably expect a library to help them find objects in other libraries.
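One simple way to keep ark: ids from different libraries unique is to give each library its own "shoulder". A shoulder is a fixed prefix after the NAAN, and each library mints freely under its own. The sketch below is hypothetical:

- The shoulders `um1` and `by1` are invented.
- The counter lives in memory; a real minter would need to persist its state.
- It assumes shoulders are assigned so that none is a prefix of another.

The vowel-free alphabet is in the spirit of the NOID convention, which keeps minted ids from accidentally spelling words.

```python
from __future__ import annotations

import itertools

# No vowels, and no "l" or "y", in the spirit of NOID, so minted ids
# don't accidentally spell words.
ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def encode(n: int, width: int = 6) -> str:
    """Write non-negative integer n in base len(ALPHABET), zero-padded to width."""
    base = len(ALPHABET)
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)).rjust(width, ALPHABET[0])


class ShoulderMinter:
    """Hypothetical per-library minter: every id starts with that library's shoulder."""

    def __init__(self, naan: str, shoulder: str) -> None:
        self.prefix = f"ark:/{naan}/{shoulder}"
        self._counter = itertools.count()  # in memory only; a real minter must persist this

    def mint(self) -> str:
        return self.prefix + encode(next(self._counter))


umich = ShoulderMinter("456543", "um1")
baylor = ShoulderMinter("456543", "by1")
print(umich.mint(), umich.mint(), baylor.mint())
# prints ark:/456543/um1000000 ark:/456543/um1000001 ark:/456543/by1000000
```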

Question: Oh, no!

Answer: Ummm...not a question.

What's needed is some central resource (or known set of resources) whose responsibility is to provide and keep track of unique identifiers for objects, and to answer questions about objects and where they are located. We could build a central reference service for this, but the EZID service from the California Digital Library system (CDL) fits the bill nicely. The online service at ezid.cdlib.org mints ids, resolves them to URLs, and provides a place to keep simple metadata.
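Here's a sketch of what talking to EZID involves. It builds a mint request but doesn't send it. It's based on our reading of the EZID API documentation:

- mint with `POST /shoulder/<shoulder>`,
- authenticate with HTTP Basic auth,
- send a plain-text ANVL body of `key: value` lines with a few characters percent-encoded.

Check the current EZID docs before relying on the paths, escaping rules, or response format. The shoulder and credentials in the tests are placeholders.

```python
from __future__ import annotations

import base64
import urllib.parse
import urllib.request

EZID_BASE = "https://ezid.cdlib.org"
KEY_SPECIALS = "%:\r\n"   # characters percent-encoded in ANVL element names
VALUE_SPECIALS = "%\r\n"  # characters percent-encoded in ANVL values


def _escape(text: str, specials: str) -> str:
    return "".join(f"%{ord(c):02X}" if c in specials else c for c in text)


def to_anvl(metadata: dict[str, str]) -> str:
    """Serialize metadata as ANVL: one 'key: value' line per element."""
    return "\n".join(
        f"{_escape(k, KEY_SPECIALS)}: {_escape(v, VALUE_SPECIALS)}"
        for k, v in metadata.items()
    )


def parse_anvl(body: str) -> dict[str, str]:
    """Inverse of to_anvl: split on the first ': ' and undo the percent-encoding."""
    result = {}
    for line in body.split("\n"):
        if line.strip():
            key, _, value = line.partition(": ")
            result[urllib.parse.unquote(key)] = urllib.parse.unquote(value)
    return result


def mint_request(shoulder: str, target_url: str, user: str, password: str) -> urllib.request.Request:
    """Build, but do not send, a request asking EZID to mint an id under `shoulder`."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{EZID_BASE}/shoulder/{shoulder}",
        data=to_anvl({"_target": target_url}).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Authorization": f"Basic {token}",
        },
    )


def parse_mint_response(body: str) -> str:
    """Expect 'success: <id>' (or 'error: <reason>') on the first line."""
    status, _, detail = body.split("\n", 1)[0].partition(": ")
    if status != "success":
        raise RuntimeError(f"EZID error: {detail or status}")
    return detail.strip()
```

To actually send it, call `urllib.request.urlopen(req)`. Then run `parse_mint_response` on the decoded response body.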

One less moving part for us to maintain, and already set up to serve many, many clients like us in the world.

Question: Whew...that's a relief. Can we use the `ark:` identifier as the internal identifier in the Fedora repo?

Answer: I'm not sure I see the advantage. The K-grid is not primarily about preservation. It adopts the ARK philosophy that:

> ...persistence is purely a matter of service and not a property of a naming syntax. Moreover, that a "persistent identifier" cannot be born persistent, but an identifier from any scheme may only be proved persistent over time.

We will provide mapping between the ark: identifiers and one or more repositories by acting as a Name Mapping Authority (NMA). K-grid will handle delegation of id resolution, proxying, import/export, etc., in order to fulfill the service requirements around knowledge objects across distributed repositories. The internal identifiers need not (and generally will not) be the same for the same object in two different K-grid repositories.
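Here's a rough picture of the NMA role. Each repository keeps its own table from external ark: ids to whatever internal ids it uses (Fedora PIDs, say). So the same ark: id can map to different internal ids in different repositories. The `Repository` class, the `/objects/` URL layout, and the internal id strings are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical K-grid repository acting as an NMA for the KOs it holds."""

    base_url: str
    ark_to_internal: dict[str, str] = field(default_factory=dict)

    def register(self, ark: str, internal_id: str) -> None:
        self.ark_to_internal[ark] = internal_id

    def url_for(self, ark: str) -> str | None:
        """Map an external ark: id to this repository's URL, or None if we don't hold it."""
        internal_id = self.ark_to_internal.get(ark)
        if internal_id is None:
            return None
        return f"{self.base_url}/objects/{internal_id}"

    def ark_for(self, internal_id: str) -> str | None:
        """Reverse lookup, so API responses can always carry the ark: id."""
        # Linear scan keeps the sketch short; a real NMA would index both directions.
        for ark, iid in self.ark_to_internal.items():
            if iid == internal_id:
                return ark
        return None
```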

Question: Does this mean every K-grid repository has to be an NMA?

Answer: In a word, yes. The API exposed by the standard front end (Object Teller) will accept and return references to KOs using the external (ark:) id. From the point of view of users of the front-end Object Teller application, the ark: identifiers don't play a central role; they are simply the "handles" that go along with various lifecycle and discovery use cases. For developers of clients and services that create and consume the KOs in the library, though, they are pretty central.

Question: Wait a minute...what are these other "clients" and "services" of which you speak? I thought we were building a "thing".

Answer: First of all, if we build it, they will come. There are many targeted clinical, research, and administrative use cases that pretty much require tailored clients: researcher vs. clinician, desktop vs. mobile, patient vs. provider, hospital quality control vs. bedside care, cancer protocol vs. wellness promotion. All these "consumers" will be using our knowledge objects, and will need to know for sure which objects are appropriate, where they came from, and how and when they change.

But I was thinking more of our central library services....

The long-range plan is to separate out the different services currently bundled in the Object Teller front end: KO lifecycle management, user management, KO discovery services, library management and configuration, using objects (i.e. execution/computation), etc.

The name mapping (from the previous question) is an infrastructure service supporting lifecycle management and discovery, and will be one more service provided by the platform.

Question: So what's your point?

Answer: Settle down. The point is...

Every service should accept and use the persistent ark: id, at a minimum. Some services (for testing, particular back-end client interactions, infrastructure services like backups, etc.) might also expose internal ids. Furthermore, nothing prevents a service from using the persistent ark: id as its internal identifier. One could imagine a caching or resolution service using a simple key-value store with the supplied ark: ids as the primary key for quick lookup.
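Here's what that key-value idea might look like: a cache in front of a slower resolver, using the supplied ark: id itself as the key. This is a minimal sketch with an in-memory dict standing in for a real key-value store. `upstream` is any hypothetical lookup function that returns a URL or `None`. Only successful lookups are cached, so a stale miss can't hide an object that shows up later.

```python
from __future__ import annotations

from typing import Callable, Optional


class CachingResolver:
    """Caches ark: id -> URL lookups, using the ark: id itself as the key."""

    def __init__(self, upstream: Callable[[str], Optional[str]]) -> None:
        self._upstream = upstream         # hypothetical slower lookup, e.g. a database or another NMA
        self._cache: dict[str, str] = {}  # stand-in for a real key-value store
        self.misses = 0

    def resolve(self, ark: str) -> Optional[str]:
        if ark in self._cache:
            return self._cache[ark]
        self.misses += 1
        url = self._upstream(ark)
        if url is not None:  # don't cache failures; the KO may be added later
            self._cache[ark] = url
        return url
```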

Question: Wait a minute...if every installation of the K-grid/Object Teller platform has to be a name mapping doohickey, and `ark:` ids cover the whole wide world of KOs, but different repositories have different KOs in them, aren't there going to be a lot of "Object Not Found" errors?

Answer: You sure do worry a lot....

One of the nice things about a name mapping (resolver) strategy is that it can include delegating to a higher authority. As mentioned, we are planning to use an external "minter", the EZID service from the California Digital Library system (CDL). When they mint a shiny new ark: id (say, ark:/456543/2123abc45), they also set aside a little space for metadata and act as an NMA, assuming you've set up a "target" in their system. So if one of our K-grid repos (acting as an NMA) fails to resolve a request for http://kgrid.med.umich.edu/456543/2123abc45, it can pass it along to http://n2t.net/ark:/456543/2123abc45 (the CDL's Name-to-Thing service) to be resolved.

If we add the K-grid URLs as the target for our ark: ids in the CDL service then n2t.net will redirect to something like http://kgrid4.baylor.edu/456543/2123abc45, and Bob'll be your uncle!

We might handle the upstream checking for the client in the local K-grid instance, or we might just redirect the client to n2t.net and wish them luck! If we've been careful, they should end up at another friendly library. We might even try to fetch the KO from the other library and add it to our own collection (assuming we have permission to do so).
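Here's a sketch of the simplest version of "resolve locally, else punt upstream". It assumes a hypothetical `local_lookup` function and shows only the redirect option, where the client is sent to http://n2t.net/ followed by the ark: id. Proxying the request, or fetching the KO from another library, would replace the redirect branch. The status codes are our choice for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

N2T_BASE = "http://n2t.net/"


@dataclass(frozen=True)
class Resolution:
    status: int  # HTTP status the K-grid front end would send (200: we serve it)
    url: str     # where the object lives, or where the client should go next


def resolve_or_delegate(ark: str, local_lookup: Callable[[str], Optional[str]]) -> Resolution:
    """Answer locally if we can; otherwise redirect the client to n2t.net."""
    url = local_lookup(ark)
    if url is not None:
        return Resolution(200, url)
    # Not ours: hand the request to the higher authority and wish them luck.
    return Resolution(302, N2T_BASE + ark)
```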

Question: That sounds complicated and like a lot of work.

Answer: Again, not a question.

It is complicated. A lot can go wrong. We'll start out just making sure that ark: id assignment and name resolution work for local libraries. Then we'll add the ability to at least check with n2t.net and maybe a sister library, and let the client know what we found.

Question: Can users add their own identifiers, like a DOI, if there is one?

Answer: Sure! We will probably add something akin to the "keywords" attribute to capture the idea of "other identifiers" or "aliases".

The more important question is whether these will be treated as additional unique keys or just as additional metadata. In the first case we would probably implement some kind of auxiliary name mapping to return stable ark:-based references based on the additional identifiers.

In the second case, while we would make KOs "discoverable" based on the additional aliases, they would not be "referred to" by those ids, nor would they be "usable" (i.e. able to participate in key use cases) under those ids. Most importantly, our services would make no assumptions about the uniqueness or persistence of the additional identifiers, any more than we would about the name or contact email. Two KOs with the same ark: id are assumed to be the same thing in a fundamental way; two KOs with the same foo: attribute are not.
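The two cases can be sketched as a small index with a `unique` switch:

- With `unique=True`, an alias behaves like an extra key and resolves to exactly one ark: id.
- With `unique=False`, aliases are just searchable metadata. Several KOs may share one, and none can be referred to by it.

The class and the placeholder DOI are hypothetical.

```python
from __future__ import annotations


class AliasIndex:
    """Hypothetical index of "other identifiers" (e.g. DOIs) attached to KOs."""

    def __init__(self, unique: bool) -> None:
        self.unique = unique  # True: aliases act as extra keys; False: metadata only
        self._by_alias: dict[str, set[str]] = {}

    def add(self, ark: str, alias: str) -> None:
        arks = self._by_alias.setdefault(alias, set())
        if self.unique and arks and ark not in arks:
            raise ValueError(f"alias {alias!r} already belongs to {sorted(arks)[0]}")
        arks.add(ark)

    def discover(self, alias: str) -> set[str]:
        """Discovery works either way: all KOs carrying this alias (maybe several)."""
        return set(self._by_alias.get(alias, set()))

    def resolve(self, alias: str) -> str:
        """Reference by alias, allowed only when aliases are unique keys."""
        if not self.unique:
            raise LookupError("aliases are metadata only; refer to KOs by ark: id")
        arks = self._by_alias.get(alias)
        if not arks:
            raise KeyError(alias)
        return next(iter(arks))
```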

Question: What if I create a KO by accident? Will I have "wasted" an identifier?

Answer: Yep. Don't worry though, there are a lot of them! And if the KO is deleted before it is published, the ark: id can be recycled.

Question: Why do I need a fancy persistent identifier if the KO isn't "published" or being used?

Answer: You probably don't, but we want you to have one anyway. It helps with the "distributed" nature of the library, and that way we are all ready when you do publish the KO.

One approach (taken by ezid.cdlib.org) is to allow ark: ids to be "reserved". A reserved id can eventually be made public, then...well, here are the rules they use:

> A status of reserved may be specified only at identifier creation time. A reserved identifier may be made public. At this time the identifier will be registered with resolvers and other external services. A public identifier may be marked as unavailable. At this time the identifier will be removed from any external services. An unavailable identifier may be returned to public status. At this time the identifier will be re-registered with resolvers and other external services.

The idea of "registered with/removed from resolvers and other external services" has to do with the name mapping authority (NMA).

This workflow maps pretty well to how we've been thinking of the Object Teller and the KO model. KOs are created as "private". They can be edited, shared explicitly, and possibly used under controlled circumstances. At some point a KO can be published (become "public"). At that point, a larger set of users (perhaps the world) can view/access the object. We probably need a slightly more robust set of states, covering limits to execution (on our stacks) vs. limits to access vs. limits to editing (change control). Note, though, that the CDL specifies that, in the interest of "persistence as a service guarantee", once a thing is public it cannot be deleted, only made unavailable.
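The EZID status rules quoted above are small enough to write down as a state machine. This sketch encodes them. It also encodes the earlier rule that only a never-published (reserved) id may be deleted and recycled. Mapping K-grid's "private" KOs to `RESERVED` is our assumption. The richer execution/access/editing states mentioned above are left out.

```python
from __future__ import annotations

from enum import Enum


class Status(Enum):
    RESERVED = "reserved"
    PUBLIC = "public"
    UNAVAILABLE = "unavailable"


# Allowed moves: reserved -> public -> unavailable <-> public. Nothing returns
# to RESERVED, since that status can only be chosen at creation time.
TRANSITIONS = {
    Status.RESERVED: {Status.PUBLIC},
    Status.PUBLIC: {Status.UNAVAILABLE},
    Status.UNAVAILABLE: {Status.PUBLIC},
}


class IdentifierRecord:
    def __init__(self, ark: str, reserve: bool = True) -> None:
        # Assumption: new (private) KOs get a reserved id by default.
        self.ark = ark
        self.status = Status.RESERVED if reserve else Status.PUBLIC

    def transition(self, new: Status) -> None:
        if new not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new.value}")
        self.status = new

    @property
    def deletable(self) -> bool:
        # Only never-published ids may be deleted (and the id recycled);
        # once public, the best we can do is mark it unavailable.
        return self.status is Status.RESERVED
```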

Question: Versioning?

Answer: Brevity is the soul of wit. I'll assume you mean, "What is the relationship between the publishing lifecycle and versioning of a KO, especially once it is in use in the wild?"

Good question....
