Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should create multiple documents with same id throw error? #79

Closed
jessegrosjean opened this issue May 9, 2024 · 36 comments
Closed

Should create multiple documents with same id throw error? #79

jessegrosjean opened this issue May 9, 2024 · 36 comments
Assignees
Labels
API feedback Feedback or recommendations for changes to public API

Comments

@jessegrosjean
Copy link
Contributor

For example this seems to work without error:

let id = DocumentId()
let doc1 = Automerge.Document()
let doc2 = Automerge.Document()
_ = try await repo.create(doc: doc1, id: id)

I think without error it means that doc1 is lost from the repo and there's not really any way for original caller of repo create to detect that document instance has been replaced.

@heckj heckj added the API feedback Feedback or recommendations for changes to public API label May 9, 2024
@heckj heckj self-assigned this May 9, 2024
@heckj
Copy link
Collaborator

heckj commented May 9, 2024

I may be just pre-coffee, but I'm missing what the illustration here. There repo.documentIds() which returns a list of your local repo's known document IDs, and from that you can retrieve the document - repo.find(id) if it has been stored.

If you made a second doc and stored it with the same id, then you're overwriting the one in the repo, and a local reference you might have to it should no longer be considered valid for that id. That said, I'm not sure what you're suggesting should throw an error, and what the hueristics are related to that.

@jessegrosjean
Copy link
Contributor Author

jessegrosjean commented May 9, 2024

The case I'm wondering about is in MeetingNotes:

  1. Open a document
  2. Duplicate that document in Finder
  3. Open that duplicate document

Then I have two documents open with same ID, but "I think" only second one is now held by repository. Seems like a weird state to accidentally get into, but right now unless you know details of how everything works, maybe easy to get into when building document based app.

I'm wondering if repo code should protect against this state. In particular in the public api calls that result in handles[handle.id] = handle, first check that there isn't an existing handle. So force the client to first delete or use find instead of using create in those cases.

@heckj
Copy link
Collaborator

heckj commented May 9, 2024

Ahhh, now I've got it. The gist is to disallow the idea that repo.create(id,...) as replacing the existing id - that's probably good.

But I think there may be a very legit case here - I think in the MeetingNotes example, I want that same ID with that file, even replicated, because it represents a fork that could be merged. We've lost it from the repo, so if you added editing into your sequence, then (to me) it becomes clear that what should happen on a follow on 'create' is that if it's the same ID, it should be "merged" into place, not just replaced, and if that throws an error (different document bases entirely), then that should propogate up.

What do you think? Or would it make more sense to just disallow this pattern altogether (error on creating where there's a doc with the same ID already in the repo?)

@jessegrosjean
Copy link
Contributor Author

Right... that's a tricky case I hadn't thought through.

  • Each version of the forked document has same ID
  • That means if open both versions at once then have two states with same ID
  • Which state is stored in repo?

To me simplest solution is to throw error in this case. User will still be able to edit both documents. Error will tell user that that second document is not registered in repo, and not syncing, otherwise works fine. To me this seems like correct behavior for a create method.

But what if you did want to solve for this problem? Even if you use the find method to try to avoid the create error you will still be discarding one of the document states I think.

For example you would open your old forked document, and then its state would get replaced with the already open document state. Alternatively if on opening second document there was an automatic merge... then still a problem if you didn't want to do that merge, you wanted to keep the fork state.

I guess maybe the automatic merge state loss (from users perspective) could be resolved by making documents also store the set of heads they are working from? I haven't really used that API, but I think it might allow both documents to share a common Automerge.Document instance, while still working on one forks?

@heckj
Copy link
Collaborator

heckj commented May 9, 2024

There's been some focus and conversation in the Automerge community at large about tracking "forks" of documents, so I'm inquiring about this as it might play out for the JS version of automerge-repo - it seems aligned with our thinking here. A potentially interesting note - if you "clone" a document from a repo, you get the identical document (a fork) with a new ID from the repo, but there's no API concept of a way to merge those tegether in the repo - that's all in Document, if you want to try.

The path we're running down is roughly equivalent to forking, except we're doing it well outside of the reach of any consolidation in a repository - so we're replicating docs and IDs. It may be a just a poor match for the example where you have documents stashing the information, where those documents can be replicated infinitely. When you bring them back to the repository, I think it makes sense to want to:

  1. track their changes separately
  2. know they're "related" - forks of other documents if the repo has parents from the same

That runs a bit in the face of 1:doc to 1:id and only 1 reference to an ID that the current API provides. It's conceptually enforcing that there's only 1 representation (ideally combined I would suspect) that's valid (within a repository) for a single ID.

In the meantime, throwing an error on create with an ID when there's already an ID would make a good baseline that "Hey, something is awry here".

@jessegrosjean
Copy link
Contributor Author

I guess maybe the automatic merge state loss (from users perspective) could be resolved by making documents also store the set of heads they are working from? I haven't really used that API, but I think it might allow both documents to share a common Automerge.Document instance, while still working on one forks?

Actually I don't think this idea would work.

From what I can see heads can be used to read a document at point in history, but you can't use them to edit at a specific point. I think this means multiple fork within a single document is not possible. Probably best as relying on heads for proper edits in any NSDocument backed model seems messy anyway.

To me this is an error case. You can't open two documents, same ID, different state and expect them to sync without requiring some sort of user action, such as accepting a merge.

@heckj
Copy link
Collaborator

heckj commented May 9, 2024

possible PR to resolve (the strict "throw error on ID already in repo" concept): #82

@heckj
Copy link
Collaborator

heckj commented May 9, 2024

I think this does beg for a repo-based API that allows merge - it would be just a light wrapper around document.merge, but would at least be conceptually sound. A slightly bigger question to me is how to handle this in the MeetingNotes demo app - right now I'm adding into the repo when the window is displayed, primarily because Document-based app structure demands an entirely synchronous flow, and with repo, adding is async - plus the path doesn't offer any way to ask questions like "Hey, I already got this ID, you wanna merge this?"

@jessegrosjean
Copy link
Contributor Author

A slightly bigger question to me is how to handle this in the MeetingNotes demo app - right now I'm adding into the repo when the window is displayed, primarily because Document-based app structure demands an entirely synchronous flow, and with repo

I'm working on a Cocoa NSDocument based counter app to try to figure this stuff out. My current thinking is that Repo/syncing are all extra features and don't need to be tied tightly to NSDocument lifecycle. So if all the repo stuff errors out, should still be able to work with document with minimal fuss. Just somewhere have UI indication that document isn't syncing/offline. Maybe that means it's fine to register on window display?

@alexjg
Copy link

alexjg commented May 9, 2024

FWIW in the JS implementation there is no way to specify the document ID when creating a document, the ID is assigned for you. This is deliberate as we don't want people to be able to create two documents with the same ID but with no history in common.

Also, the JS DocHandle has a merge function on it which is - as you describe - just a wrapper around the underlying document merge.

Re. tracking branches, right now that's handled by the application in an ad-hoc manner - typically by storing pointers to the branches in the branched document somewhere.

@jessegrosjean
Copy link
Contributor Author

jessegrosjean commented May 10, 2024

FWIW in the JS implementation there is no way to specify the document ID when creating a document, the ID is assigned for you. This is deliberate as we don't want people to be able to create two documents with the same ID but with no history in common.

For swift implementation it's important since document ids and data might be stored externally in document files, but maybe the API should use different terminology?

Right now Repo has:

  • create()
  • create(id:)
  • create(data:id:)
  • create(doc:id:)

What if those methods were replaced with:

  • create()
  • connect(id:doc:options:)

I don't know if connect is best term... but the idea is you are trying to connect an external document with the given id in the repo. The method should implement some find or create logic. The implementation might look like untested code:

extension Repo {
    
    public struct ConnectOptions: OptionSet {
        public static var allowMerge: Self { Self(rawValue: 1 << 0) }
        public let rawValue: UInt16
        public init(rawValue: UInt16) {
            self.rawValue = rawValue
        }
    }
    
    public func connect(
        id: DocumentId,
        doc: Automerge.Document,
        options: ConnectOptions = []
    ) async throws -> DocHandle {
        do {
            let handle = try await find(id: id)
            if handle.doc !== doc {
                if options.contains(.allowMerge) {
                    try handle.doc.merge(other: doc)
                } else {
                    throw Errors.UnexpectedMsg(msg: "Connect Failed, found existing document and no merge option")
                }
            }
            return handle
        } catch {
            if let existing = handles[id] {
                throw Errors.DuplicateID(id: id)
            }
            let internalHandle = InternalDocHandle(id: creationId, isNew: true, initialValue: doc)
            handles[internalHandle.id] = internalHandle
            docHandlePublisher.send(internalHandle.snapshot())
            return try await resolveDocHandle(id: internalHandle.id)
        }
    }
    
}

@heckj
Copy link
Collaborator

heckj commented May 10, 2024

I think I get the concept you're proposing - not opposed conceptually, and connect, while an overloaded term, is pretty good for what you're after. A couple of quick questions:

  • using repo.find implies first load from storage, if available, second attempt to load from remotes. And as we've just learned, a find with an ID that a remote JS automerge-repo doesn't have acts like a "please create me a new document".

I'm at a loss as to the meaning of "Unavailable" from a remote repo, and how that should reasonably play. The two circumstances that I'm aware of:

  • If it exists and the remote repo won't share it - then you get unavailable
  • if it doesn't exist, the remote repo returns a new empty document
    I was also told that Unavailable is intended to me a temporary state (track requested documents for active sessions #53), and that when you have repo's chained, the request should flow through connected repos - but in that scenario, I'm not sure how it squares with what we've just learned about a find with a random ID returning a new document.

@jessegrosjean
Copy link
Contributor Author

I can't remember if I posted this somewhere already, but I have been thinking that maybe (to handle these various scenarios) the NSDocument model should maintain two IDs: originalId and forkId.

The originalId is immutable and used to check for manual merges. The forkId starts out equal to originalId, and in many cases never changes from it, but the forkId can be regenerated to solve for some of the problematic cases above.

In particular:

  1. User creates new automerge backed NSDocument, saves to disk.
  2. User uses Finder to duplicate that document.
  3. User attempts to open both documents at once and connect to repo using documents forkId
  4. Second document to fails to connect (if user has chosen not to allow automatic merge, or merge somehow fails) because that id is already registered in repo
  5. With two ids the problem can be resolved by generating a new forkId and then registering that with repo.

Maybe?

@alexjg
Copy link

alexjg commented May 10, 2024

If a peer receives a request message for a document it doesn't have then it should request that document from other peers and, if no peer has it, send an unavailable message back to the requesting peer. If a remote peer responds with a sync message for an empty document then that means either a) the remote found the document somewhere, but it was empty or b) there's a bug somewhere.

For swift implementation it's important since document ids and data might be stored externally in document files, but maybe the API should use different terminology?

This suprised me because Repo looks to be managing it's own storage, but I think, based on this line

User uses Finder to duplicate that document.

That you are talking about a scenario where the Repo is not managing it's own storage? Is that correct?

The reason I'm asking is because connect seems like it's effectively a mutable document ID and that seems like a bit of a footgun. So far my best answer to "I need some mutable state" has been "put that in an automerge document".

In the case of the originalId and forkId you're talking about you could take some inspiration from the way patchwork does this, you store the forked document IDs in the original document.

@heckj
Copy link
Collaborator

heckj commented May 10, 2024

Short answer to the question above:

That you are talking about a scenario where the Repo is not managing it's own storage? Is that correct?

Yes - outside of the app even running, the content was duplicated, and then that content opened from disk. In iOS/macOS using NSDocument and equivalents, the framework wants to own loading the data and owning when and how it writes the information. So in both MeetingNotes and what @jessegrosjean is doing, we're using that to provide the binary content for the Automerge document (from save()) and an ID that represents the document. Then we're adding/loading/connecting it into the local repo, with the intent to use that to hold Documents for the app (which can have multiple windows, each with separate documents), and to sync it with remote repos.

The use case in particular is the app loads two of those files from disk, they are forks - have identical IDs, so we know they can merge, and may represent different editions or such - at the current stage we only know they sourced from the same document, and we have two in memory, with a starting position that we have that external to how a repository manages anything.

@jessegrosjean
Copy link
Contributor Author

jessegrosjean commented May 10, 2024

The use case in particular is the app loads two of those files from disk, they are forks - have identical IDs

I think maybe trying to create a fork automatically because of document copy in the finder is the wrong idea. Instead any document with same id should always sync/merge with any other document found with that same id.

From users perspective:

  • If you want to sync a document with a friend you send copy of document to friend. Both copies stay in sync when you edit assuming they use same sync server.

  • If you copy document to some other location on your computer. And then only ever open the original or the copy separately they will have the same syncing behavior, assuming same sync server.

  • So default behavior is to sync documents with same id... it seems like we should just keep with this behavior in the case when the two documents are opened at the same time in same app instance too. If the user wants to create a fork of a document they will need to do that with a separate command that generates a new document id. Just copying a document in finder never creates a fork.

Maybe?

@alexjg
Copy link

alexjg commented May 13, 2024

I think maybe trying to create a fork automatically because of document copy in the finder is the wrong idea. Instead any document with same id should always sync/merge with any other document found with that same id.

This seem sensible to me.

I've been thinking more about not exposing document IDs to users. Could this be achieved by defining some kind of "document bundle" format which the Repo can import. This bundle would have the document ID stored in the same binary blob as the saved document. Then, when a user opens a file directly from the filesystem you do something like Repo.loadBundledDocument(bundle: Data)?

I've been thinking of introducing a similar bundle format for these usecases in the JS world as well, for similar reasons.

@jessegrosjean
Copy link
Contributor Author

I've been thinking more about not exposing document IDs to users.

Are you meaning anywhere in the API, or just removing from repo.create methods? I think it makes sense to remove them from create methods. Generally having read access to IDs in the API seems like it would still be useful sometimes. For example when generating URLs.

Could this be achieved by defining some kind of "document bundle" format which the Repo can import. This bundle would have the document ID stored in the same binary blob as the saved document. Then, when a user opens a file directly from the filesystem you do something like Repo.loadBundledDocument(bundle: Data)?

I think that a standardized way to export documents from a repo and then load them into another repo would be very useful.

If you do decide to create a new bundle format would it make sense for it to contain multiple Document/ID pairs?

I think the behavior needed by NSDocument could also be useful for the Javascript version of Automerge. For example say you are working with an Automerge repo in a web browser. It could be nice to have an export command that exports the local repo storage to this bundle format. You could then take this bundle to some other computer that doesn't have network and load the bundle to restore those documents. I think NSDocument would want to use the exact same export/load implementation. Eventually maybe you could also give it options to handle forking documents in a standardized way.

Given this I don't think my above connect/load implementation is quite right. In particular it depends on repo.find, which goes out looking at connected peers. That seems like it's doing more work than is really needed for loading in storage. I guess load would instead need to know some repo internals and do something like:

  1. For each id/document pair
  2. If exists in local storage, merge loaded document into that existing document
  3. Else store loaded document into local storage directly
  4. Maybe schedule some communication about these loaded document with peers (I don't really know how this works with repos), but don't wait on this communication before returning from load method.

@alexjg
Copy link

alexjg commented May 13, 2024

Are you meaning anywhere in the API, or just removing from repo.create methods?

Yup apologies for the fuzziness, this is what I meant. Basically it's a bad idea to let users decide which Document IDs are associated with which data, it's very easy to create extremely surprising merge scenarios using this feature. Exposing the document ID for read is useful as you say - although I think we should prefer URLs (in the form automerge:<base58 check encoded document ID> because that gives us somewhere to put other useful information such as the path within the document to look at.

Re. multiple document IDs. I can see what you're getting at. What we actually want here though I think is native branch support - which is coming some time this year. This will be an automerge feature which allows a document to have multiple divergent branches which you can select from. I think for now it makes more sense to have a bundle format with a single document ID in it and have forking and merging be an explicit user action at the application level as you suggest.

That aside, I think your description of what should happen when loading a bundle is sensible.

@jessegrosjean
Copy link
Contributor Author

Re. multiple document IDs. I can see what you're getting at. What we actually want here though I think is native branch support - which is coming some time this year.

Cool on native branch support! :), but actually I think I'm taking about something else.

I just mean a NSDocument might represent a workspace that stores multiple Automerge.Documents. For example maybe a single NSDocument represents a book you are working on. And each chapter is represented by a separate Automerge.Document.

If the bundle supported a map of ids to documents then a single bundle could support this case. At the same time even if bundles are one id to one document I can also make it work with a loop over multiple bundles.

It doesn't really matter, whichever you think is better I'm for. It's the load logic added to the repo that provides the real value for me.

@alexjg
Copy link

alexjg commented May 13, 2024

Oh I see what you mean and yeah I think you're right, we probably want a bundle format that supports exporting multiple documents from the repo, tagged by ID.

I wonder if such a format would also need a hint as to what each document is for. I can imagine that if you load a bundle of documents in you book-authoring app you need to know which document ID is the root document. (This is getting pretty close to what git bundle does.

@jessegrosjean
Copy link
Contributor Author

I wonder if such a format would also need a hint as to what each document is for.

I hadn't thought about that, but seems useful.

I think for NSDocument case we could also just store that metadata in NSDocument storage, outside the bundle data. But to support the more general case of exporting browser repo state using the bundle format and importing it elsewhere, then it probably becomes more required. Other option being to loop over all imported documents and inspect state.

I don't think the repo would know what metadata to store, so the API would need to look something like this, when API client provides the metadata?

// computer 1
bundle = repo.export(ids, metadata)

// computer 2
(ids, metadata) = repo.load(bundle)

@alexjg
Copy link

alexjg commented May 13, 2024

Well, one way to avoid this problem of providing metadata is to say that the metadata must be an automerge document which links to every other automerge document in the bundle (directly or transitively). A document "links" to another document by containing somewhere within it a string of which looks like an automerge URL. This way, you just provide the "metadata" document (which might be something like a table of contents) to the export function and then repo can do the work. Then the application just assumes that the root document you are loading is the metadata format it is expecting and if it isn't, it complains to the user.

The API then would look something like:

//computer 1
bundle = repo.export(<id of table of contents>)

//computer 2
toc = repo.load(bundle)

One of the nice properties of this approach is that bundles can be composed of other bundles (by linking to their root documents) without having to do some application level plumbing to carry the metadata around.

Implementing this requires more work on the part of repo though, primarily in that Repo has to scan through all the (transitively) linked documents from the root document looking for automerge URLs. This will get easier as I start implementing collection sync because I will be adding a primitive type for "links" and utility functions for loading them, but right now this might be a bit of schlep.

@jessegrosjean
Copy link
Contributor Author

Well, one way to avoid this problem of providing metadata is to say that the metadata must be an automerge document which links to every other automerge document in the bundle (directly or transitively). A document "links" to another document by containing somewhere within it a string of which looks like an automerge URL.

This sounds like it could work pretty well for my case.

Finding/following links seems a little imprecise, but workable. I think maybe you would need a client provided filter to allow skipping links. For example in the book composed of Automerge.Document paragraphs example... If a paragraph has an automerge link to some other object, you might (or might not) want to embed that content into the bundle.

Also you might want some options to indicate what should happen in case of an automerge url that doesn't exist in local repo storage. Should the repo do a find to search peers, or skip if it doesn't exist locally.

Anyway, I think this could generally work for NSDocument case. I don't really know if filtering options are needed, just a thought.

@heckj
Copy link
Collaborator

heckj commented May 13, 2024

I'm very loathe to remove the ability to import/create a document with an ID. The ID is the only indicator that "this is the same document/documents can be 'safely' merged" that we have. Once they're just blobs on the filesystem with arbitrary human provided names, the questions of "can this merge safely" cant easily be determined, which really bugs me.

If we had an API means to verify that two documents shared a base history, that would alleviate a huge amount of this issue - then we could explicitly check, and not have the follow-along data to infer that information from.

While it's far less relevant in the JS repo, I'm thinking that Jesse's idea of having a separate find() like mechanism that only works against a filesystem location (loading from a bundle, for example) would be a huge benefit for the native app use case.

@alexjg
Copy link

alexjg commented May 13, 2024

I completely agree that the ID is extremely useful for ensuring that merge behavior makes sense. My concern is that allowing people to specify the document ID actually increases the scope for this occurring and it's why I'm suggesting that the only way to load a document from outside of the repo storage be this bundle format, where the bundle includes the document ID in it so that there is no way of the user accidentally associating the wrong ID with the document.

Does the idea of a bundle format address your concerns around provenance?

@heckj
Copy link
Collaborator

heckj commented May 13, 2024

Mostly. It certainly does for the bundle "load from filesystem" scenario, and is well supported in the Apple scenarios. The place where it isn't as well supported is in mechanisms like "drag and drop", and the new variations that are appearing on the Apple platform where you have a type identifier linked up with a blob of serialized data - that scenario doesn't give you a means to say "these multiple separate things (id, doc) always go together" like the bundle does. In that scenario, I've been leaning into encoded the doc + id into a single thing, to the point where I formalized that in the API as the type DocHandle

I'd be happy to make repo create API take only DocHandle or the data from a doc, as a means of transporting these two values together, as that keeps this identifier alongside, but downplays an API that might encourage you to create a conflicting ID scenario.

The URL concept is a bit more fraught in Apple's platforms, as the kind of URL support that Automerge-repo is using is typically bound to a single app, so not at all well supported as a general platform. Apple's platform has basically taken that concept and made it relevant to "deep link" into an application, as opposed to having multiple apps that can all use the same scheme (which is also causing a headache in other areas - RSS in particular - and the whole concept of "default apps", and the lack of control for such a concept in iOS, but, that's a whole separate Apple annoyance issue).

The whole need here to keep an id is, in my mind, really only relevant to knowing if a document can be safely merged. To be honest, I'd very much prefer to have a way to remove the need for an ID at all - at least as that primary indicator, and switch entirely to something we can introspect on the data of an Automerge document to know if there's a shared base history.

@alexjg
Copy link

alexjg commented May 14, 2024

Interesting, would you be able to point me at the best thing to read to understand what data you do get in a "drag and drop" scenario? Or better yet, explain how you imagine this currently working in the concrete case of something like MeetingNotes? I don't have a clear picture in my head of what the platform is managing for us and what we are providing.

Re. URLs. I don't imagine automerge URLs as being user facing in general. As you say, in practice user facing URLs are typically app specific on most modern platforms. I still think it's worth using automerge URLs internally though. URLs have nice standardised extensibility mechanisms which we will almost certainly need - and occasionally they will be useful as user facing things.

To the last point - I agree that we should definitely have tools for determining if two documents have a shared history and we probably will do something like add a native document ID concept. However, there are still lots of cases where you actually want documents with shared history to be handled separately by the persistence and transport mechanisms - e.g. I might have a private version of my meeting notes, which includes a whole bunch of information which I don't share in the public version, but they start from a common base. I think this means we will always need an ID and it will probably always need to be part of the application layer API - unless we can find a way to subsume all privacy and visibility APIs into a layer which is independent of the application, which seems like a project that might take some time.

@heckj
Copy link
Collaborator

heckj commented May 14, 2024

Happy to help provide references:

For the drag and drag, one-binary-thing I was mentioning, the bits that Apple started pushing is called "Transferrable". I wish there was more to read on it, but there's a WWDC video that provides an overview (14min long), as well as some minimal API docs and then API that enables system-provided capabilities to "share" content (https://developer.apple.com/documentation/SwiftUI/ShareLink).

I don't have a solid idea of HOW to integrate Automerge with that, but was leaning torwards the idea of supporting it by sharing an Automerge doc as the type that gets serialized.

Is there a clearly defined detail about parsing out the Automerge URL format? You mentioned even potentially linking to content (inside a schema within a Doc?) using it, and I don't think I've come across anything yet other than a loose automerge:_bs58docid_ structure for the URL.

I think I have a slightly negative reaction to using them because they can be so tricky to parse correctly, and what I think of as the misalignment with the platform usage, but you're absolutely right that they're structured formats that incorporate the idea of a location (id) and can be built on from there.

@jessegrosjean
Copy link
Contributor Author

jessegrosjean commented May 14, 2024

I'm very loathe to remove the ability to import/create a document with an ID. The ID is the only indicator that "this is the same document/documents can be 'safely' merged" that we have. Once they're just blobs on the filesystem with arbitrary human provided names, the questions of "can this merge safely" cant easily be determined, which really bugs me.

I'm not sure that I understand the problem caused by a common bundle format here. I expect the bundle format could have an API that allows you to read both rootId and all included Ids. Just not write. So you could still test in code if it makes sense to merge.

In my understanding the purpose of the bundle format is just:

  1. To protect against user making random document/id associations
  2. To support custom import behavior that isn't possible in the current API

Another nice feature of the bundle format is CLI case. Currently it would be impossible to write a generic CLI tool that could inspect macOS automerge backed documents, since the ID must be stored in the format, and each app will decide how to do that differently.

With a standardized bundle format then that document/id association will be standardized in the file format. So a CLI could interact with any automerge backed app that externalizes using the package format. All down the road, but I think useful.

@heckj
Copy link
Collaborator

heckj commented May 14, 2024

No qualms with the bundle format, I like the idea and where that's going. Was thinking some on what a more useful API would be that leveraged that - like a ".load()" and some light thinking through of where the relevant base fileURL is configured and how to work with it.

My point was more that I saw value in allowing create() to specify an ID, so that you could load documents that had a serialized ID within them, even if that was a potential footgun for duplicate ID scenarios.

Adding an API to do that more safely with a bundle format makes a ton of sense.

@jessegrosjean
Copy link
Contributor Author

jessegrosjean commented May 14, 2024

I think the problem with create taking an ID is that it's the wrong verb. If you already have an ID then the document has already been created somewhere else. You can't be creating it a second time.

Instead you musts be wanting to load, which has different expectations:

  • Load should expect possibility that document already exists in storage, and will be merged.
  • Load should expect possibility that a connected peer has different document state that will be merged.

If for some reason you really did want to associate arbitrary document with an ID then the bundle runtime object could include an escape hatch constructor that allowed you to create that association. I'm still not sure when you would need that, but it would be safer there than in the main repo API.

I've been leaning into encoded the doc + id into a single thing, to the point where I formalized that in the API as the type DocHandle

I don't understand how this would be different from encoded bundle. In simple case an encoded bundle would contain the exact same info, id:document.

Edit And now that I continue to think about this I do see a place where I would need to escape hatch myself. In the NSDocument case, when you create a new document you will create the DocumentId, but you may not add to repo right away. So escape hatch is needed to associate your DocumentId and Document in repo. Still I think it would be cleaner to do that by constructing a bundle and loading it, rather then passing id into create.

@heckj
Copy link
Collaborator

heckj commented May 14, 2024

Aside from the bundle concept (which I think is a good idea, just larger than I want to try in a singular, small update), I think the changes we're leaning towards:

Keeping these:

  • create(doc) -> DocHandle, - maybe add: create(data) -> DocHandle
  • create() -> DocHandle
    Renaming:
  • create(id:doc) -> DocHandle to load(id:doc) -> DocHandle
  • create(id:data) -> DocHandle to load(id:data) -> DocHandle
    • and explicitly updating the logic to "merge if another doc already exists with that ID"

And referencing the question in #95, if a handle is in the deleted or unavailable state when loading, add in the document and convert it's state to .ready when done loading

This jive with your thinking (@jessegrosjean & @alexjg ?)

bikeshedding

I was half-thinking of reserving the verb load to something that was specific to working with a storage provider to read in files from a persistent store, but it also makes sense from the perspective of "loading docs" into a repo where you're dealing with persistence outside of the repository. What do think about the verb add or import for the spots where you're providing either Data that is an Automerge document or a Document alongside a DocumentId?

I'm thinking connect is too network-centric with overloaded implications (from earlier), and I half-want to keep load to allow a "just get this from local persistence, don't ask on the network if any is available" to go along with the Bundle format idea that's germinating.

@jessegrosjean
Copy link
Contributor Author

I think the changes we're leaning towards:

Yeah, to me that all makes sense, and would be useful right away for my demo app.

add, import, load, connect

Of those I think I like import best with load a close second.

Using load to load from storage, without asking for document from peers might be useful. But do we even have a storage implementation yet? I guess I would wait until storage is in real use before adding load command.

@heckj
Copy link
Collaborator

heckj commented May 15, 2024

(there is a storage provider setup, and a test one to work out the interactions, but nothing yet that hits a filesystem on iOS/macOS - added an issue to make one: #97

@heckj
Copy link
Collaborator

heckj commented Jul 4, 2024

resolved with implementation of import()

@heckj heckj closed this as completed Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API feedback Feedback or recommendations for changes to public API
Projects
None yet
Development

No branches or pull requests

3 participants