Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spec: Granular Concurrency #407

Open
joeltg opened this issue Nov 25, 2024 · 3 comments
Open

spec: Granular Concurrency #407

joeltg opened this issue Nov 25, 2024 · 3 comments
Labels

Comments

@joeltg
Copy link
Contributor

joeltg commented Nov 25, 2024

Experience with "clobber"-genre bugs and theoretical results from distributed computing have led us to an understanding that db.get, even when implemented "deterministically" using our primitive time-travel functionality, still results in unexpected and undesirable behavior at the application-semantics level. In simple terms, db.get is not in general a safe operation for multiple actions to use concurrently.

A clearer understanding of the theoretical limits of distributed programs shows that most non-trivial programs do not have perfectly coordination-free implementations, but that the coordination they do require can often be identified and isolated.

This suggests that, for Canvas, we need to develop a way for peers to invoke coordination when needed, and a way to update coordination-free parts of program state. In practice, "coordination" means either a) blocking (e.g. waiting for consensus) or b) rollback.

Global coordination can be concretely understood in the context of Canvas apps as a requirement that, in order to safely read from or write to arbitrary global state, an action must "bottleneck" causal history - it must include every previous action as a transitive ancestor, and every subsequent action must descend from it. There can be no concurrent threads going "around" it. A coordination mechanism is anything that enforces this.

If every action requires global coordination, then the result is equivalent to a blockchain. However, there are two other cases: actions that require no coordination, and actions that require only "local" coordination.

Actions that require no coordination are equivalent to pure CRDT deltas. If an application has some state that can be accurately represented as a CRDT, then there are no constraints on the concurrency of updates to that state. However, as pointed out in research literature, CRDTs are safe to concurrently update but are not safe to read from ("Schrödinger consistency"). For LWW registers, this means a coordination-free action can only set values and not get them. We can add "patch" CRDTs for property-wise LWW behavior, but still no general "get".

"No coordination" and "global coordination" are really two extreme degenerate cases of the general question "how much coordination does this action require to safely execute". In practice, actions that read from the database and perform some writes as a result are not generally accessing the entire database state, only individual records or individual tables. A single db.set(tablename, id) writes from one individual record, and only needs to bottleneck other action reading or writing to that same record. A db.count(tablename) "reads" from the entire table, and only needs to bottleneck other actions writing to anything in that table.

This observation is the basis of "snapshot isolation", the default consistency level in Postgres. Postgres tracks, to as granular a degree as possible, which rows each open transaction is accessing, and allows multiple concurrent transactions to commit writes as long as their effects don't intersect.


To flexible support the whole range outlined above, Canvas apps need to distinguish between "CRDT state" and "normal relation state". Any action that only emits effects on CRDT state can execute with unconstrained concurrency. Simple chat apps or pure YJS document editors can be implemented like this. Actions that read from or write to the normal relational state have concurrency constraints that are enforced by some to-be-determined coordination protocol.

I think this requires static configuration in the database schema (to declare which CRDT registers exist), but that action implementations don't have to statically declare what they access - we can just execute every action and track what non-CRDT mutable state it interacts with, if any.

My understanding is that this is actually extremely similar to how db.lock is supposed to work for pogiciv etc, except it automatically tracks state access and doesn't require any db.lock calls in the action itself.

@raykyri
Copy link
Member

raykyri commented Nov 26, 2024

  • In the current rollback design, the $effects table is annotated with metadata enumerating what reads an action is dependent on, and changes to a contentious/locking record trigger a forward scan that resets any effects. How is accessed state tracked, e.g. as records hash(name, pk)?
  • Does this require adding an ACL to CRDTs - i.e. total isolation between non-contentious actions? Can many CRDT writes read from a coordination point? Is there a grow-only record class that can be used for reads before CRDT actions?

@joeltg
Copy link
Contributor Author

joeltg commented Nov 26, 2024

  1. Same or similar, every action would have to trace and store a representation of its reads and writes
  2. No CRDT writes can read from anything (or if they do it makes the action require coordination). Reads from CRDT states count as regular reads, ie requiring coordination

@joeltg
Copy link
Contributor Author

joeltg commented Nov 26, 2024

and yeah, if we identify a class of "safely readable state" like an immutable CID store we could add that - but the immutable CID store would requiring throwing an exception for missing CIDs, and never returning null

@raykyri raykyri added prop-1 and removed support labels Jan 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants