spec: Granular Concurrency #407

joeltg · 2024-11-25T18:52:21Z

Experience with "clobber"-genre bugs and theoretical results from distributed computing have led us to an understanding that db.get, even when implemented "deterministically" using our primitive time-travel functionality, still results in unexpected and undesirable behavior at the application-semantics level. In simple terms, db.get is not in general a safe operation for multiple actions to use concurrently.

A clearer understanding of the theoretical limits of distributed programs shows that most non-trivial programs do not have perfectly coordination-free implementations, but that the coordination they do require can often be identified and isolated.

This suggests that, for Canvas, we need to develop a way for peers to invoke coordination when needed, and a way to update coordination-free parts of program state. In practice, "coordination" means either a) blocking (e.g. waiting for consensus) or b) rollback.

Global coordination can be concretely understood in the context of Canvas apps as a requirement that, in order to safely read from or write to arbitrary global state, an action must "bottleneck" causal history - it must include every previous action as a transitive ancestor, and every subsequent action must descend from it. There can be no concurrent threads going "around" it. A coordination mechanism is anything that enforces this.

If every action requires global coordination, then the result is equivalent to a blockchain. However, there are two other cases: actions that require no coordination, and actions that require only "local" coordination.

Actions that require no coordination are equivalent to pure CRDT deltas. If an application has some state that can be accurately represented as a CRDT, then there are no constraints on the concurrency of updates to that state. However, as pointed out in research literature, CRDTs are safe to concurrently update but are not safe to read from ("Schrödinger consistency"). For LWW registers, this means a coordination-free action can only set values and not get them. We can add "patch" CRDTs for property-wise LWW behavior, but still no general "get".

"No coordination" and "global coordination" are really two extreme degenerate cases of the general question "how much coordination does this action require to safely execute". In practice, actions that read from the database and perform some writes as a result are not generally accessing the entire database state, only individual records or individual tables. A single db.set(tablename, id) writes from one individual record, and only needs to bottleneck other action reading or writing to that same record. A db.count(tablename) "reads" from the entire table, and only needs to bottleneck other actions writing to anything in that table.

This observation is the basis of "snapshot isolation", the default consistency level in Postgres. Postgres tracks, to as granular a degree as possible, which rows each open transaction is accessing, and allows multiple concurrent transactions to commit writes as long as their effects don't intersect.

To flexible support the whole range outlined above, Canvas apps need to distinguish between "CRDT state" and "normal relation state". Any action that only emits effects on CRDT state can execute with unconstrained concurrency. Simple chat apps or pure YJS document editors can be implemented like this. Actions that read from or write to the normal relational state have concurrency constraints that are enforced by some to-be-determined coordination protocol.

I think this requires static configuration in the database schema (to declare which CRDT registers exist), but that action implementations don't have to statically declare what they access - we can just execute every action and track what non-CRDT mutable state it interacts with, if any.

My understanding is that this is actually extremely similar to how db.lock is supposed to work for pogiciv etc, except it automatically tracks state access and doesn't require any db.lock calls in the action itself.

The text was updated successfully, but these errors were encountered:

raykyri · 2024-11-26T15:01:33Z

In the current rollback design, the $effects table is annotated with metadata enumerating what reads an action is dependent on, and changes to a contentious/locking record trigger a forward scan that resets any effects. How is accessed state tracked, e.g. as records hash(name, pk)?
Does this require adding an ACL to CRDTs - i.e. total isolation between non-contentious actions? Can many CRDT writes read from a coordination point? Is there a grow-only record class that can be used for reads before CRDT actions?

joeltg · 2024-11-26T15:35:16Z

Same or similar, every action would have to trace and store a representation of its reads and writes
No CRDT writes can read from anything (or if they do it makes the action require coordination). Reads from CRDT states count as regular reads, ie requiring coordination

joeltg · 2024-11-26T15:37:52Z

and yeah, if we identify a class of "safely readable state" like an immutable CID store we could add that - but the immutable CID store would requiring throwing an exception for missing CIDs, and never returning null

raykyri added the support label Dec 14, 2024

raykyri added prop-1 and removed support labels Jan 3, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

spec: Granular Concurrency #407

spec: Granular Concurrency #407

joeltg commented Nov 25, 2024

raykyri commented Nov 26, 2024

joeltg commented Nov 26, 2024

joeltg commented Nov 26, 2024

spec: Granular Concurrency #407

spec: Granular Concurrency #407

Comments

joeltg commented Nov 25, 2024

raykyri commented Nov 26, 2024

joeltg commented Nov 26, 2024

joeltg commented Nov 26, 2024