Skip to content

Latest commit

 

History

History
149 lines (117 loc) · 6.21 KB

README.md

File metadata and controls

149 lines (117 loc) · 6.21 KB

schism

A batteries included library of CRDT implementations of Clojure's core data types: Sets, Maps, Vectors, and Lists with support for distributed modification and eventual consistency.

Dependency Information

Latest release: 0.1.2

Leiningen and Boot dependency information:

[com.holychao/schism "0.1.2"]

Motivation

Clojure is one of a handful of languages which can be authored and executed in both a high performance server environment and in a web browser. This strength affords many benefits, one of which the language does not cover is allowing for concurrent modification of data with rich synchronization semantics.

There are some other efforts similar to this. Schism's aims are:

  • To minimize the locus of concerns outside of collection data structures.
  • To provide collection data structures with inferior performance to Clojure's own persistent data structures, but which only incur sub-linear cost increases, available through the same interfaces.
  • To provide collection data structures with greater storage, and serialization costs than Clojure's own persistent data structures, which grow with an upper bound of the number of elements in the collection, independent of the number of operations against that collection.
  • Allow for a Clojure and ClojureScript processes executing on separate nodes to maintain convergent data.

Limitations

Because schism refuses to use tombstones, some convergence operations will have different results than structures which will embrace the costs of tombstones.

For example, Schism's set may drop a recently added element during convergence with certain vector clock states. While this quality is undesirable, I accept it more readily than monotonically increasing storage requirements for data explicitly intended for communication between nodes. Future work may pursue allowing for the convergence operation to have some configurability so that vector clocks will retain information more eagerly to reduce the incidence of this phenomena.

Usage

schism.core contains functions for generating new collections, e.g. schism.core/convergent-set. These functions accept arguments much like their Clojure core equivalents, so (convergent-set :a :b :c) creates a set equivalent to #{:a :b :c}

These collections support Clojure's collection operations with the same semantic consequences (save for above mentioned performance and storage costs). Any divergence between a schism collection's behavior and a Clojure collection's behavior is a bug. Please file a report, as these should be relatively fast and easy to fix.

String coercion of a schism collection is identical to that of a Clojure collection, e.g. a convergent set will coerce to a string as #{:a :b :c}. However, pr-str will generate a longer tagged literal containing all synchronization data for the structure to operate correctly. Schism eagerly enables edn readers for its structures, so synchronization can be as simple as sending the results of pr-str over the wire, reading on the receiving side and converging with the receiver's local copy.

Convergence is done with schism.core/converge. It accepts two collections, which it assumes are both copies of the same replicated collection, and returns a single collection: the result of convergence. Converge replicates the metadata of its first argument into its return value, if present.

Each CLJ and CLJS process working with any shared schism collection is a node in a distributed computing cluster. Each node should have an identity in order for synchronization to operate correctly. You may invoke schism.core/initialize-node! with no arguments to initialize the current process to have a randomly generated UUID as its node identifier. You may use any serializable value as the node identifier by passing that value to initialize-node. If you can create stable node identifiers, that can lead to some minor reduction in the storage requirements for schism's data structures. If two nodes operate on a replicated collection independently and do NOT have distinct node identifiers, schism's convergence behavior is undefined. (Don't do this). If you do not explicitly invoke initialize-node!, it is left at the value nil.

Nested collections

It is common to build up a tree of maps and vectors to create several addressable values that cohere to a common whole. If one built this collection up out of individual schism collections, a number of problems with convergence and serialization would present themselves. I have provided schism.core/nested-map and schism.core/nested-vector to address both these problems. While these provide best-least-surprise convergence, it's important to understand the limitations and leverages of best:

  • These collections incur substantially more CPU time to conduct simple operations as a isomorphic mirror of all modifications must be computed on each update.
  • While adding empty collections should work, please avoid doing so.
  • clojure.core/assoc-in, clojure.core/update-in, and friends should work with the convergent map without any special handling.
  • Vectors containing leaf nodes will compose tail insertions, so two nodes adding to the tail with conj will have both of their additions retained in chronological order.
  • Vectors at intermediate nodes will treat the child at the index as identity.
  • Collections must be isomorphic to converge: Do not allow one node to place a vector and another node to place a map at the same path in the tree.

Given the additional complications I strongly encourage clients to pursue a strategy of retaining a shallow convergent-vector of entity ids and one convergent-map for each entity. For those cases where this combo is not sufficient, please wield nested-map and nested-vector with care.

Further work

  • Configurable convergence
  • Other good ideas as the community provides them

Contributing

I don't use CA's or other such things. Bugfixes are welcome and appreciated.

I reserve the right to dismiss feature requests in the guise of PRs.

All work will be evaluated in light of conformance with motivations as stated in this document.

License

Copyright © 2020 Alex Redington

Distributed under the MIT License