jepsen.capela

Tests for the Capela distributed programming environment.

Installation

You'll need a Jepsen environment to run these tests.

You'll also need a Capela tarball, which should be named something like whatever-<version>.tar.gz. That tarball should contain (either at the top level, or in a single directory):

bin/
  uvmc        The UVM compiler
  uvm_repl    The UVM server
sandbox/      A working directory with `.py` files.

Usage

To run a single test of Capela, try

lein run test --tarball capela-1.0.tar.gz

Depending on your environment, you may need to specify a username and what nodes you'd like to run with:

lein run test --tarball capela-1.0.tar.gz --username admin --nodes n1,n2,n3

To inject process kills and also single-bit errors in disk files, try

lein run test --tarball capela-1.0.tar.gz --nemesis kill,bitflip-file-chunks

There are lots of parameters to select workloads, faults, timing, concurrency, request rate, transaction structure, and more. Use lein run test --help to see a list of all the options. To run a suite of tests with various combinations of those choices, use test-all:

lein run test-all --tarball capela.tar.gz --time-limit 300 --concurrency 2n --rate 20

Passing a --workload, --nemesis, or --lazyfs to test-all will run just combinations with that particular workload, nemesis, etc.

Running a single test produces a directory in store/<name>/<timestamp>/. The currently running (or most recently run) test is in store/current, and the most recently completed test is in store/latest. You can slice and dice the test.jepsen files at the REPL. Running lein run serve will launch a web server displaying all the tests in the store/ directory.

Workloads

Workloads live in src/jepsen/capela/workload/, and their names are set in src/jepsen/capela/cli.clj. The workloads are:

wr: Transactions over write-read registers. Stores a map of integer keys in a dictionary in a single partition. Performs transactions which can read and write values of specific keys, and checks for various isolation anomalies using Elle

append: Like wr, but values are lists of unique integers, and transactions append to those lists, rather than overwriting their value. This is a good deal more precise than wr, but (surprisingly!) it fails to catch some bugs, like Lost Update, that wr finds.

multi-wr: Like wr, but shards values across multiple partitions.

multi-append: Like append, but shards values across multiple partitions.

ad_hoc: This runs a series of hand-coded Python programs against the query endpoint, and compares their results to what normal Python returns.

gen_py: An experimental test which generates (very simple) Python programs, submits them to Capela, and checks them against a local Python interpreter.

side_effects: A sketch of a non-functional test for side effects. Capela's side effects system was not ready during our collaboration, but this might be a useful foundation for later.
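As a sketch of the gen_py idea — generate tiny programs, submit them, and compare results to local CPython — consider the following. Here capela_eval is a hypothetical stand-in for submitting the program to Capela's query endpoint; the real test makes an HTTP call instead.

```python
import random

def gen_program(rng):
    """Generate a trivial Python expression, in the spirit of gen_py."""
    ops = ["+", "-", "*"]
    a, b = rng.randint(0, 9), rng.randint(0, 9)
    return f"{a} {rng.choice(ops)} {b}"

def local_eval(program):
    """Reference semantics: plain local CPython."""
    return eval(program)

def capela_eval(program):
    """Hypothetical stand-in for Capela's query endpoint; a real
    client would submit `program` over HTTP and return the result."""
    return eval(program)

def check(seed, n=100):
    """Generate n programs and require Capela to agree with CPython."""
    rng = random.Random(seed)
    for _ in range(n):
        p = gen_program(rng)
        assert capela_eval(p) == local_eval(p), p
    return True
```

The interesting engineering in the real workload is in generating richer programs while keeping them deterministic, so a disagreement always signals a Capela bug rather than nondeterminism.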

Nemeses

Nemeses are controlled by jepsen.capela.cli, with support from Jepsen's jepsen.nemesis.combined and this test's jepsen.capela.nemesis. Provided nemeses are:

kill: Kills Capela processes and restarts them. With --lazyfs, also drops un-fsynced writes.

pause: Pauses processes and resumes them using SIGSTOP and SIGCONT.

partition: Partitions the network in various topologies, using iptables.

packet: Introduces a small amount of latency into network packets.

clock: Adjusts node clocks, either with a single big jump, or by strobing rapidly between two values.

bitflip-file-chunks: Introduces single-bit errors into .sst and .blob files in Capela's data directory.

snapshot-file-chunks: Takes snapshots of, and later restores, chunks of .sst and .blob files.
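The bitflip nemesis boils down to flipping one bit somewhere in a data file. A minimal Python sketch of that corruption step (the real nemesis targets chunks of .sst and .blob files in Capela's data directory, on the DB nodes):

```python
import os
import random

def flip_one_bit(path, rng):
    """Corrupt the file at `path` by flipping a single bit at a
    random offset -- roughly what a bitflip nemesis does."""
    size = os.path.getsize(path)
    offset = rng.randrange(size)
    bit = 1 << rng.randrange(8)
    with open(path, "r+b") as f:
        f.seek(offset)
        byte = f.read(1)[0]
        f.seek(offset)
        f.write(bytes([byte ^ bit]))
```

Single-bit corruption is a good probe for checksum coverage: a store with end-to-end checksums should detect the flip and fail loudly, rather than silently serving damaged data.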

What's Here

The Leiningen project file, which pulls in dependencies and controls what gets run when you say lein run, is project.clj. The source code for the test harness lives in src; one file per namespace. The .py programs that we upload to each Capela node at the start of the test are in resources/.

jepsen.capela.cli is the top-level entry point; it parses CLI options, builds a test map, and asks Jepsen to run one or more tests.

jepsen.capela.core provides common fundamentals--mainly port numbers.

jepsen.capela.db handles installing Capela and its prerequisites, killing and pausing nodes, and downloading log files. It also includes a watchdog that restarts Capela when it crashes.

jepsen.capela.client makes HTTP calls to Capela's API, and offers some basic error handling.

jepsen.capela.nemesis defines fault injection packages. It glues together the standard packages in jepsen.nemesis.combined with some custom packages, like file corruption.

jepsen.capela.repl is the namespace you're dropped into for lein repl. It pulls in some namespaces that are handy for working with tests.

jepsen.capela.workload contains the various workloads the test can run.

Future Ideas

wr and append tests are great at finding anomalies, but there are lots of ways to encode them that may discover different parts of Capela's internals. For example, we might store one value per partition, and create partitions dynamically throughout the test. We could use alternative data structures, like an array of values, rather than a map of integer keys to values. We could use tuples instead of lists.

We could add tests for sets--either using Jepsen's set tests, or by using subset relations to infer version orders, and feeding those to Elle. Values could be stored in Python sets and dictionaries.

There are some hints that Capela's partitions might disappear after creation, returning None from calls to select(). We should write a test which creates partitions dynamically and reads some or all partitions back throughout the test, and pass that to Jepsen's set-full checker to make sure partitions are always available after creation.
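A toy version of that set-full-style check might look like this. It assumes a simplified history of (op, partition, result) tuples in real-time order; the real checker would consume Jepsen's history maps instead.

```python
def check_partitions(history):
    """Once a partition's creation is acknowledged, every later
    select() of it must succeed (i.e., not return None). Returns a
    list of (index, partition) pairs where a created partition had
    vanished."""
    created = set()
    errors = []
    for i, (op, part, result) in enumerate(history):
        if op == "create" and result == "ok":
            created.add(part)
        elif op == "select" and part in created and result is None:
            errors.append((i, part))
    return errors
```

For example, a history where p1 is created, then select(p1) returns None, flags index 1 as a lost partition.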

Since Capela allows us to write our own custom data structures, we can do some neat tricks. For example, we could make up an arbitrary datatype T, like a directed graph where nodes are Capela partitions, and each has outbound edges, and operations can mutate, read, traverse the graph, etc. Then augment T with an append-only list of integers L; call the product [T, L] U. Now generate random operations on T, and number each operation uniquely. Submit those operations to some instance of U, such that each operation is applied to its T, and its unique ID is appended to its L log.

From the log, we can reconstruct the exact sequence of operations that Capela thinks occurred. Compare that to what Jepsen thinks--make sure that every acknowledged op is in the log, that every op in the log is either :ok or :info, and so on. Next, build the realtime order over operations from Jepsen's history, and ensure it's consistent with the log order; this detects realtime ordering violations. Finally, use the log order to replay the same operations against a reference implementation of T, and compare the results at each step to what Capela returned.
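A miniature version of that log-replay verification, with a simple counter standing in for T. The op functions, history shape, and statuses here are deliberate simplifications of Jepsen's real history format.

```python
def verify(history, log, reference):
    """Replay the Capela-observed operation order (`log`: a list of
    unique op ids) against a reference implementation of T, and check
    it against Jepsen's view. `history` maps op id to a tuple of
    (op_fn, status, result)."""
    # Every acknowledged op must appear in the log...
    acked = {i for i, (_, status, _) in history.items() if status == "ok"}
    assert acked <= set(log), "acknowledged op missing from log"
    # ...and every logged op must be :ok or :info, never known-failed.
    for i in log:
        assert history[i][1] in ("ok", "info"), f"failed op {i} in log"
    # Replay the log against the reference and compare results.
    state = reference()
    for i in log:
        op_fn, status, result = history[i]
        expected = op_fn(state)
        if status == "ok":
            assert result == expected, f"op {i}: {result} != {expected}"

# Example: T is a counter; incr is its only operation.
class Counter:
    def __init__(self):
        self.n = 0

def incr(state):
    state.n += 1
    return state.n
```

A real harness would also check the log order against Jepsen's real-time order, as described above; this sketch only covers log membership and replay.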

In short, the list-append tests convince us that Capela can reliably append things to a list. We can then use the correctness of list-append in Capela as a fulcrum to test arbitrary datatypes, while keeping verification time down in linear (OK, fine, N log N) territory. The point is that it's not NP-hard, the way checking an arbitrary datatype would normally be!

We also have some fairly sophisticated queue analysis code in Jepsen already, intended for Kafka-style systems. We could implement the basic Kafka-style API inside of Capela: append something to a totally ordered log and get an offset for it, subscribe or assign yourself to a list of logs, and poll elements from the log. This is fairly straightforward to model, and gives us some nice visualizations showing whether elements are lost or reordered. I suspect it's mostly redundant with list-append, but we might see interesting things on the subscription-management side.
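A minimal model of that Kafka-style API — append returns an offset, and each consumer polls from its own position — might look like the following. This is an in-memory sketch for illustration, not the actual workload the queue checker would drive.

```python
class Log:
    """Minimal model of a Kafka-style totally ordered log."""

    def __init__(self):
        self.entries = []
        self.positions = {}  # consumer id -> next offset to read

    def append(self, value):
        """Append an element; return its offset in the log."""
        self.entries.append(value)
        return len(self.entries) - 1

    def poll(self, consumer, max_n=10):
        """Return up to max_n elements past this consumer's position,
        and advance the position past them."""
        pos = self.positions.get(consumer, 0)
        batch = self.entries[pos:pos + max_n]
        self.positions[consumer] = pos + len(batch)
        return batch
```

Against an implementation of this API inside Capela, the checker can ask whether offsets are dense, whether any appended element is never polled (lost), and whether any consumer observes elements out of offset order.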

License

Copyright © Jepsen, LLC

This program and the accompanying materials are made available under the terms of the Eclipse Public License 2.0 which is available at http://www.eclipse.org/legal/epl-2.0.

This Source Code may also be made available under the following Secondary Licenses when the conditions for such availability set forth in the Eclipse Public License, v. 2.0 are satisfied: GNU General Public License as published by the Free Software Foundation, either version 2 of the License, or (at your option) any later version, with the GNU Classpath Exception which is available at https://www.gnu.org/software/classpath/license.html.
