\documentclass[11pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{hyperref}
\usepackage{graphicx}
\usepackage{xcolor}
\usepackage{listings}
\usepackage{array}
\setlength{\emergencystretch}{3em}
\lstset{
  language=[Objective]Caml,
  basicstyle=\ttfamily\small,
  keywordstyle=\color{blue}\bfseries,
  commentstyle=\itshape\color{gray},
  stringstyle=\color{teal},
  columns=fullflexible,
  keepspaces=true,
  showstringspaces=false,
  frame=single,
  breaklines=true,
  captionpos=b
}
\newcommand{\codeinline}[1]{\lstinline[breaklines=true]!#1!}
\title{Turning OCaml Programs Reactive with the Skip Runtime}
\author{Codex}
\date{\today}

\begin{document}

\maketitle

\begin{abstract}
Reactive is a prototype OCaml library that layers the Skip runtime's persistent, incremental computation model on top of idiomatic OCaml code.
This paper documents the programming model exposed by \texttt{reactive.ml}, explains how it relates to the broader Skip philosophy of reactive services, and provides practical guidance for migrating an existing OCaml program into a fully reactive pipeline.
We summarize key patterns and considerations, outline operational constraints such as fixed-address linking, and describe how to validate a reactive port end-to-end.
\end{abstract}

\section{Introduction}
Skip's native runtime was originally created to power fully reactive backend services that continuously maintain consistent views of data and stream deltas to clients.
The reactive OCaml library makes the same execution model available from OCaml by exposing a lightweight API around Skip's persistent heap, deterministic forking model, and dependency tracking primitives.
Rather than recomputing entire pipelines, a program declares immutable collections, composes pure maps between them, and lets the runtime cache results and re-run only the subgraphs affected by input changes.

Applying the model effectively requires more than calling a few functions: developers must understand how persistent heaps are managed, how computations are scheduled, how trackers enforce disciplined I/O, and how to structure code so that reactivity can be introduced incrementally.
This paper documents how to work with the library, covering both the conceptual foundations drawn from Skip's reactive service philosophy and practical guidance for building reactive OCaml applications.

\section{Philosophy of Reactive Computation}
Skip's RFC 008 characterizes a \emph{reactive service} as a compute graph of reactive collections that continually maintains derived state and exposes it via reactive resources that can be mirrored by other services or clients.
Key ideas that carry over to the OCaml binding are:
\begin{itemize}
  \item \textbf{Declarative dependency graphs.} Developers define collections and deterministic transformations between them.
  Skip records the dependency edges automatically and maintains the graph across executions.
  \item \textbf{Persistent heaps with stable code addresses.} Cached results are stored in an on-disk heap (\texttt{.rheap}) whose format requires that code and data live at known addresses, because cached values can contain raw pointers, including function pointers.
  \item \textbf{Strict control of effects.} External interactions happen through tracked resources.
  Anything not routed via the tracker API is invisible to the runtime, so a change to it cannot invalidate cached results, and reusing those results becomes unsafe.
  \item \textbf{Gradual adoption.} RFC 008 emphasizes wrapping existing REST resources with reactive mirrors to avoid flag-day rewrites.
  Likewise, OCaml applications can fence off legacy imperative code and progressively move stages behind reactive maps.
\end{itemize}

\section{Runtime and Library Architecture}
\subsection{Persistent Heaps and Pointer Stability}
OCaml programs must be linked with \texttt{libskip\_reactive.a}, which bundles the Skip runtime, supporting C helpers, and the Skip LLVM object.
On Linux, binaries must be linked with \texttt{-no-pie} and a fixed text address (\texttt{-Wl,-Ttext=0x8000000}) so that function pointers remain stable across runs.
macOS cannot enforce a fixed text address; consequently, persistent heaps cannot be reused across separate executions.
Within a single process tree (\texttt{fork()} descendants), heaps are reusable on both platforms.

Persistent heaps are opened via \texttt{Reactive.init file\_name size}, which maps or creates the heap, registers a custom exception for write violations, and prepares the runtime's guard pages.
Once initialized, the runtime protects the heap with \texttt{mprotect} when entering worker processes to catch accidental writes.

\subsection{Collections, Trackers, and Maps}
Collections are opaque identifiers (\texttt{type 'a t = string}) that reference named directories inside the heap.
Inputs are declared up front through \texttt{Reactive.input\_files}, which stores the list of file paths, synchronizes it with the cache, and returns a collection of trackers.
Each tracker enforces that file reads happen through \texttt{Reactive.read\_file}; the runtime hashes file contents and remembers which map invocation consumed which tracker.
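
The bookkeeping behind trackers can be pictured with a few lines of plain OCaml. The sketch below is a model, not the runtime's implementation. \texttt{dependency}, \texttt{record\_read}, and \texttt{needs\_rerun} are hypothetical names. File contents are passed as strings to keep the example free of I/O, and the standard library's \texttt{Digest} module stands in for whatever hash the runtime actually uses.

\begin{lstlisting}[caption={Hypothetical model of tracker bookkeeping}]
(* Hypothetical model of tracker bookkeeping; not the Skip runtime's code. *)
type dependency = {
  file : string;      (* path recorded by the tracker *)
  digest : Digest.t;  (* hash of the contents the map actually read *)
}

(* Record a read: which file was consumed and what it contained. *)
let record_read file contents = { file; digest = Digest.string contents }

(* A cached result is stale as soon as any file it read hashes differently.
   [current] returns a file's present contents. *)
let needs_rerun (deps : dependency list) (current : string -> string) =
  List.exists
    (fun d -> not (Digest.equal d.digest (Digest.string (current d.file))))
    deps

let () =
  let deps = [ record_read "a.txt" "hello" ] in
  assert (not (needs_rerun deps (fun _ -> "hello")));
  assert (needs_rerun deps (fun _ -> "hello, world"))
\end{lstlisting}

Because the check compares content digests, touching a file without changing its contents would not force a re-run in this model.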

\texttt{Reactive.map} prepares the computation, then forks worker processes so that pure computations run under copy-on-write semantics.
Workers call the user function with a key and an immutable array of values, and the function must return an array of key/array pairs.
The runtime deduplicates output keys and produces a new collection.
\texttt{marshalled\_map} wraps the same mechanism but serializes values with \texttt{Marshal.to\_string}, so that closures or otherwise unsupported data types can be returned at a higher cost.
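
The calling convention of \texttt{Reactive.map} can also be modelled without the runtime. The sketch below applies a callback to every key of an in-memory collection and gathers the emitted pairs into a new collection. It leaves out forking and caching entirely. The code marks two assumptions. First, a collection is represented as a standard-library \texttt{Map} from string keys to arrays. Second, values emitted under the same output key are concatenated, which is only one plausible reading of ``deduplicates output keys''.

\begin{lstlisting}[caption={Hypothetical in-memory model of the map calling convention}]
(* Hypothetical in-memory model of Reactive.map's calling convention. *)
module Keys = Map.Make (String)

(* Assumption: a collection maps each key to its array of values. *)
type 'a collection = 'a array Keys.t

let model_map (input : 'a collection)
    (f : string -> 'a array -> (string * 'b array) array) : 'b collection =
  Keys.fold
    (fun key values acc ->
      Array.fold_left
        (fun acc (out_key, out_values) ->
          (* Assumption: values emitted under one key are concatenated. *)
          Keys.update out_key
            (function
              | None -> Some out_values
              | Some existing -> Some (Array.append existing out_values))
            acc)
        acc (f key values))
    input Keys.empty

let () =
  let input = Keys.of_seq (List.to_seq [ ("f1", [| "a b" |]); ("f2", [| "b" |]) ]) in
  let words =
    model_map input (fun _file texts ->
        texts |> Array.to_list
        |> List.concat_map (String.split_on_char ' ')
        |> List.map (fun w -> (w, [| 1 |]))
        |> Array.of_list)
  in
  assert (Keys.find "b" words = [| 1; 1 |])
\end{lstlisting}

The final assertion shows fan-in. Both input keys emit \texttt{b}, and the model merges their values under a single output key.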

\subsection{Observation and Lifecycle}
\texttt{Reactive.get\_array} can only be called after \texttt{Reactive.exit}.
Before exiting, the runtime is still in ``graph building'' mode and will raise \texttt{Toplevel\_get\_array}.
After exit, code pointers are reprotected read-only, making it safe to reuse cached values.
Exiting twice is a fatal error.
\texttt{Reactive.union} merges two collections into a combined one when fan-in is required.
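
Together, these rules form a two-phase protocol. The sketch below encodes that protocol in plain OCaml so the ordering constraints can be exercised in isolation. \texttt{Toplevel\_get\_array} reuses the exception name given above. \texttt{Lifecycle}, \texttt{record}, and \texttt{Already\_exited} are hypothetical. The real runtime treats a second \texttt{exit} as a fatal error, not as a catchable exception.

\begin{lstlisting}[caption={Hypothetical model of the build/observe lifecycle}]
(* Hypothetical model of the build/observe lifecycle; not the real runtime. *)
module Lifecycle = struct
  exception Toplevel_get_array
  exception Already_exited  (* stand-in for the runtime's fatal error *)

  type phase = Building | Observing

  let current = ref Building
  let cache : (string, int array) Hashtbl.t = Hashtbl.create 16

  (* While building, results are simply stored in the table. *)
  let record key values = Hashtbl.replace cache key values

  (* Sealing the graph is allowed exactly once. *)
  let exit () =
    match !current with
    | Building -> current := Observing
    | Observing -> raise Already_exited

  (* Reading results is only legal once the graph is sealed. *)
  let get_array key =
    match !current with
    | Building -> raise Toplevel_get_array
    | Observing ->
        (* Assumption: an unknown key yields an empty array. *)
        Option.value (Hashtbl.find_opt cache key) ~default:[||]
end
\end{lstlisting}

A test of this model calls \texttt{get\_array} before \texttt{exit} and expects the exception. It then seals the graph and reads the stored value.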

\section{Programming Model Reference}
The public API exposes the following primitives (types copied directly from \texttt{reactive.mli}):
\begin{itemize}
  \item \textbf{\codeinline{init}}\\
  \emph{type:} \codeinline{filename -> int -> unit}. Create or open a persistent heap with a fixed upper bound in bytes. Must be called before any other function.
  \item \textbf{\codeinline{input_files}}\\
  \emph{type:} \codeinline{filename array -> tracker t}. Declare the set of input files. Skip records and sorts them; cached runs require the same set.
  \item \textbf{\codeinline{read_file}}\\
  \emph{type:} \codeinline{filename -> tracker -> string}. Read file contents through the tracker supplied by \codeinline{input_files}, ensuring the dependency is tracked.
  \item \textbf{\codeinline{map}}\\
  \emph{type:} \codeinline{'a t -> (key -> 'a array -> (key * 'b array) array) -> 'b t}. Apply a pure transformation to every key's values, producing a new collection. Runs in forked worker processes and cannot be called from inside another \codeinline{map}.
  \item \textbf{\codeinline{marshalled_map}}\\
  \emph{type:} \codeinline{'a t -> (key -> 'a array -> (key * 'b array) array) -> 'b marshalled t}. Variant of \codeinline{map} that serializes outputs so closures or custom types can be cached.
  \item \textbf{\codeinline{unmarshal}}\\
  \emph{type:} \codeinline{'a marshalled -> 'a}. Deserialize values produced by \codeinline{marshalled_map}.
  \item \textbf{\codeinline{get_array}}\\
  \emph{type:} \codeinline{'a t -> key -> 'a array}. Access cached arrays after \codeinline{exit}. Calling it earlier raises \codeinline{Toplevel_get_array}.
  \item \textbf{\codeinline{union}}\\
  \emph{type:} \codeinline{'a t -> 'a t -> 'a t}. Merge two collections, useful when fusing branches or joining independent maps.
  \item \textbf{\codeinline{exit}}\\
  \emph{type:} \codeinline{unit -> unit}. Seal the heap, flush caches, and transition to observation mode. Required before calling \codeinline{get_array}.
\end{itemize}
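
The serialization step behind \texttt{marshalled\_map} and \texttt{unmarshal} can be illustrated with the standard library alone. The sketch assumes only what the text states, namely that values pass through \texttt{Marshal.to\_string}. The helpers \texttt{marshal} and \texttt{unmarshal} defined here are hypothetical stand-ins, not the library's functions. Functional values require the \texttt{Marshal.Closures} flag. OCaml only accepts marshalled closures in the same program that produced them, which is consistent with the fixed-address linking requirement described earlier.

\begin{lstlisting}[caption={Hypothetical marshalling helpers using the standard library}]
(* Hypothetical helpers mirroring marshalled_map / unmarshal. *)
(* Assumption: a marshalled value is just its serialized bytes. *)
type 'a marshalled = string

(* Closures must be requested explicitly; they can only be read back by
   the same program that wrote them. *)
let marshal (v : 'a) : 'a marshalled = Marshal.to_string v [ Marshal.Closures ]

(* Unsafe in general: the caller must know the type that was stored. *)
let unmarshal (m : 'a marshalled) : 'a = Marshal.from_string m 0

let () =
  let scale = 3 in
  let triple : int -> int = unmarshal (marshal (fun x -> x * scale)) in
  assert (triple 14 = 42)
\end{lstlisting}

The round trip preserves captured variables: the unmarshalled function still multiplies by \texttt{scale}.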

\section{Migrating an OCaml Program}
The most direct way to make an existing binary reactive is to apply the following transformation steps in order.

\subsection{Audit Inputs and Effects}
\begin{enumerate}
  \item \textbf{Identify stable inputs.} Files, command-line data, or database exports that should trigger incremental invalidation become entries in the array passed to \texttt{input\_files}. Because \texttt{input\_files} accepts file names, data that does not already live in a file, such as command-line arguments, must first be written to one.
  \item \textbf{Fence off side effects.} Anything that relies on wall-clock time, random numbers, network calls, or mutable globals must either be converted into deterministic data or moved outside the reactive pipeline.
  \item \textbf{Design trackers.} Every file whose contents feed a computation must be read through its tracker, so that the runtime knows which map invocations to re-run when that file changes.
\end{enumerate}
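
As an example of fencing off an effect, random sampling becomes deterministic once the seed is treated as ordinary input data. The sketch below uses the standard library's \texttt{Random.State}. The \texttt{sample} function is illustrative, and so is the assumption that the seed is read from a tracked input file.

\begin{lstlisting}[caption={Hypothetical example: randomness fenced off behind an explicit seed}]
(* Not reactive-friendly: the result depends on hidden global RNG state,
   so a cached value could never be reproduced or validated.
   let sample words = words.(Random.int (Array.length words)) *)

(* Reactive-friendly: the seed is ordinary data (assumed here to come from
   a tracked input), so equal inputs always give equal outputs.
   [words] must be non-empty. *)
let sample ~seed (words : string array) =
  let rng = Random.State.make [| seed |] in
  words.(Random.State.int rng (Array.length words))
\end{lstlisting}

Wall-clock time can be handled the same way. Read it once outside the pipeline, write it to an input file, and let maps consume it through a tracker.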

\subsection{Stage Computations into Maps}
Walk the original pipeline and wrap each pure stage in its own \texttt{Reactive.map}:
\begin{itemize}
  \item \textbf{Reader maps} use \texttt{read\_file} exactly once per tracker and emit the parsed representation.
  \item \textbf{Transformation maps} accept the normalized data and compute derived values, such as uppercasing and reversing text, or fanning out words into length buckets.
  \item \textbf{Aggregators} combine branches via \texttt{union} or by emitting multiple keys per input.
\end{itemize}
Because maps run in forked workers, they must not depend on mutable OCaml state other than their arguments. Mutations made in a worker never reach the parent, and reads of global state are not tracked as dependencies.
Closures can be emitted only through \texttt{marshalled\_map}; the runtime rejects raw closures returned from a regular \texttt{map}.
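
The transformation maps described above are ordinary functions with the callback shape that \texttt{Reactive.map} expects, so they can be written and unit-tested before being wired into a graph. The sketch below assumes string keys and shows both examples from the list. The function names and the \texttt{len:} key prefix are hypothetical. Both functions are pure and capture no mutable state, as required.

\begin{lstlisting}[caption={Hypothetical transformation maps written as plain callbacks}]
(* Callback shape assumed from the reference: key -> 'a array ->
   (key * 'b array) array, with string keys. *)

(* Uppercase and reverse every text stored under [key]. *)
let shout key (texts : string array) =
  let reverse s =
    let n = String.length s in
    String.init n (fun i -> s.[n - 1 - i])
  in
  [| (key, Array.map (fun t -> reverse (String.uppercase_ascii t)) texts) |]

(* Fan words out into one output key per word length. Several pairs may
   share a key; grouping them is left to the runtime. *)
let length_buckets _key (texts : string array) =
  texts
  |> Array.to_list
  |> List.concat_map (String.split_on_char ' ')
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> ("len:" ^ string_of_int (String.length w), [| w |]))
  |> Array.of_list
\end{lstlisting}

In a pipeline, either function would be passed directly as the callback argument of \texttt{Reactive.map} over a collection of strings.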

\subsection{Exit and Observe}
Once the graph is declared, call \texttt{Reactive.exit ()}.
Only after exiting can downstream code fetch arrays and integrate with non-reactive subsystems (database writes, HTTP responses, etc.).
Attempting to call \texttt{get\_array} before exiting raises \texttt{Toplevel\_get\_array}.
Forking after exit is allowed and lets children reuse cached data without reopening the heap, as long as the parent remains alive.

\subsection{Worked Transformation}
Listing~\ref{lst:transformation} sketches how an imperative file-processing script can be rewritten.

\begin{lstlisting}[caption={From imperative to reactive},label={lst:transformation}]
(* [parse] and [analyze] stand for application-specific pure functions. *)

(* Imperative baseline: files are read directly, outside any tracker. *)
let summarize files =
  files
  |> Array.map (fun file ->
         let content = In_channel.with_open_bin file In_channel.input_all in
         (file, analyze (parse content)))

(* Reactive version: reads go through trackers and each stage is a cached map. *)
let summarize_reactive files =
  Reactive.init "analysis.rheap" (512 * 1024 * 1024);
  let inputs = Reactive.input_files files in
  let parsed =
    Reactive.map inputs (fun key trackers ->
        (* The input key is used as the file name, as read_file expects. *)
        let raw = Reactive.read_file key trackers.(0) in
        [| (key, [| parse raw |]) |])
  in
  let metrics =
    Reactive.map parsed (fun key arr ->
        [| (key, [| analyze arr.(0) |]) |])
  in
  Reactive.exit ();
  (* get_array is only legal after exit; each key holds a single summary. *)
  Array.map (fun file -> (file, (Reactive.get_array metrics file).(0))) files
\end{lstlisting}

\section{Common Patterns and Considerations}
\paragraph{Multi-file fan-out} When processing multiple input files, map functions may emit output keys that differ from their input keys (for example, one key per word), so downstream stages must look results up under those derived keys. For large workloads, allocate a sufficiently large heap when calling \texttt{init}, because its size is a fixed upper bound.
\paragraph{Key management} Maps should treat the key argument as the canonical identifier and derive every emitted key from it, so that keys remain consistent across stages.
\paragraph{Dependent stages} Pipelines with multiple dependent stages (e.g., word extraction followed by length calculation) reuse keys at different logical layers, so derived keys should be namespaced to keep the layers from colliding.
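
One way to follow these key conventions is to derive every output key from the incoming key with a fixed separator. The sketch uses a hypothetical \texttt{derived\_key} helper and a \texttt{\#} separator chosen for illustration. A word-extraction stage emits keys of the form \texttt{file\#word}, and a downstream length stage keeps those keys unchanged, so results from different files never share a key.

\begin{lstlisting}[caption={Hypothetical key namespacing across two dependent stages}]
(* Hypothetical key convention for two dependent stages. *)
let derived_key parent child = parent ^ "#" ^ child

(* Stage 1: one output key per distinct word, namespaced by file. *)
let extract_words file (texts : string array) =
  texts
  |> Array.to_list
  |> List.concat_map (String.split_on_char ' ')
  |> List.filter (fun w -> w <> "")
  |> List.sort_uniq compare
  |> List.map (fun w -> (derived_key file w, [| w |]))
  |> Array.of_list

(* Stage 2: keep the incoming key so each length stays traceable. *)
let word_length key (words : string array) =
  [| (key, Array.map String.length words) |]
\end{lstlisting}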
\paragraph{Nested maps} Calling \texttt{map} inside another \texttt{map} is not supported. Nested reactive graphs must be flattened or expressed through separate top-level maps.
\paragraph{Heap reuse limitations} On macOS, cached heaps cannot be reopened by a fresh process; cleanup scripts should delete \texttt{*.rheap} files after each run unless debugging requires preserving them.
\paragraph{Tracker discipline} Every file read must pass through the tracker array supplied by \texttt{input\_files}; ad-hoc I/O violates dependency tracking and will compromise the reactive guarantees.

\section{Operational Playbook}
\subsection{Build and Link}
\begin{itemize}
  \item Build the reactive library to produce \texttt{reactive.cmxa} and \texttt{libskip\_reactive.a}.
  Linking your own program against them also requires passing \texttt{-cclib -lstdc++}, because the runtime includes C++ code.
  \item macOS binaries rely on custom Mach-O segments; Linux requires explicit linker flags to disable PIE and fix the text segment.
  \item The runtime bundles its own \texttt{main}; when the static library is built, \texttt{objcopy} strips symbols from the object compiled from \texttt{runtime64\_specific.cpp}.
\end{itemize}

\subsection{Process Discipline}
Skip relies on \texttt{fork()} to isolate worker maps and to terminate them if the parent exits.
Never call \texttt{map} from inside another \texttt{map}; the runtime tracks this via the \texttt{toplevel} flag and raises \texttt{Can\_only\_call\_map\_at\_toplevel}.
Because \texttt{fork()} duplicates only the calling thread, the program must remain single-threaded (no system threads and no additional OCaml~5 domains) and should avoid holding OS resources open across map boundaries unless they are read-only descriptors.
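
The \texttt{toplevel} check can be pictured as a flag that is cleared while a map callback runs. The sketch below reproduces that guard in plain OCaml and reuses the documented exception name \texttt{Can\_only\_call\_map\_at\_toplevel}. \texttt{guarded\_map} is hypothetical and runs callbacks in-process, not in forked workers.

\begin{lstlisting}[caption={Hypothetical model of the toplevel guard}]
(* Hypothetical model of the toplevel guard; callbacks run in-process. *)
exception Can_only_call_map_at_toplevel

(* True while no map callback is running. *)
let toplevel = ref true

let guarded_map (f : string -> 'a -> 'b) (inputs : (string * 'a) list) =
  if not !toplevel then raise Can_only_call_map_at_toplevel;
  toplevel := false;
  Fun.protect
    ~finally:(fun () -> toplevel := true)
    (fun () -> List.map (fun (k, v) -> (k, f k v)) inputs)

let () =
  assert (guarded_map (fun _ v -> v * 2) [ ("a", 21) ] = [ ("a", 42) ])
\end{lstlisting}

\texttt{Fun.protect} restores the flag even when a callback raises, so a rejected nested call does not leave the model stuck outside toplevel mode.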

\subsection{Heap Hygiene}
On macOS, always delete stale heaps between executions; the runtime will otherwise exit with an error requesting manual cleanup.
On Linux, heaps can be reused across program restarts as long as the binary layout is unchanged.
Use cleanup scripts that delete \texttt{*.rheap} files by default and offer a keep-heaps option for debugging.
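
A cleanup step of this kind needs only the standard library. The sketch below deletes every file ending in \texttt{.rheap} in one directory unless the optional \texttt{keep\_heaps} argument is set. The function name, the argument, and the non-recursive scan are assumptions for illustration.

\begin{lstlisting}[caption={Hypothetical heap cleanup helper}]
(* Hypothetical cleanup helper: remove stale persistent heaps from [dir].
   Returns the paths that were deleted. *)
let clean_heaps ?(keep_heaps = false) dir =
  if keep_heaps then []
  else
    Sys.readdir dir
    |> Array.to_list
    |> List.filter (fun name -> Filename.check_suffix name ".rheap")
    |> List.map (fun name ->
           let path = Filename.concat dir name in
           Sys.remove path;
           path)
\end{lstlisting}

The returned list can be logged by a cleanup script. Passing \texttt{true} for \texttt{keep\_heaps} turns the helper into a no-op for debugging sessions.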

\subsection{Testing Strategy}
Reactive unit tests are regular OCaml binaries linked against \texttt{reactive.cmxa} and the static runtime library, using the same platform-specific link flags as production binaries.
Test harnesses should also check for expected failures, such as child processes that abort on purpose.
Mirroring this workflow in downstream projects catches regressions early, especially around platform-specific invariants.

\section{Best Practices and Anti-Patterns}
\begin{itemize}
  \item \textbf{Always initialize once.} The runtime tracks whether \texttt{init} has run and refuses to proceed until it has.
  \item \textbf{Respect tracker usage.} Use the tracker array supplied to \texttt{map}; do not open new file handles or call \texttt{read\_file} without the matching tracker.
  \item \textbf{Emit immutable data.} Values returned from \texttt{map} are assumed immutable.
  Modifying them afterward leads to undefined behavior because multiple keys may share the same cached array.
  \item \textbf{Use \texttt{marshalled\_map} sparingly.} Serialization defeats structural sharing and increases heap footprint.
  Prefer encoding results as primitive data.
  \item \textbf{Expose deterministic keys.} Keys determine cache reuse.
  If keys depend on the execution environment (timestamps, random numbers), cache lookups will always miss.
  \item \textbf{Guard the imperative boundary.} After \texttt{exit}, copy data out before mutating it, especially when handing arrays to legacy code.
\end{itemize}
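
The last two guidelines can be illustrated with standard-library code. In the sketch below, \texttt{content\_key} derives a key from content via \texttt{Digest} instead of from the clock, and \texttt{sorted\_copy} gives legacy code a private array to mutate. Both names are hypothetical. In practice the cached array would come from \texttt{get\_array}.

\begin{lstlisting}[caption={Hypothetical helpers for deterministic keys and copy-out}]
(* Derive a key from content, never from time or randomness. *)
let content_key contents = Digest.to_hex (Digest.string contents)

(* Legacy code that sorts in place must receive its own copy, so the
   cached array stays untouched. *)
let sorted_copy (cached : int array) =
  let mine = Array.copy cached in
  Array.sort compare mine;
  mine
\end{lstlisting}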

\section{Future Directions}
Bringing Skip's full reactive service model to OCaml would involve exposing replication tokens, diff streams, and authentication mechanisms described in RFC 008.
The current prototype already models collections as DAG nodes; adding APIs for \texttt{diff} and \texttt{mirror} would let OCaml programs act as first-class reactive resources inside a larger Skip deployment.
Another avenue is improving developer ergonomics by generating \texttt{map}-heavy boilerplate or by offering lint rules that detect nested map attempts or unchecked \texttt{get\_array} calls.

\section{Conclusion}
Reactive OCaml offers a practical path toward incrementalizing existing workloads by reusing the Skip runtime's proven abstractions.
By enforcing disciplined I/O through trackers, executing pure maps in forked workers, and persisting results into stable heaps, applications can avoid redundant recomputation as their data sets grow.
The programming model provides a template for structuring pipelines, while Skip's broader reactive service philosophy illustrates how those pipelines integrate into end-to-end systems.
With careful adherence to the guidelines in this document, developers can confidently port OCaml code to a reactive architecture that is both efficient and predictable.

\end{document}