Commit a79d56d

committed
Added guide to reactive programming with OCaml.
1 parent c4e97d1 commit a79d56d

File tree

3 files changed

+224
-2
lines changed

.gitignore

Lines changed: 2 additions & 2 deletions
@@ -103,5 +103,5 @@ package.tgz
103103

104104
# AI Agents
105105
.claude/settings.local.json
106-
dead_code_analysis.aux
107-
dead_code_analysis.toc
106+
*.aux
107+
*.toc

docs/reactive_ocaml.tex

Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
1+
\documentclass[11pt]{article}
2+
\usepackage[margin=1in]{geometry}
3+
\usepackage{hyperref}
4+
\usepackage{graphicx}
5+
\usepackage{xcolor}
6+
\usepackage{listings}
7+
\usepackage{array}
8+
\setlength{\emergencystretch}{3em}
9+
\lstset{
10+
language=[Objective]Caml,
11+
basicstyle=\ttfamily\small,
12+
keywordstyle=\color{blue}\bfseries,
13+
commentstyle=\itshape\color{gray},
14+
stringstyle=\color{teal},
15+
columns=fullflexible,
16+
keepspaces=true,
17+
showstringspaces=false,
18+
frame=single,
19+
breaklines=true,
20+
captionpos=b
21+
}
22+
\newcommand{\codeinline}[1]{\lstinline[breaklines=true]!#1!}
23+
\title{Turning OCaml Programs Reactive with the Skip Runtime}
24+
\author{Codex}
25+
\date{\today}
26+
27+
\begin{document}
28+
29+
\maketitle
30+
31+
\begin{abstract}
32+
Reactive is a prototype OCaml library that layers the Skip runtime's persistent, incremental computation model on top of idiomatic OCaml code.
33+
This paper documents the programming model exposed by \texttt{reactive.ml}, explains how it relates to the broader Skip philosophy of reactive services, and provides practical guidance for migrating an existing OCaml program into a fully reactive pipeline.
34+
We summarize key patterns and considerations, outline operational constraints such as fixed-address linking, and describe how to validate a reactive port end-to-end.
35+
\end{abstract}
36+
37+
\section{Introduction}
38+
Skip's native runtime was originally created to power fully reactive backend services that continuously maintain consistent views of data and stream deltas to clients.
39+
The reactive OCaml library makes the same execution model available from OCaml by exposing a lightweight API around Skip's persistent heap, deterministic forking model, and dependency tracking primitives.
40+
Rather than recomputing entire pipelines, a program declares immutable collections, composes pure maps between them, and lets the runtime cache results and re-run only the subgraphs affected by input changes.
41+
42+
Applying the model effectively requires more than calling a few functions: developers must understand how persistent heaps are managed, how computations are scheduled, how trackers enforce disciplined I/O, and how to structure code so that reactivity can be introduced incrementally.
43+
This paper provides comprehensive documentation for working with the reactive OCaml library, covering both the conceptual foundations from Skip's reactive service philosophy and practical guidance for building reactive OCaml applications.
44+
45+
\section{Philosophy of Reactive Computation}
46+
Skip's RFC 008 characterizes a \emph{reactive service} as a compute graph of reactive collections that continually maintains derived state and exposes it via reactive resources that can be mirrored by other services or clients.
47+
Key ideas that carry over to the OCaml binding are:
48+
\begin{itemize}
49+
\item \textbf{Declarative dependency graphs.} Developers define collections and deterministic transformations between them.
50+
Skip records the dependency edges automatically and maintains the graph across executions.
51+
\item \textbf{Persistent heaps with stable code addresses.} Cached results are stored in an on-disk heap (\texttt{.rheap}) whose format requires that code and data live at known addresses.
52+
\item \textbf{Strict control of effects.} External interactions happen through tracked resources.
53+
Anything not routed via the tracker API is invisible to the runtime and therefore unsafe.
54+
\item \textbf{Gradual adoption.} RFC 008 emphasizes wrapping existing REST resources with reactive mirrors to avoid flag-day rewrites.
55+
Likewise, OCaml applications can fence off legacy imperative code and progressively move stages behind reactive maps.
56+
\end{itemize}
57+
58+
\section{Runtime and Library Architecture}
59+
\subsection{Persistent Heaps and Pointer Stability}
60+
OCaml objects must be linked with \texttt{libskip\_reactive.a}, which bundles the Skip runtime, supporting C helpers, and the Skip LLVM object.
61+
On Linux, binaries must be linked with \texttt{-no-pie} and a fixed text address (\texttt{-Wl,-Ttext=0x8000000}) so that function pointers remain stable across runs.
62+
macOS cannot enforce a fixed text address; consequently, persistent heaps cannot be reused across separate executions.
63+
Within a single process tree (\texttt{fork()} descendants) heaps are reusable on both platforms.
64+
65+
Persistent heaps are opened via \texttt{Reactive.init file\_name size}, which maps or creates the heap, registers a custom exception for write violations, and prepares the runtime's guard pages.
66+
Once initialized, the runtime protects the heap with \texttt{mprotect} when entering worker processes to catch accidental writes.
67+
68+
\subsection{Collections, Trackers, and Maps}
69+
Collections are opaque identifiers (\texttt{type 'a t = string}) that reference named directories inside the heap.
70+
Inputs are declared up front through \texttt{Reactive.input\_files}, which stores the list of file paths, synchronizes it with the cache, and returns a collection of trackers.
71+
Each tracker enforces that file reads happen through \texttt{Reactive.read\_file}; the runtime hashes file contents and remembers which map invocation consumed which tracker.
72+
73+
\texttt{Reactive.map} prepares the computation, then forks worker processes so that pure computations run under copy-on-write semantics.
74+
Workers call the user function with a key and an immutable array of values and must return an array of key/array pairs, matching the \texttt{map} signature.
75+
The runtime deduplicates output keys and produces a new collection.
76+
\texttt{marshalled\_map} wraps the same mechanism but serializes values with \texttt{Marshal.to\_string} so that closures or unsupported data types can be returned at higher cost.
77+
78+
\subsection{Observation and Lifecycle}
79+
\texttt{Reactive.get\_array} can only be called after \texttt{Reactive.exit}.
80+
Before exiting, the runtime is still in ``graph building'' mode and will raise \texttt{Toplevel\_get\_array}.
81+
After exit, code pointers are reprotected read-only, making it safe to reuse cached values.
82+
Exiting twice is a fatal error.
83+
\texttt{Reactive.union} merges two collections into a combined one when fan-in is required.
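
The lifecycle rules above can be pictured as a two-phase state machine. The following is a toy in-memory model written for illustration only, not the Skip runtime: the names \texttt{publish}, \texttt{exit\_graph}, and \texttt{store} are hypothetical, and returning an empty array for an unknown key is an arbitrary choice of this sketch. Only the exception name \texttt{Toplevel\_get\_array} and the ordering constraints come from the text.

\begin{lstlisting}
(* Toy model of the lifecycle rules; NOT the Skip runtime. *)
exception Toplevel_get_array

type phase = Building | Observing

let phase = ref Building
let store : (string, int array) Hashtbl.t = Hashtbl.create 16

(* Stand-in for results that maps would have produced. *)
let publish key values = Hashtbl.replace store key values

(* Reads are only legal once the graph has been sealed. *)
let get_array key =
  match !phase with
  | Building -> raise Toplevel_get_array
  | Observing ->
    (* Empty array for unknown keys is a choice of this sketch. *)
    Option.value (Hashtbl.find_opt store key) ~default:[||]

(* Hypothetical stand-in for Reactive.exit: a second call fails. *)
let exit_graph () =
  match !phase with
  | Observing -> failwith "exit called twice"
  | Building -> phase := Observing

let () =
  publish "a.txt" [| 3; 5 |];
  (* Reading during graph building is rejected. *)
  (match get_array "a.txt" with
   | _ -> assert false
   | exception Toplevel_get_array -> ());
  exit_graph ();
  assert (get_array "a.txt" = [| 3; 5 |])
\end{lstlisting}

In the real library the transition is performed by \texttt{Reactive.exit} and also involves heap protection, which this model does not attempt to reproduce.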
84+
85+
\section{Programming Model Reference}
86+
The public API surfaces the following primitives (types copied directly from \texttt{reactive.mli} for clarity):
87+
\begin{itemize}
88+
\item \textbf{\codeinline{init}}\\
89+
\emph{type:} \codeinline{filename -> int -> unit}. Create or open a persistent heap with a fixed upper bound in bytes. Must be called before any other function.
90+
\item \textbf{\codeinline{input_files}}\\
91+
\emph{type:} \codeinline{filename array -> tracker t}. Declare the set of input files. Skip records and sorts them; cached runs require the same set.
92+
\item \textbf{\codeinline{read_file}}\\
93+
\emph{type:} \codeinline{filename -> tracker -> string}. Read file contents through the tracker supplied by \codeinline{input_files}, ensuring the dependency is tracked.
94+
\item \textbf{\codeinline{map}}\\
95+
\emph{type:} \codeinline{'a t -> (key -> 'a array -> (key * 'b array) array) -> 'b t}. Apply a pure transformation to every key's values, producing a new collection. Runs under forked worker processes and cannot be called recursively.
96+
\item \textbf{\codeinline{marshalled_map}}\\
97+
\emph{type:} \codeinline{'a t -> (key -> 'a array -> (key * 'b array) array) -> 'b marshalled t}. Variant of \codeinline{map} that serializes outputs so closures or custom types can be cached.
98+
\item \textbf{\codeinline{unmarshal}}\\
99+
\emph{type:} \codeinline{'a marshalled -> 'a}. Deserialize values produced by \codeinline{marshalled_map}.
100+
\item \textbf{\codeinline{get_array}}\\
101+
\emph{type:} \codeinline{'a t -> key -> 'a array}. Access cached arrays after \codeinline{exit}. Calling it earlier raises \codeinline{Toplevel_get_array}.
102+
\item \textbf{\codeinline{union}}\\
103+
\emph{type:} \codeinline{'a t -> 'a t -> 'a t}. Merge two collections, useful when fusing branches or joining independent maps.
104+
\item \textbf{\codeinline{exit}}\\
105+
\emph{type:} \codeinline{unit -> unit}. Seal the heap, flush caches, and transition to observation mode. Required before calling \codeinline{get_array}.
106+
\end{itemize}
107+
108+
\section{Migrating an OCaml Program}
109+
The easiest way to make an existing binary reactive is to follow a disciplined set of transformation steps.
110+
111+
\subsection{Audit Inputs and Effects}
112+
\begin{enumerate}
113+
\item \textbf{Identify stable inputs.} Files, command-line data, or database exports that should trigger incremental invalidation become entries in the array passed to \texttt{input\_files}.
114+
\item \textbf{Fence off side effects.} Anything that relies on wall-clock time, random numbers, network calls, or mutable globals must either be converted into deterministic data or moved outside the reactive pipeline.
115+
\item \textbf{Design trackers.} Each file read should map to one or more trackers so that the runtime knows when to re-run a node.
116+
\end{enumerate}
117+
118+
\subsection{Stage Computations into Maps}
119+
Walk the original pipeline and wrap each pure stage in its own \texttt{Reactive.map}:
120+
\begin{itemize}
121+
\item \textbf{Reader maps} use \texttt{read\_file} exactly once per tracker and emit the parsed representation.
122+
\item \textbf{Transformation maps} accept the normalized data and compute derived metrics, such as uppercasing and reversing text, or fanning out words into length buckets.
123+
\item \textbf{Aggregators} combine branches via \texttt{union} or by emitting multiple keys per input.
124+
\end{itemize}
125+
Because maps run out-of-process, they must avoid capturing mutable OCaml state except through their arguments.
126+
Closures can be emitted only through \texttt{marshalled\_map}.
127+
Raw closures passed through regular \texttt{map} will be rejected by the runtime.
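
As a minimal sketch of a transformation map, the function below fans words out into length buckets. It is plain OCaml with no runtime dependency. The name \texttt{bucket\_by\_length}, the \texttt{len:N} key format, and the use of \texttt{string} for keys are assumptions made for illustration; only the overall shape (a key, an array of values, an array of key/array pairs as the result) follows the \texttt{map} signature listed in the reference section.

\begin{lstlisting}
(* Hypothetical transformation: split each text into words and group
   the words under a key derived from their length ("len:3", ...). *)
let bucket_by_length (_key : string) (texts : string array)
    : (string * string array) array =
  (* The table is local to the call, so no state is captured. *)
  let buckets = Hashtbl.create 16 in
  Array.iter
    (fun text ->
      String.split_on_char ' ' text
      |> List.filter (fun w -> w <> "")
      |> List.iter (fun w ->
             let len = String.length w in
             let seen =
               Option.value (Hashtbl.find_opt buckets len) ~default:[]
             in
             Hashtbl.replace buckets len (w :: seen)))
    texts;
  (* Sort by length so the emitted keys come out in a stable order. *)
  Hashtbl.fold (fun len ws acc -> (len, List.rev ws) :: acc) buckets []
  |> List.sort compare
  |> List.map (fun (len, ws) ->
         (Printf.sprintf "len:%d" len, Array.of_list ws))
  |> Array.of_list
\end{lstlisting}

With the runtime linked and a heap initialized, a function of this shape would be passed as the second argument to \texttt{Reactive.map}; that call is omitted here because it cannot run without the library.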
128+
129+
\subsection{Exit and Observe}
130+
Once the graph is declared, call \texttt{Reactive.exit()}.
131+
Only after exiting can downstream code fetch arrays and integrate with non-reactive subsystems (database writes, HTTP responses, etc.).
132+
Attempting to call \texttt{get\_array} before exiting will raise \texttt{Toplevel\_get\_array}.
133+
Forking after exit is allowed and lets children reuse cached data without reopening the heap, as long as the parent remains alive.
134+
135+
\subsection{Worked Transformation}
136+
Listing~\ref{lst:transformation} sketches how an imperative file-processing script can be rewritten.
137+
138+
\begin{lstlisting}[caption={From imperative to reactive},label={lst:transformation}]
139+
(* Imperative baseline *)
140+
let summarize files =
141+
files
142+
|> Array.map (fun file ->
143+
let content = In_channel.with_open_bin file In_channel.input_all in (* OCaml >= 4.14; closes the channel *)
144+
let metrics = analyze content in
145+
(file, metrics))
146+
147+
(* Reactive version *)
148+
let summarize_reactive files =
149+
Reactive.init "analysis.rheap" (512 * 1024 * 1024);
150+
let inputs = Reactive.input_files files in
151+
let parsed =
152+
Reactive.map inputs (fun key trackers ->
153+
let raw = Reactive.read_file key trackers.(0) in
154+
[| (key, [| parse raw |]) |])
155+
in
156+
let metrics =
157+
Reactive.map parsed (fun key arr ->
158+
let summary = analyze arr.(0) in
159+
[| (key, [| summary |]) |])
160+
in
161+
Reactive.exit ();
162+
Array.map (fun file -> (file, Reactive.get_array metrics file)) files
163+
\end{lstlisting}
164+
165+
\section{Common Patterns and Considerations}
166+
\paragraph{Multi-file fan-out} When processing multiple input files, a map function may emit output keys that differ from its input keys, so downstream stages should not assume the two coincide. For large workloads, pass a sufficiently large size to \texttt{init}, since that size is a fixed upper bound on the heap.
167+
\paragraph{Key management} Maps should treat the key argument as the canonical identifier when emitting derived keys to maintain consistency across stages.
168+
\paragraph{Dependent stages} Building pipelines with multiple dependent stages (e.g., word extraction followed by length calculation) requires careful reuse of keys at different logical layers.
169+
\paragraph{Nested maps} Calling \texttt{map} inside another \texttt{map} is not supported. Nested reactive graphs must be flattened or expressed through separate top-level maps.
170+
\paragraph{Heap reuse limitations} On macOS, cached heaps cannot be reopened by a fresh process; cleanup scripts should delete \texttt{*.rheap} files after each run unless debugging requires preserving them.
171+
\paragraph{Tracker discipline} Every file read must pass through the tracker array supplied by \texttt{input\_files}; ad-hoc I/O violates dependency tracking and will compromise the reactive guarantees.
172+
173+
\section{Operational Playbook}
174+
\subsection{Build and Link}
175+
\begin{itemize}
176+
\item Build the reactive library to produce \texttt{reactive.cmxa} and \texttt{libskip\_reactive.a}.
177+
Linking your own program requires including \texttt{-cclib -lstdc++}.
178+
\item macOS binaries rely on custom Mach-O segments; Linux requires explicit linker flags to disable PIE and fix the text segment.
179+
\item The runtime bundles its own \texttt{main}; \texttt{runtime64\_specific.cpp} strips symbols via \texttt{objcopy} when building the static library.
180+
\end{itemize}
181+
182+
\subsection{Process Discipline}
183+
Skip relies on \texttt{fork()} to isolate worker maps and to terminate them if the parent exits.
184+
Never call \texttt{map} while already inside another \texttt{map}; the runtime tracks this via the \texttt{toplevel} flag and raises \texttt{Can\_only\_call\_map\_at\_toplevel}.
185+
Because \texttt{fork()} duplicates the process image, the code must remain single-threaded (no multicore OCaml runtime) and should avoid holding OS resources open across map boundaries unless they are read-only descriptors.
186+
187+
\subsection{Heap Hygiene}
188+
On macOS, always delete stale heaps between executions; the runtime will otherwise exit with an error requesting manual cleanup.
189+
On Linux, heaps can be reused across program restarts as long as the binary layout is unchanged.
190+
Use cleanup scripts that delete \texttt{*.rheap} files by default and offer a keep-heaps option for debugging.
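
One possible shape for such a cleanup helper, written in OCaml so it can live next to the tests, is sketched below. The function name \texttt{clean\_heaps} and its \texttt{keep} flag are hypothetical; the sketch only looks at a single directory and relies on standard library calls.

\begin{lstlisting}
(* Hypothetical cleanup helper: removes *.rheap files in [dir] unless
   [keep] is set. Returns the names of the heap files it found. *)
let clean_heaps ?(keep = false) dir =
  let heaps =
    Sys.readdir dir
    |> Array.to_list
    |> List.filter (fun name -> Filename.check_suffix name ".rheap")
    |> List.sort compare
  in
  if not keep then
    List.iter (fun name -> Sys.remove (Filename.concat dir name)) heaps;
  heaps
\end{lstlisting}

A test driver might call \texttt{clean\_heaps "."} before each run and \texttt{clean\_heaps \textasciitilde keep:true "."} when a heap needs to be inspected after a failure.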
191+
192+
\subsection{Testing Strategy}
193+
Reactive unit tests are regular OCaml binaries that link against the reactive library.
194+
Each test should link with \texttt{reactive.cmxa} and the static runtime library.
195+
Test harnesses should check for expected failures such as child processes intentionally aborting.
196+
Mirroring this workflow in downstream projects ensures regressions are caught early, especially around platform-specific invariants.
197+
198+
\section{Best Practices and Anti-Patterns}
199+
\begin{itemize}
200+
\item \textbf{Always initialize once.} The runtime tracks whether \texttt{init} has run and refuses to proceed otherwise.
201+
\item \textbf{Respect tracker usage.} Use the tracker array supplied to \texttt{map}; do not allocate new file handles or call \texttt{read\_file} without the matching tracker.
202+
\item \textbf{Emit immutable data.} Values returned from \texttt{map} are assumed immutable.
203+
Modifying them afterward leads to undefined behavior because multiple keys may share the same cached array.
204+
\item \textbf{Use \texttt{marshalled\_map} sparingly.} Serialization defeats structural sharing and increases heap footprint.
205+
Prefer encoding results as primitive data.
206+
\item \textbf{Expose deterministic keys.} Keys determine cache reuse.
207+
If keys depend on the execution environment (timestamps, random numbers), the runtime will never hit its cache.
208+
\item \textbf{Guard the imperative boundary.} After \texttt{exit}, copy data out before mutating it, especially when handing arrays to legacy code.
209+
\end{itemize}
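
The last point can be illustrated without the runtime. In the sketch below, \texttt{cached} merely stands in for an array obtained from \texttt{get\_array} after \texttt{exit}, and \texttt{legacy\_sort\_in\_place} is a hypothetical piece of imperative code; both names are assumptions of this example.

\begin{lstlisting}
(* [cached] stands in for an array returned by get_array after exit. *)
let cached = [| 3; 1; 2 |]

(* Hypothetical legacy routine that mutates its argument. *)
let legacy_sort_in_place (a : int array) = Array.sort compare a

let () =
  (* Copy first, then hand the copy to the mutating code. *)
  let scratch = Array.copy cached in
  legacy_sort_in_place scratch;
  assert (scratch = [| 1; 2; 3 |]);
  (* The original array is left untouched. *)
  assert (cached = [| 3; 1; 2 |])
\end{lstlisting}

Note that \texttt{Array.copy} is shallow: if the elements are themselves mutable values, they need to be copied as well before legacy code modifies them.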
210+
211+
\section{Future Directions}
212+
Bringing Skip's full reactive service model to OCaml would involve exposing replication tokens, diff streams, and authentication mechanisms described in RFC 008.
213+
The current prototype already models collections as DAG nodes; adding APIs for \texttt{diff} and \texttt{mirror} would let OCaml programs act as first-class reactive resources inside a larger Skip deployment.
214+
Another avenue is improving developer ergonomics by generating \texttt{map}-heavy boilerplate or by offering lint rules that detect nested map attempts or unchecked \texttt{get\_array} calls.
215+
216+
\section{Conclusion}
217+
Reactive OCaml offers a practical path toward incrementalizing existing workloads by reusing the Skip runtime's proven abstractions.
218+
By enforcing disciplined I/O through trackers, executing pure maps under forked workers, and persisting results into stable heaps, applications can scale to large data sets while avoiding redundant recomputation.
219+
The programming model provides a template for structuring pipelines, while Skip's broader reactive service philosophy illustrates how those pipelines integrate into end-to-end systems.
220+
With careful adherence to the guidelines in this document, developers can confidently port OCaml code to a reactive architecture that is both efficient and predictable.
221+
222+
\end{document}

reactive_ocaml.pdf

145 KB
Binary file not shown.