Merged
Changes from 14 commits
Commits
29 commits
2301d97
fix(codegen): P0 — case-arm rebinding the scrutinee's name emitted C …
claude Jun 9, 2026
31208a0
fix(codegen): P0 — compiled tail loops never yielded; spawn storms di…
claude Jun 9, 2026
273cd78
fix(runtime): P1 — allocation failure now panics loud instead of SIGS…
claude Jun 9, 2026
9c0a409
docs: Round-7 audit report (Fable, June 2026) + two new known issues …
claude Jun 9, 2026
d8bf46f
fix(lang): converge the two execution paths + make every failure loud
claude Jun 9, 2026
47e2015
test: dual-path conformance gate — every conform program must agree a…
claude Jun 9, 2026
bb52969
docs: error-model + new-surface reference; KNOWN_ISSUES — clear try/c…
claude Jun 9, 2026
48bd46f
feat(std): Std.map / Std.filter / Std.join — make the Elixir-shaped g…
claude Jun 9, 2026
fda588a
docs: README — conformance gate + current test counts + Std aliases; …
claude Jun 9, 2026
a47f14d
feat(gc): Phase 2.1 — LeakSanitizer lifecycle gate (make lsan-gate + …
claude Jun 9, 2026
2239288
feat(diag+lang): real source paths, module-qualified did-you-mean, st…
claude Jun 9, 2026
0e4337e
test+docs: Phase 2.2 scheduler-count matrix — bounded tests, S<3 skip…
claude Jun 9, 2026
76c44e0
perf(sched): O4 — bounded spin-before-park; cross-scheduler ping-pong…
claude Jun 9, 2026
df57d3c
docs: roadmap — Phase 2.1 + 2.2 DONE with findings; Round-7 table — O…
claude Jun 9, 2026
3f3b2bf
perf(boot): O5 — watchdog shutdown via condvar; hello wall time 110-1…
claude Jun 10, 2026
35283f6
test: spin-wedge reproducer + gc_slope probe timeout armor
claude Jun 10, 2026
78931d0
docs: P1 spin-gated deadlock entry; roadmap 2.2 amended; README start…
claude Jun 10, 2026
8603514
fix(runtime): three C11 data races flagged by TSan — volatile flags b…
claude Jun 10, 2026
92813f0
test: Phase 2.3 — make tsan-gate (msg ping-pong, msg+spin, 80k storm …
claude Jun 10, 2026
831d6ce
docs: roadmap — Phase 2.3 (TSan) DONE: gate + findings + the no-fiber…
claude Jun 10, 2026
437adc1
test+docs: O8 host-limit prints in the stress gate; Phase 2.4 audit r…
claude Jun 10, 2026
e8ddea1
fix(sched): P1 deadlock root-caused — Dekker StoreLoad bug in the rec…
claude Jun 10, 2026
4d2a692
docs: P1 deadlock -> Recently cleared with the full Dekker story; roa…
claude Jun 10, 2026
c635b5e
fix(parser): untrusted .sw can no longer OOM swc — receive-clause spi…
claude Jun 10, 2026
38a817a
fix(runtime): Phase 2.4 — race-free proc->state, timer peek, monitor-…
claude Jun 10, 2026
30e5497
fix(runtime): Phase 2.4 COMPLETE — supervisor/registry/monitor races …
claude Jun 10, 2026
f98288d
feat(gc): Phase 2.5 — allocation-failure injection gate (make alloc-f…
claude Jun 10, 2026
a72e32a
feat(soak): Phase 2.6 — mixed-workload soak harness (make soak); Phas…
claude Jun 10, 2026
ec1ded6
fix(json): Phase 3 — fuzz the JSON decoder; fix two heap-overflows + …
claude Jun 10, 2026
27 changes: 27 additions & 0 deletions .github/workflows/linux-quickstart.yml
@@ -94,6 +94,33 @@ jobs:
make gc-slope
SW_SCHEDULERS=1 make gc-slope

- name: LeakSanitizer lifecycle gate (Phase 2.1 — Linux-only LSan)
# The slope gates catch UNBOUNDED growth; this catches BOUNDED leaks
# under their RSS noise floor. macOS Apple-clang ASAN has no LSan, so
# every other ASAN run uses detect_leaks=0 and is leak-blind — this
# Linux leg is the only real leak assertion in CI. Churns every
# lifecycle owner (timers fired+cancelled, supervisors killed, ETS
# replace/delete, spawns, compound messages), exits cleanly, and LSan
# asserts zero definitely-lost blocks at exit. Proven bidirectional
# (an injected unreachable block fails it). Advisory while it bakes;
# promote to blocking once green across a week of pushes.
continue-on-error: true
run: make lsan-gate

- name: Scheduler-count matrix (Phase 2.2 — suite under S=1 and oversubscribed)
# The default-scheduler run above covers nproc; these legs cover the
# edges. S=1 found a real architecture issue (blocking curl client +
# in-process server deadlocks — the affected tests SKIP below 3
# schedulers, see KNOWN_ISSUES); S=8 oversubscribes the runner's
# cores so cross-scheduler interleavings get exercised. The per-test
# 180s timeout in run_tests.sh turns any future hang into a loud FAIL.
run: |
set -e
SW_SCHEDULERS=1 ./tests/sw/run_tests.sh
SW_SCHEDULERS=1 ./tests/sw/run_conform.sh
SW_SCHEDULERS=8 ./tests/sw/run_tests.sh
SW_SCHEDULERS=8 ./tests/sw/run_conform.sh

- name: Stress test (high-process-count race guard)
# Catches regressions in the high-spawn scheduler/mailbox path.
# The historical ctx-tear race is closed by per-slot generation
29 changes: 29 additions & 0 deletions Makefile
@@ -223,6 +223,7 @@ examples: swc libswarmrt
# pass/fail counts across all files. See tests/sw/run_tests.sh.
test-sw: swc libswarmrt
@./tests/sw/run_tests.sh
@./tests/sw/run_conform.sh

# Security regression: the curl-backed HTTP builtins must not pass
# caller-supplied URLs / headers through a shell. Builds the injection
@@ -388,6 +389,34 @@ gc-slope: swc
[ $$rc -eq 0 ] && echo "gc-slope: PASS (bounded)" || echo "gc-slope: FAIL (unbounded)"; \
exit $$rc

# LeakSanitizer lifecycle gate (Phase 2.1) — Linux only (macOS Apple-clang
# ASAN ships no LSan; every other ASAN run here uses detect_leaks=0 and is
# leak-BLIND — see PRODUCTION_ROADMAP.md). Churns every lifecycle owner
# (timers fired+cancelled, supervisors killed, ETS replace/delete, spawns,
# compound messages), exits cleanly, and lets LSan assert zero
# definitely-lost blocks at exit. Parked-fiber stacks can't false-positive
# because at clean exit every process is gone; globals-reachable singletons
# are "still reachable" and not reported. Suppressions (each one a
# documented accepted-minor) live in tests/gc/lsan.supp.
.PHONY: lsan-gate
lsan-gate: swc
@if [ "$$(uname)" != "Linux" ]; then \
echo "lsan-gate: SKIP (LeakSanitizer requires Linux clang/gcc ASAN)"; exit 0; \
fi
@./bin/swc build --emit-c tests/gc/lsan_lifecycle.sw -o $(BIN_DIR)/_lsan_emit >/dev/null 2>&1 || true
$(FUZZ_CC) $(CFLAGS) -I$(SRC_DIR) -fsanitize=address -g -O1 -fno-stack-protector \
tests/gc/lsan_lifecycle.gen.c $(FUZZ_RT) -o $(BIN_DIR)/lsan_gate $(LDFLAGS)
@rm -f tests/gc/lsan_lifecycle.gen.c $(BIN_DIR)/_lsan_emit
@out=$$(SW_QUIET=1 SW_SCHEDULERS=1 \
ASAN_OPTIONS=detect_leaks=1:abort_on_error=0:exitcode=23 \
LSAN_OPTIONS=suppressions=tests/gc/lsan.supp:print_suppressions=1 \
timeout 240 $(BIN_DIR)/lsan_gate 2>&1); rc=$$?; \
echo "$$out" | tail -25; \
if ! echo "$$out" | grep -q "PROBE_OK"; then echo "lsan-gate: FAIL (probe did not complete)"; exit 1; fi; \
if [ $$rc -eq 23 ] || echo "$$out" | grep -q "definitely lost\|SUMMARY: AddressSanitizer.*leak"; then \
echo "lsan-gate: FAIL (leaks at exit)"; exit 1; fi; \
echo "lsan-gate: PASS (zero unsuppressed leaks at exit)"

# Stress: 80k-spawn microbench across default scheduler count and
# SW_SCHEDULERS=1. Defaults to 50 runs per variant and requires every
# run to print `ok 80000`. Requires native Linux x86_64 thread scheduling
6 changes: 3 additions & 3 deletions README.md
@@ -305,7 +305,7 @@ The [`lib/`](lib/) directory ships modules that auto-resolve via `import` — no

| Module | What it gives you |
|---|---|
| `Std` | List / map / string helpers (range, take, drop, nth/at, zip, partition, sort, unique, find, any, all, sum, product, group_by, chunk_every, intersperse, …) — see `lib/Std.sw` for the full list. |
| `Std` | List / map / string helpers (map, filter, join, range, take, drop, nth/at, zip, partition, sort, unique, find, any, all, sum, product, group_by, chunk_every, intersperse, string_join, …) — see `lib/Std.sw` for the full list. `map`/`filter`/`reduce` also exist as global builtins; `Std.map`/`Std.filter`/`Std.join` work too. |
| `Mcp` | Model Context Protocol client + server (JSON-RPC over stdio) |
| `Embed` | Embeddings client for any OpenAI-compatible `/v1/embeddings` endpoint |
| `Vec` | ETS-backed cosine-similarity vector store (`Vec.new / add / search / size`) |
@@ -351,7 +351,7 @@ make test-full # the comprehensive gate: core + OTP + phases 2-10 + search
- **Compiled** — each `test_*.sw` is compiled with `swc build` and the resulting binary is run.
- **Interpreter** — `tests/sw/repl/test_*.sw` files are run via `swc test` (tree-walking interpreter). Guards against the REPL/codegen builtin drift that the May 2026 marathon closed.

Together the suite reports `all sw tests passed — 53 files, 475 assertions`.
Together the suite reports `all sw tests passed — 56 files, 493 assertions`, and `make test-sw` then runs the **dual-path conformance gate**: every program in `tests/sw/conform/` executes under BOTH `swc run` (interpreter) and `swc build` (compiled) and must produce byte-identical stdout and exit codes — the structural guard against the two paths drifting apart.

Add a `test_<topic>.sw` file in either directory and it'll be picked up automatically.

@@ -428,7 +428,7 @@ Stable enough to be the substrate for [swarm-code](https://github.com/skyblanket
**What CI gates on, every push:**
- README quickstart (`counter.sw`) + a few more example programs (`hello.sw`, `lambda.sw`)
- `bash scripts/check_sw_docs.sh` — **doc-compile tripwire**: every complete ```sw block in the docs and every runnable `examples/*.sw` must still compile with this `swc`
- `make test-sw` — **53 files, 475 assertions** (`.sw` language: compiled + interpreter + `swc run` paths)
- `make test-sw` — **56 files, 493 assertions** (`.sw` language: compiled + interpreter + `swc run` paths) **plus the dual-path conformance gate** (`tests/sw/conform/` — interpreter and compiled output must be byte-identical per program)
- `make test-phase$p` for `p` in **2 through 10** — C-side runtime tests: GenServer/Supervisor (phase 2), ETS (phase 3), Agent/App/DynSup (phase 4), StateMachine/ProcessGroup (phase 5), TCP (phase 6), hot reload (phase 7), GC scaffolding (phase 8), distribution (phase 9), language frontend (phase 10); the **deadlock watchdog** runs automatically in every test (active by default in the runtime)
- `make stress` — high-process-count race guard (multi-scheduler + single-scheduler spawn storm); every run must complete
- `make gc-stress` — GC v1 copy-on-escape correctness: the value-arena stress harness compiled with ASAN + `-DSW_ARENA_POISON`; a missed deep-copy on any send/spawn/ETS boundary surfaces as a use-after-free or a `0xDE`-garbage content assert
35 changes: 22 additions & 13 deletions docs/PRODUCTION_ROADMAP.md
@@ -105,19 +105,28 @@ Generated-code process isolation is proven and every long-lived ownership path i

Do these in roughly this order. Each is "verify locally, then branch → ff-merge to main → push".

### 2.1 Linux LeakSanitizer CI leg *(highest leverage — closes the leak-blindness above)*
- Add a job/leg to `.github/workflows/linux-quickstart.yml` that builds the `gc-stress`/`gc-slope`
probes on Linux and runs them with `ASAN_OPTIONS=detect_leaks=1` (real LSan, which macOS lacks).
This would have caught the kill-path leaks that passed a green slope. Cannot be verified on the
macOS dev box — keep the config minimal and low-risk; gate it as advisory first, promote to
blocking once green.
- Bonus: a small standalone C harness (mirror `gc_ets_alias`) that spins the runtime, runs
create→cancel/kill→reap loops for timers + supervisors, `sw_swarm_shutdown()`, and lets LSan
assert **zero leaks at exit** — a precise, non-noisy leak gate.

### 2.2 Scheduler-count matrix *(locally verifiable)*
- Run `make gc-stress`, `make test-sw`, and the phase tests under `SW_SCHEDULERS=1`, `2`, `$(nproc)`,
and oversubscribed (`2×nproc`). Fix anything that breaks; wire the matrix into CI.
### 2.1 Linux LeakSanitizer CI leg ✅ **DONE** (Round-7 continuation, 2026-06-09, on Linux)
- `make lsan-gate` (Linux-only): `tests/gc/lsan_lifecycle.sw` churns every lifecycle owner
(timers fired + cancelled incl. pre-trampoline kill, static + dynamic supervisors killed,
ETS replace/delete, spawns, compound messages; ~64 KB captures), exits cleanly, and LSan
asserts zero definitely-lost blocks at exit. Suppressions in `tests/gc/lsan.supp`, each tied
to a documented accepted-minor. **Proven bidirectional** (an injected unreachable block fails
it). Advisory CI leg added to linux-quickstart.yml — promote to blocking once green a week.
- Phase-1 validation: 40 rounds of full lifecycle churn → **zero unsuppressed leaks**.
- Canary-writing lesson: at `-O1` clang elides an unused `malloc` — escape the pointer through
a `volatile` global or your leak canary tests nothing.
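
The canary-writing lesson above can be sketched in C. This is a minimal illustration, not the project's actual canary: the names `g_canary_sink` and `inject_leak_canary` are hypothetical. Storing the pointer through a `volatile` global forces the compiler to perform the allocation, and overwriting the global afterwards drops the only deliberate reference so the block is meant to be unreachable at exit.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sink: a volatile-qualified global pointer. Stores to it are
 * observable side effects, so the compiler cannot elide the malloc below. */
static void *volatile g_canary_sink;

/* Allocate n bytes and deliberately lose them. Returns n on success and 0 if
 * the allocation failed (in which case no canary was planted). */
size_t inject_leak_canary(size_t n) {
    char *p = malloc(n);
    if (p == NULL) {
        return 0;
    }
    memset(p, 0xA5, n);    /* touch the block so it is really materialised */
    g_canary_sink = p;     /* escape the pointer through the volatile global */
    g_canary_sink = NULL;  /* drop the only deliberate reference */
    return n;
}
```

Built with `-fsanitize=address` on Linux and run with leak detection enabled, a call such as `inject_leak_canary(64)` before a clean exit is expected to produce a leak report. A stale copy of the pointer in a register or dead stack slot can still hide the block from the leak checker, so treat this as a starting point and confirm that the gate actually fails when the canary is present.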

### 2.2 Scheduler-count matrix ✅ **DONE** (same session) — found a real architecture issue
- Suite + conformance run under `SW_SCHEDULERS=1/2/4/8`. **Finding:** the curl-backed HTTP
client builtins block their scheduler OS THREAD, so self-loopback tests (in-process server +
blocking client) **deadlock forever under SW_SCHEDULERS=1** — invisible at default counts,
and the deadlock watchdog does not flag it (the thread is busy in libcurl, not parked).
Documented in KNOWN_ISSUES; those tests SKIP at S=1; `run_tests.sh` now bounds every test
with a 180s timeout so a hang FAILS instead of wedging the suite. Real fix is the Phase-3
item below (blocking transports → I/O thread pool with fiber park/wake, like `wsc_*`).
- Cross-scheduler wake cost (Round-7 O4): bounded spin-before-park in the scheduler idle loop
(`SW_SPIN_US`, default 30, 0 disables). Measured: cross-sched ping-pong **58.4 → 4.5 µs/rt
(13×)**; spawn/exit cycles −26%; same-sched and single-sched unchanged.
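
The bounded spin-before-park idea in the last bullet can be sketched as follows. This is an assumption-laden illustration, not the runtime's scheduler code: `spin_before_park`, `work_ready`, and `now_ns` are hypothetical names, and the sketch uses C11 `timespec_get` for portability where a real scheduler would more likely use a monotonic clock. The `spin_us` parameter plays the role the roadmap gives to `SW_SPIN_US` (0 disables the spin).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Current wall-clock time in nanoseconds (C11 timespec_get). */
static uint64_t now_ns(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Poll work_ready for at most spin_us microseconds.
 * Returns true if work showed up (the caller can skip parking) and false if
 * the window expired (the caller should park on its condvar/futex).
 * spin_us == 0 disables spinning: the flag is checked exactly once. */
bool spin_before_park(atomic_bool *work_ready, unsigned spin_us) {
    if (spin_us == 0) {
        return atomic_load_explicit(work_ready, memory_order_acquire);
    }
    uint64_t deadline = now_ns() + (uint64_t)spin_us * 1000u;
    do {
        if (atomic_load_explicit(work_ready, memory_order_acquire)) {
            return true;
        }
    } while (now_ns() < deadline);
    return false;
}
```

The trade-off is a short burst of CPU per idle transition in exchange for avoiding a park/wake round trip when a message arrives within the window, which is the case the cross-scheduler ping-pong measurement exercises.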

### 2.3 ThreadSanitizer build + race test
- macOS clang has TSan. Build the runtime under `-fsanitize=thread` and run a multi-scheduler
33 changes: 29 additions & 4 deletions docs/SW_LANGUAGE.md
@@ -103,7 +103,7 @@ fun greet(name) {
- Body is one or more expressions; the last expression's value is the return value.
- Parameters are positional; default values: `fun greet(name = "world") { ... }`.
- No explicit `return` — Erlang-style trailing-expression-is-the-value.
- Recursion is the loop construct (no `for`/`while`). Tail calls are detected and optimised by the codegen, so unbounded tail recursion doesn't blow the stack.
- Recursion is the loop construct (no `for`/`while`). **Self**-tail-calls are detected and optimised by the codegen, so unbounded `f -> f` tail recursion doesn't blow the stack. **Mutual** tail recursion (`a -> b -> a`, e.g. two state functions calling each other) is NOT yet optimised — each hop costs a C stack frame and deep chains overflow the 128KB process stack. Keep the recursive call in the same function (dispatch on an argument instead of bouncing between functions) for unbounded loops.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix loop wording to avoid contradicting supported syntax.

Line 106 says there is no for/while, but for is supported and documented in this reference. Please narrow this to “no while” (or equivalent) so users don’t get incorrect guidance.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/SW_LANGUAGE.md` at line 106, Update the sentence that currently reads
"Recursion is the loop construct (no `for`/`while`)" to accurately reflect
supported syntax by removing `for` from the negation—e.g. change it to
"Recursion is the loop construct (no `while`)"—and keep the rest of the
paragraph about self-tail-call optimisation and mutual tail-recursion unchanged;
target the sentence containing "Recursion is the loop construct (no
`for`/`while`)" so the docs no longer incorrectly state that `for` is
unsupported.


```sw
fun count_down(n) {
@@ -256,7 +256,7 @@ receive {
- Each arm is `pattern -> body`.
- `_` matches anything; specific tuples / atoms / values match exactly.
- Bound names (`prompt`, `reply_to`) capture parts of the message.
- `after MS { body }` fires if no message arrives within MS milliseconds.
- `after MS { body }` fires if no message arrives within MS milliseconds. The Erlang-style arrow form `after MS -> body` is accepted too (the body runs to the receive's closing brace — `after` is always the last clause).
- Selective receive: messages that don't match any arm STAY in the mailbox for the next `receive`.

Inside receive arm bodies, `;` DOES separate statements (it's a recognised statement separator within arm bodies).
@@ -267,6 +267,20 @@ Inside receive arm bodies, `;` DOES separate statements (it's a recognised state

Patterns appear in `receive` arms, `case` expressions, and some other binding contexts. Supported:

### Tuple-destructuring assignment

A statement may bind several names from a tuple at once:

```sw
{a, b} = {1, 2} # a=1, b=2
{x, _, z} = three_tuple() # `_` skips an element
{'ok', body} = http_fetch(url) # literal positions ASSERT-match (panic on
# {'error', _}), ident positions bind
{200, html} = fetch_page() # ints/strings/floats assert too
```

Statement position only (not inside a larger expression). Elements may be identifiers (bind), `_` (skip), or literal atoms/ints/floats/strings (match-assert with a `destructuring mismatch` panic — the Erlang `=` contract for the `{'ok', v}` idiom). A too-short or non-tuple right side panics through `elem`'s existing range/type check; extra trailing elements are tolerated. It desugars at parse time (temp + `elem()` binds + `expect()` asserts), so `swc run` and compiled binaries behave identically. List and nested-tuple left sides are not supported — use `case` for those.

| Pattern | Matches |
|---|---|
| `42`, `"foo"`, `'ok'`, `nil` | exact literal |
@@ -346,6 +360,16 @@ fun main() {

`with` desugars at parse time into nested `case`, so it has the same pattern-matching power (tuples, maps, lists, literals) and the exact same behavior in `swc run` and a compiled binary. There are no per-bind `when` guards — match a pattern instead. A single bind (`with p <- e { ... } else { o -> ... }`) is fine; it's just a one-arm chain.

The `else` block takes one or more `pattern [when guard] -> body` arms, exactly like `case` — the first non-`{'ok', _}` value falls through them in order:

```sw
} else {
{'error', k} when k == 'host' -> "no host configured"
{'error', k} -> f"missing: {k}"
_ -> "unknown failure"
}
```

---

## 8. Processes and message passing
@@ -510,6 +534,7 @@ bytes too. Bytes copy correctly over `send` and can be used as ETS keys.
| `byte_slice(b, start, len)` | subrange; `len` clamps to end |
| `bytes_concat(a, b)` | new bytes `a ++ b` |
| `string_to_bytes(s)` | string chars → bytes |
| `string_chars(s)` | list of single-**codepoint** strings (UTF-8 aware — `string_length` is bytes; `length(string_chars(s))` is codepoints; rejoin slices with `Std.join(cs, "")`). No grapheme clustering: combining marks stay separate codepoints |
| `bytes_to_string(b)` | bytes → string (truncates at first NUL by design) |
| `audio_ulaw_to_pcm16_b(b)` | mu-law bytes → PCM16 bytes (codec twin) |
| `audio_pcm16_to_ulaw_b(b)` | PCM16 bytes → mu-law bytes (codec twin) |
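
The bytes-versus-codepoints distinction in the `string_chars` row can be illustrated in C. This is a sketch of the general UTF-8 rule, not the runtime's implementation; `utf8_codepoints` is a hypothetical helper and assumes valid UTF-8 input.

```c
#include <stddef.h>
#include <string.h>

/* Count Unicode code points in a NUL-terminated UTF-8 string.
 * Every byte that is NOT a continuation byte (bit pattern 10xxxxxx)
 * starts a new code point. Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s) {
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For `"h\xC3\xA9llo"` (an `é` encoded as two bytes), `strlen` gives 6 while `utf8_codepoints` gives 5. For `"e\xCC\x81"` (an `e` followed by a combining acute accent), the count is 2, which matches the table's note that combining marks stay separate codepoints.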
@@ -801,7 +826,7 @@ r = try {
}
```

`error(msg)` sets a thread-local error sentinel that `try { ... } catch e { ... }` catches. Outside a `try`, `error()` is silent (the calling code continues with `nil`), which makes try/catch the explicit "I want to handle failure" marker.
`error(msg)` aborts the rest of the `try` body and lands in the nearest enclosing `catch` — through function calls too: an `error()` raised inside a callee unwinds to the caller's `catch`, and the statements after the raise do not run. Outside any `try`, `error()` is silent (the calling code continues with `nil`), which makes try/catch the explicit "I want to handle failure" marker. Identical in `swc run` and compiled binaries (gated by `tests/sw/run_conform.sh`).

### Unrecoverable failures: `panic` + `expect`

@@ -825,7 +850,7 @@ panic: hit the bottom
[3] Trace.main at src/Trace.sw:16
```

`expect(value, msg)` is the idiomatic "unwrap" pattern — passes the value through if non-nil, otherwise panics with `msg`. Saves the explicit `if (x == nil) { panic(...) }` boilerplate.
`expect(value, msg)` is the idiomatic "unwrap" pattern — passes the value through unless it is **nil or `'false'`**, in which case it panics with `msg`. Saves the explicit `if (x == nil) { panic(...) }` boilerplate, and `expect(a == b, msg)` works as an assert (comparisons return `'true'`/`'false'`).

### Builtins that panic (instead of returning nil silently)
