InsForge · tonychang04 · May 11, 2026
diff --git a/skills/insforge-cli/SKILL.md b/skills/insforge-cli/SKILL.md
@@ -166,6 +166,8 @@ For frontend hosting see **Frontend Deployments** above.
 - `npx @insforge/cli compute events <id> [--limit 50]` — Fly machine **lifecycle events** (start/stop/exit/restart). Container stdout/stderr is NOT surfaced in v1 — that's roadmap work and will reuse the freed-up `compute logs` command name when it lands. To debug a crash-looping container today, reproduce locally with the same image.
 - `npx @insforge/cli compute delete <id>` — destroy the service and its Fly.io resources. **Permanent.** Audit log captures the full config (incl. encrypted env blob) on delete for reconstruction. Dashboard adds a type-to-confirm gate; the CLI does not — guard scripted deletes carefully.
 
+> 💤 **Scale-to-zero is the only mode in v1.** Every compute service deploys with Fly's autostop fully enabled: `autostop: "stop"`, `autostart: true`, and `min_machines_running: 0`. These are the Machines API field names; fly.toml calls the first two `auto_stop_machines` / `auto_start_machines`, and InsForge calls the Machines API directly. When a service receives no traffic, Fly stops the machine; the next request wakes it (~1s cold start on `shared-1x`). **The CLI exposes no override flags**, so every service gets the same settings. If you need always-on for a latency-critical service, contact support: we can adjust the machine config directly while we gauge demand for CLI knobs. Don't ask agents to "set autostop to off" or "keep N warm": there is no flag for it, and nothing the skill can do.
+
 ### Secrets — `npx @insforge/cli secrets`
 - `npx @insforge/cli secrets list [--all]` — list secrets (values hidden; `--all` includes deleted)
 - `npx @insforge/cli secrets get <key>` — get decrypted value

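Review note on the `compute delete` bullet in the hunk above (not part of the patch): because the CLI has no type-to-confirm gate, a script that deletes services may want its own. The following is a minimal sketch under stated assumptions. `build_delete_command`, `confirm_delete`, and `guarded_delete` are hypothetical helper names, not part of the InsForge CLI; only the `npx @insforge/cli compute delete <id>` invocation comes from the skill text.

```python
import subprocess
from typing import Callable, List


def build_delete_command(service_id: str) -> List[str]:
    """Return the argv for the delete command documented in SKILL.md."""
    return ["npx", "@insforge/cli", "compute", "delete", service_id]


def confirm_delete(service_id: str, typed: str) -> bool:
    """Type-to-confirm gate: the caller must retype the exact service id."""
    return bool(service_id) and typed.strip() == service_id


def _run(argv: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(argv).returncode


def guarded_delete(
    service_id: str,
    typed: str,
    runner: Callable[[List[str]], int] = _run,
) -> int:
    """Run the delete only if the confirmation matches; return the exit code.

    `runner` is injectable (an assumption of this sketch) so the gate can be
    exercised without touching real infrastructure.
    """
    if not confirm_delete(service_id, typed):
        raise ValueError(f"confirmation did not match service id {service_id!r}")
    return runner(build_delete_command(service_id))
```

In a script, `typed` would come from an interactive prompt or an explicit `--confirm <id>` style argument of the script itself, so a wrong or empty id fails before anything is destroyed.
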
diff --git a/skills/insforge-cli/references/compute-deploy.md b/skills/insforge-cli/references/compute-deploy.md
@@ -201,6 +201,38 @@ Authoritative current list and pricing: <https://fly.io/docs/about/pricing/#star
 | ML inference (CPU) | `performance-4x 8192` |
 | Heavy data processing | `performance-8x 16384` |
 
+## Scale-to-zero (v1 — the only mode)
+
+Every compute service deploys with full scale-to-zero: the machine stops when the service receives no traffic and wakes on the next incoming request. **There's no flag to change this.** The CLI ships one mode, by design.
+
+The per-machine `services` block sent to Fly's Machines API includes:
+
+```json
+{
+  "autostop": "stop",
+  "autostart": true,
+  "min_machines_running": 0
+}
+```
+
+> Fly has **two field-name conventions** for the same settings: fly.toml uses the long form (`auto_stop_machines`, `auto_start_machines`), and the Machines API uses the short form (`autostop`, `autostart`). InsForge calls the Machines API directly, so the request body uses the short names. Those are also the names `flyctl machines list --json` shows, so search for them when debugging. Authoritative schema: [`fly.MachineService` in the OpenAPI spec](https://docs.machines.dev/spec/openapi3.json). The conceptual docs on [Fly's autostop/autostart page](https://fly.io/docs/launch/autostop-autostart/) describe the same fields under their fly.toml names.
+
+| Field (Machines API) | fly.toml alias | InsForge value (fixed) | Fly's range | What it does |
+|---|---|---|---|---|
+| `autostop` | `auto_stop_machines` | `"stop"` | `"off" \| "stop" \| "suspend"` | What Fly does with an idle machine. `stop` shuts it down fully (cheapest, ~1s cold start). |
+| `autostart` | `auto_start_machines` | `true` | `bool` | Wake the machine when a request arrives at its endpoint. |
+| `min_machines_running` | `min_machines_running` | `0` | `int ≥ 0` | Minimum warm replicas. `0` = full scale-to-zero. |
+
+**Cold start:** ~1s on `shared-1x` for typical images (longer for large images, shorter for small ones). The first request after an idle period waits for the machine to boot; later requests are served warm until the machine goes idle and stops again.
+
+### Why no override flags
+
+One mode means less surface area to maintain and fewer footguns. Every service behaves the same way, so support and the dashboard never have to ask whether a given service is always-on or scale-to-zero. If real demand for always-on or warm replicas shows up, we'll add `--autostop` / `--min-machines` flags then; until then, support can flip a specific service for you on request. **Do not ask the agent to "set autostop off" or "keep N warm": there's no flag for it and nothing the skill can do.** **Do not run `flyctl machine update` against the service yourself** either; the Fly machine isn't yours to manage, and changing it directly makes its state drift from the InsForge cloud's view of the service.
+
+### Already-deployed services
+
+Services deployed before this default landed are still running with the old (always-on) config. They won't pick up scale-to-zero until the next `compute deploy` / `compute update` against them — that's the call that pushes a fresh `services` block to Fly. To migrate an existing service, redeploy it (or run `compute update <id>` with no real changes; the config push happens regardless).
+
 ## What happens internally
 
 CLI → OSS instance → InsForge cloud backend → Fly.io. The cloud:
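
Review note on the scale-to-zero section in the hunk above (not part of the patch): the three fixed values can be checked read-only against machine JSON, for example to spot services deployed before this default landed. This is a rough sketch, not InsForge tooling. It assumes the `flyctl machines list --json` output is a JSON array of machine objects whose `config.services` entries carry the short Machines API field names. That shape is an assumption here, and `scale_to_zero_mismatches` and `check_machines` are hypothetical names.

```python
import json
from typing import Any, Dict, List

# The fixed values InsForge sends, per the compute-deploy reference.
EXPECTED: Dict[str, Any] = {
    "autostop": "stop",
    "autostart": True,
    "min_machines_running": 0,
}


def scale_to_zero_mismatches(service: Dict[str, Any]) -> Dict[str, Any]:
    """Return {field: actual_value} for each field that differs from EXPECTED.

    A field missing from the service entry is reported with the value None.
    """
    return {
        field: service.get(field)
        for field, wanted in EXPECTED.items()
        if service.get(field) != wanted
    }


def check_machines(machines_json: str) -> List[str]:
    """Return one readable line per service entry that is not scale-to-zero."""
    problems: List[str] = []
    for machine in json.loads(machines_json):
        # Assumed layout: machine["config"]["services"] is a list of dicts.
        services = (machine.get("config") or {}).get("services") or []
        for index, service in enumerate(services):
            diff = scale_to_zero_mismatches(service)
            if diff:
                problems.append(f"{machine.get('id')} services[{index}]: {diff}")
    return problems
```

An empty result means every service entry matches the fixed values. Anything reported is a candidate for a redeploy through `compute deploy` / `compute update`, not for a direct `flyctl machine update`.
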