fix(model-b): first-launch onboarding + background-delivery gate (fixes the Connecting-to-network hang) #92
… don't hang
Under Model B the Network Extension IS the node, so `ProxyRnsBackend.start()`
round-trips to the NE over the VPN tunnel session. On a fresh install there is no
VPN config at all, and nothing on the init path ever installed/started one — the
only install()+start() call sites were the Settings/Background-Transport toggles,
which are unreachable while the app is stuck on the loading screen behind that very
start(). Result: `backend.start()` spun ~30×8s (connectedSession) on a dead session
("Connecting to network…" for minutes), then failed → no node. Bootstrap deadlock;
every clean TestFlight install hit it. (It only "worked" in dev because the tunnel
was already installed+approved on the device.)
Fix — explicit first-run gate (the NE/VPN approval is unavoidable for any
NetworkExtension app; make it a deliberate step, not a silent hang):
- AppServices.ensureBackgroundDeliveryTunnel() runs right before backend.start()
(Model B only). Returning users (approval persisted in the App-Group flag) get a
SILENT install→start→waitUntilConnected; first run (or if the silent reconnect
can't connect, e.g. VPN revoked in iOS Settings) suspends init on a continuation
and sets needsBackgroundDeliveryApproval.
- RootView shows BackgroundDeliveryGateView while suspended (instead of an indefinite
spinner). Its Enable button → approveBackgroundDelivery(): install()+start()
(fires the iOS VPN prompt) → waitUntilConnected → persist approval → resume init,
so backend.start() then connects in seconds. Denied/timeout → error + Try Again
(init stays suspended; no spin).
- TunnelManager.waitUntilConnected(timeoutMs:) added.
- All NE-only paths are #if ENABLE_NETWORK_EXTENSION; the Python flavor is untouched.
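The suspend/resume handshake in the first two bullets can be sketched as follows. This is a minimal stand-in, not the PR's AppServices code. `ApprovalGate`, `suspendUntilApproved()` and `approve()` are hypothetical names, and an `NSLock` replaces the real type's `@MainActor` isolation so the sketch runs outside an app. It illustrates the invariant that matters: the stored continuation is cleared before `resume()` is called, so a second approval is a harmless no-op and cannot trigger a checked-continuation double-resume trap.

```swift
import Foundation

/// Hypothetical stand-in for the init gate. The real AppServices is
/// @MainActor; this sketch uses an NSLock so it runs without a main actor.
final class ApprovalGate: @unchecked Sendable {
    private let lock = NSLock()
    private var continuation: CheckedContinuation<Void, Never>?
    private var waiting = false

    /// True while init is suspended (what a view would observe to show the gate).
    var needsApproval: Bool {
        lock.lock()
        defer { lock.unlock() }
        return waiting
    }

    /// Init path: parks here until approve() resumes it.
    func suspendUntilApproved() async {
        await withCheckedContinuation { (cont: CheckedContinuation<Void, Never>) in
            lock.lock()
            continuation = cont
            waiting = true
            lock.unlock()
        }
    }

    /// Gate path: resumes the parked init exactly once.
    /// Returns false when nothing is suspended, so a stray call is a no-op.
    @discardableResult
    func approve() -> Bool {
        lock.lock()
        guard let cont = continuation else {
            lock.unlock()
            return false
        }
        continuation = nil   // clear before resuming: no double-resume possible
        waiting = false
        lock.unlock()
        cont.resume()
        return true
    }
}
```

In an app, the init path would call `suspendUntilApproved()` right before `backend.start()`, and the gate's Enable button would call `approve()` only after the tunnel reports connected.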
Verified on device (fresh flag): init now logs `[TUNNEL-GATE] awaiting
background-delivery approval (showing gate)` and never calls backend.start() until
the tunnel is up — the multi-minute hang is gone and the gate is shown immediately.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Greptile Summary
This PR fixes the Model-B first-launch hang ("Connecting to network…") by introducing an explicit VPN-approval gate (
Confidence Score: 5/5
Safe to merge. The gate logic is sound and the CheckedContinuation is used correctly: resume() is always preceded by a nil-check and the optional is cleared immediately after, so double-resume on the @MainActor is not possible. The gate mechanism is architecturally clean: the continuation is stored and resumed correctly, waitUntilConnected now delegates to connectedSession rather than duplicating the polling loop, and the @Observable/@MainActor observation chain that drives the RootView swap is correct. The only findings are a stale comment and a latent compile-guard gap that the current build configuration prevents from manifesting. No files require special attention beyond the two minor items in OnboardingView.swift.
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant App as ColumbaApp (RootView)
participant AS as AppServices
participant TM as TunnelManager
participant Gate as BackgroundDeliveryGateView
participant iOS as iOS VPN Subsystem
App->>AS: initialize() [second overload]
AS->>AS: ensureTunnelManager()
AS->>AS: startPythonBackend()
AS->>AS: ensureBackgroundDeliveryTunnel()
alt "backgroundDeliveryEnabledKey == true (returning user)"
AS->>TM: install() + start()
AS->>TM: waitUntilConnected(20s)
TM-->>AS: connected
AS->>AS: return (silent reconnect)
else first run or VPN revoked
AS->>AS: withCheckedContinuation suspend
AS->>AS: "needsBackgroundDeliveryApproval = true"
App->>Gate: show BackgroundDeliveryGateView
Gate->>AS: approveBackgroundDelivery()
AS->>TM: install() + start()
TM->>iOS: VPN Allow prompt
iOS-->>TM: approved
TM-->>AS: connected
AS->>AS: persist backgroundDeliveryEnabledKey
AS->>AS: "needsBackgroundDeliveryApproval = false"
AS->>AS: continuation.resume()
AS-->>Gate: return true
Gate->>App: "isWorking = false, view swapped by RootView"
end
AS->>AS: backend.start()
AS->>App: "isInitialized = true"
App->>App: show MainTabView
Codecov Report: ✅ All modified and coverable lines are covered by tests.
…ean up)
The 5-page onboarding (Welcome → Identity → Connectivity → Permissions → Complete) existed but had NEVER shipped: every page is wrapped in #if COLUMBA_ONBOARDING_ENABLED and that flag was defined in no build config, so RootView always took the "bypass" branch. It also predated Model B. Enable it (Model-B configs only) and make it correct:
- pbxproj: define COLUMBA_ONBOARDING_ENABLED on the ColumbaApp Release-Swift + Debug-Swift configs only (NOT project-level, which would leak it to the NE/Tests targets, and NOT the Python flavor, which stays bypassed: onboarding ⇒ NE present).
- Relay seeding (correctness): under Model B the NE delivers over the first enabled tcpClient relay and ignores auto/multipeer/ble entities. createInterfaces() now always seeds exactly one enabled TCP relay (the pick, or the default community server); skipOnboarding() seeds the default relay instead of an AutoInterface-only (relay-less → unreachable) config.
- ConnectivityPage stripped to a single relay picker: removed the multi-interface multi-select and the in-app CoreBluetooth/Bonjour permission probes (those interfaces live in the NE's own process; prompting here was misleading and the entities were no-ops). Default server preselected so tap-through stays reachable.
- PermissionsPage: dropped the stale "Incoming voice calls" row (CallKit was removed in the Python-RNS migration).
- WelcomePage/OnboardingView: gate the restore-from-backup path behind COLUMBA_MIGRATION_ENABLED (MigrationViewModel + OnboardingRestoreSheet are under that flag) so onboarding compiles with migration off.
The NE/VPN step is the EXISTING BackgroundDeliveryGateView, shown by RootView right after onboarding completes while init suspends in ensureBackgroundDeliveryTunnel(): single tunnel owner, no double-gate, no in-pager NE page.
Identity-before-NE holds automatically: completeOnboarding() switches to the identity before onComplete fires, and initialize() writes the shared keychain (shareIdentityForModelB) long before the tunnel gate. Existing/returning users skip the pager (migrateExistingUsers back-fills has_completed_onboarding) but still hit the gate once.
Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per Tyler: the NE/VPN enable should be a real onboarding step (before the Complete page), not a separate gate shown after Finish.
New page order: Welcome → Identity → Relay → Permissions → Background Delivery → Complete. The Background Delivery page (step 5/6) brings the node up in-flow: on Enable it creates + activates the identity (idempotent prepareIdentity + switchToIdentity), shares it to the NE-readable keychain, and installs/starts the VPN tunnel (the iOS "Allow" prompt fires here), waiting for it to connect before advancing. Identity is created HERE rather than on Complete so the NE can load it when the tunnel starts.
- AppServices: extracted ensureTunnelManager() (idempotent create+wire+load, used by both initialize() and the onboarding step) and added enableBackgroundDeliveryForOnboarding(identity:), which shares the identity, brings the tunnel up, and persists `background_delivery_enabled`. The post-onboarding init therefore takes the silent-reconnect path through ensureBackgroundDeliveryTunnel(), reusing the already-up tunnel with NO second gate.
- The standalone BackgroundDeliveryGateView remains for users who SKIP onboarding and for migrated/returning users (they don't pass through the in-flow page, so the flag is unset and the gate still fires once).
- OnboardingViewModel.pageCount 5 → 6; OnboardingView gains an appServices param.
Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
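One way to keep the new six-step order and the page count from drifting apart is to derive both from a single enum. The sketch below is hypothetical: `OnboardingPage`, `pageCount`, `stepLabel` and `next` are illustrative names, not the PR's `OnboardingViewModel`, which the commit only describes as bumping `pageCount` from 5 to 6.

```swift
/// Hypothetical page model mirroring the new order:
/// Welcome → Identity → Relay → Permissions → Background Delivery → Complete.
enum OnboardingPage: Int, CaseIterable {
    case welcome, identity, relay, permissions, backgroundDelivery, complete

    /// Total page count derived from the cases (6 after this change).
    static var pageCount: Int { allCases.count }

    /// 1-based progress label, e.g. "Step 5 of 6" for backgroundDelivery.
    var stepLabel: String { "Step \(rawValue + 1) of \(Self.pageCount)" }

    /// The page after this one, or nil on the last page.
    var next: OnboardingPage? { OnboardingPage(rawValue: rawValue + 1) }
}
```

With this shape, inserting a page is a one-line change and the "step n of m" label follows automatically.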
…ions page
The BLE permission prompt was appearing AFTER onboarding: under Model B the app runs the CoreBluetooth host (ModelBBLEService; the NE can't run CoreBluetooth), and it initializes CoreBluetooth on start during post-onboarding init, firing the iOS prompt unguided. Surface it inside the flow on the Permissions page:
- Added a Bluetooth permission card (mirrors the notification card) that explains it's optional (relay works without it; BLE adds nearby/offline mesh) and lets the user grant it there. OnboardingViewModel gains bluetoothAuthorization/bluetoothGranted + requestBluetoothPermission()/checkBluetoothStatus() via a CBCentralManager probe.
- Guarantee in-flow: on leaving the Permissions step, if BLE is still .notDetermined, trigger the prompt then, so the unconditional ModelBBLEService prompt is relocated into onboarding for every user, not just those who tap the card.
Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
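The "guarantee in-flow" rule reduces to a small decision on page exit. The sketch assumes a hypothetical `BluetoothAuthorization` enum that mirrors CoreBluetooth's `CBManagerAuthorization` cases (so it runs without CoreBluetooth), and a hypothetical `leavePermissionsStep(status:requestPrompt:)`; in the PR the prompt itself comes from a CBCentralManager probe, which is not modeled here.

```swift
/// Hypothetical mirror of CBManagerAuthorization's cases.
enum BluetoothAuthorization { case notDetermined, restricted, denied, allowedAlways }

/// Called when the user leaves the Permissions page. If the Bluetooth prompt
/// has never been answered, request it now so it appears inside onboarding
/// instead of later, when the BLE host starts during post-onboarding init.
/// Returns true if the prompt was requested.
@discardableResult
func leavePermissionsStep(status: BluetoothAuthorization,
                          requestPrompt: () -> Void) -> Bool {
    guard status == .notDetermined else { return false }  // already answered
    requestPrompt()
    return true
}
```

Users who already granted or denied access via the card fall through the guard, so the prompt never fires twice.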
@greptile review
- BackgroundDeliveryGateView: @Bindable → let appServices (no $binding is used; a plain let removes the implicit writable-binding surface).
- TunnelManager.waitUntilConnected: delegate to connectedSession(timeoutMs:) so the polling logic AND status source are shared. It now reads the live manager.connection.status instead of the cached self.status (which lags one main-actor hop behind the NEVPNStatusDidChange observer).
- AppServices.approveBackgroundDelivery: public → internal. It's the gate-only entry point (resumes the suspended-init continuation), so keeping it out of the public API prevents a future caller from invoking it outside the gate, where no continuation is live. The non-gate enable path is the separate enableBackgroundDeliveryForOnboarding.
Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
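The shared-polling change can be sketched as a timeout loop that re-reads the status on every tick instead of trusting a cached copy. `ConnectionStatus`, this `waitUntilConnected(timeoutMs:pollIntervalMs:currentStatus:)` signature and the 100 ms default interval are assumptions for illustration; the real code delegates to `connectedSession(timeoutMs:)` and reads `manager.connection.status`, whose polling details are not shown in this PR.

```swift
import Foundation

/// Hypothetical mirror of a subset of NEVPNStatus, so this runs without
/// the NetworkExtension framework.
enum ConnectionStatus { case disconnected, connecting, connected }

/// Polls `currentStatus` (re-read on every iteration, never cached) until it
/// reports .connected or `timeoutMs` elapses. Returns whether it connected.
func waitUntilConnected(timeoutMs: Int,
                        pollIntervalMs: Int = 100,
                        currentStatus: () -> ConnectionStatus) async -> Bool {
    let deadline = Date().addingTimeInterval(Double(timeoutMs) / 1000)
    while Date() < deadline {
        if currentStatus() == .connected { return true }
        // Task.sleep throws only on cancellation; this sketch ignores that
        // and keeps polling until the deadline.
        try? await Task.sleep(nanoseconds: UInt64(pollIntervalMs) * 1_000_000)
    }
    return currentStatus() == .connected   // one last live read at the deadline
}
```

Passing the status as a closure is what makes "live vs. cached" explicit: the caller decides which source is read, and the loop never stores it.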
…gression)
The in-flow Background-Delivery step starts the Network Extension during onboarding, but the TCP relay was only seeded later in completeOnboarding (on Finish). The in-NE node reads its relay ONCE at start (loadTCPRelayConfig) and has no observer to pick up a later write, so it booted with "no TCP relay configured — AppGroupBridge only" and the device had no TCP path (device log: NE start at 02:07:06, relay written at 02:07:08).
Seed the chosen relay into the shared interface store in the Background-Delivery onEnable, BEFORE enableBackgroundDeliveryForOnboarding installs/starts the tunnel, so loadTCPRelayConfig finds it. createInterfaces() is now idempotent (skips a duplicate relay) since completeOnboarding still calls it. Exposed via OnboardingViewModel.seedInterfaces().
(The NE still won't hot-pick-up relay edits made AFTER it's up; that's the deferred TCP-interface observer in issue #91. This fixes the first-run path.)
Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
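Because seeding now runs twice (in the Background-Delivery step, then again on Finish), it has to be idempotent. A minimal sketch under stated assumptions: `InterfaceEntity`, `RelayEndpoint` and `seedRelay(_:fallback:into:)` are hypothetical simplifications of the shared interface store and of `createInterfaces()`/`seedInterfaces()`, whose real signatures are not shown here, and the hosts and ports in the test are placeholders.

```swift
/// Hypothetical, simplified entity from the shared interface store.
struct InterfaceEntity: Equatable {
    enum Kind { case tcpClient, auto, multipeer, ble }
    var kind: Kind
    var host: String
    var port: Int
    var enabled: Bool
}

struct RelayEndpoint: Equatable {
    var host: String
    var port: Int
}

/// Seeds the user's pick (or the fallback default) as an enabled tcpClient.
/// Idempotent: if the same host/port is already stored, nothing is appended,
/// so calling it from both the Background-Delivery step and Finish is safe.
/// Returns true if a new entity was appended.
@discardableResult
func seedRelay(_ pick: RelayEndpoint?, fallback: RelayEndpoint,
               into store: inout [InterfaceEntity]) -> Bool {
    let relay = pick ?? fallback
    let alreadySeeded = store.contains {
        $0.kind == .tcpClient && $0.host == relay.host && $0.port == relay.port
    }
    guard !alreadySeeded else { return false }
    store.append(InterfaceEntity(kind: .tcpClient, host: relay.host,
                                 port: relay.port, enabled: true))
    return true
}
```

The ordering constraint still lives with the caller: the seed must land in the store before the tunnel starts, since the NE reads its relays once at start.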
…#91)
The NE registered a single ne-tcp-relay from loadTCPRelayConfig()'s FIRST enabled tcpClient. With the onboarding-seeded community default (e.g. Beleth) first and dead, the node dialed only that, reported ne-tcp-relay=down, and never tried a second, reachable relay the user had configured (e.g. their own LAN transport node), leaving the device with no TCP path despite a working relay being right there.
Register one TCPInterface per enabled tcpClient (ids ne-tcp-relay-<n>); the node delivers over whichever connects. loadTCPRelayConfig() → loadTCPRelayConfigs() (returns all). The reconnect hook (setOnInterfaceConnected) and the relay-connected wait now match the "ne-tcp-relay" prefix instead of the exact id.
(Live add/remove of relays while the NE is already running is still a follow-up: the node reads the relay set once at start, and a restart picks up changes.)
Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
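The one-interface-per-relay registration and the prefix match can be sketched together. `TCPRelayConfig`, `relayInterfaceIDs(for:)` and `isTCPRelayInterface(_:)` are hypothetical names. The id suffix here is the entity id, the form the later commit's `ne-tcp-relay-<entityId>` ids take, rather than a positional index.

```swift
/// Hypothetical stand-in for a tcpClient entry loaded by the NE.
struct TCPRelayConfig {
    let entityId: String
    let host: String
    let port: Int
    let enabled: Bool
}

let tcpRelayIDPrefix = "ne-tcp-relay"

/// One interface id per ENABLED relay, instead of only the first one.
func relayInterfaceIDs(for configs: [TCPRelayConfig]) -> [String] {
    configs.filter(\.enabled).map { "\(tcpRelayIDPrefix)-\($0.entityId)" }
}

/// Reconnect hooks match any relay interface by prefix, not one exact id.
func isTCPRelayInterface(_ interfaceID: String) -> Bool {
    interfaceID.hasPrefix(tcpRelayIDPrefix)
}
```

A dead first relay no longer blocks the rest: every enabled relay gets its own id, and any of them connecting satisfies the prefix check.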
…atus
Two interface-management gaps under Model B (NE owns the RNS node):
1. Changing TCP interfaces required a manual VPN restart. The app's `applyInterfaceChanges` ran the python-shaped hot-add path, whose `ProxyRnsBackend.addInterface` throws `unsupportedInProxy`, so a relay add/edit/remove never reached the NE without bouncing the tunnel. The NE now observes the existing `configChanged` Darwin notification (posted at the single `InterfaceRepository.saveInterfaces` chokepoint on every add/edit/delete/toggle) and live-reconciles its `ne-tcp-relay-<entityId>` sockets (add new, drop removed, remove+re-add on an edited host/port) with no tunnel restart. `applyInterfaceChanges` now early-returns under Model B (the seam handles it).
2. The relay-status UI was stuck at "not connected". The multi-relay change renamed interface ids `ne-tcp-relay` -> `ne-tcp-relay-<entityId>`, but `neTcpRelayOnline()` still exact-matched the old id, so it never matched. It now prefix-matches, and a new per-relay `neTcpRelayStatuses()` maps each relay back to its entity id with online + lastError. The Manage Interfaces card shows per-relay status (connected / connecting / "Unreachable"), event-driven off the NE push (no per-second NE round-trip), and Network Status maps an offline-with-error relay to `.connectionFailed` rather than a bland `.disconnected`.
Also drop the bootstrap flag from the (currently unreachable) Beleth hub so `TcpCommunityServer.defaultServer` no longer seeds a dead relay as the sole TCP path on a skipped/empty onboarding.
NE changes: registerTCPRelay / startTCPRelayConfigObserver / reconcileTCPRelays (mirroring the RNode/propagation config-observer pattern), an endpoint-tracking map for edit-vs-unchanged diffing, and an observer-flag reset + map clear in stop().
Verified on device (iPhone 14, Model B): all configured relays register by entity id; one relay online and pulling announces; no `unsupportedInProxy` spam; app-side TCP correctly skipped under Model B.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
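The edit-vs-unchanged diffing described above can be sketched as a pure function over two maps keyed by entity id. `Endpoint`, `ReconcilePlan` and `reconcileTCPRelayPlan(running:desired:)` are hypothetical names; the sketch only computes what to add, remove, and restart, and leaves out the Darwin-notification observer and the socket work that the PR's `reconcileTCPRelays` performs.

```swift
/// Hypothetical host/port pair tracked per running relay socket.
struct Endpoint: Hashable {
    let host: String
    let port: Int
}

/// What a reconcile pass should do, keyed by entity id.
struct ReconcilePlan: Equatable {
    var add: [String] = []      // in config, not running: register
    var remove: [String] = []   // running, no longer in config: tear down
    var restart: [String] = []  // host/port edited: remove + re-add
}

/// Diffs the running sockets against the saved config. Unchanged relays
/// appear in no list, so their live connections are left alone.
func reconcileTCPRelayPlan(running: [String: Endpoint],
                           desired: [String: Endpoint]) -> ReconcilePlan {
    var plan = ReconcilePlan()
    for (id, endpoint) in desired {
        if let current = running[id] {
            if current != endpoint { plan.restart.append(id) }
        } else {
            plan.add.append(id)
        }
    }
    for id in running.keys where desired[id] == nil {
        plan.remove.append(id)
    }
    // Dictionary iteration order is unspecified; sort for a deterministic plan.
    plan.add.sort()
    plan.remove.sort()
    plan.restart.sort()
    return plan
}
```

Keeping the diff separate from the side effects makes the "unchanged relay keeps its connection" guarantee easy to test.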
…irst
The Settings connection card showed "Connected — TCP (<first relay>)" by picking `getEnabledInterfaces().first(tcpClient)` whenever the coarse any-relay-online bool was true. With multiple relays that mislabels: a down first-configured relay (e.g. a dead community hub) was named as the connected interface while a different relay actually carried traffic.
`refreshConnectionState()` now lists only the relays whose entity id is online per the per-relay `neTcpRelayStatuses()` snapshot, so the card names exactly the relay(s) carrying traffic. Removed the now-unused coarse `neTcpRelayOnline()` (the footgun that enabled the mislabel); per-relay status is the single source of truth. Model A path unchanged.
A codebase sweep for the same bug class (coarse any-online used to label/count a specific interface, or `.first` picked as "the connected one") found no other instances: the Messaging status dot uses the aggregate `isConnected` but names no specific relay, so it's correct.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
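The labeling fix amounts to joining configured relays against the per-relay snapshot instead of taking `.first`. `RelayStatus`, `ConfiguredRelay`, `connectedRelayNames(configured:statuses:)`, `RelayState` and `relayState(_:)` are hypothetical names shaped after the entity id / online / lastError fields described above. Mapping "offline without an error" to `.connecting` is an assumption of this sketch, not something the PR states.

```swift
/// Hypothetical per-relay snapshot entry (entity id, online, lastError).
struct RelayStatus {
    let entityId: String
    let online: Bool
    let lastError: String?
}

/// Hypothetical configured relay with a display name.
struct ConfiguredRelay {
    let entityId: String
    let displayName: String
}

/// Names only the relays that are actually online, in configured order,
/// instead of assuming the first configured tcpClient is the connected one.
func connectedRelayNames(configured: [ConfiguredRelay],
                         statuses: [RelayStatus]) -> [String] {
    let onlineIDs = Set(statuses.filter(\.online).map(\.entityId))
    return configured.filter { onlineIDs.contains($0.entityId) }.map(\.displayName)
}

enum RelayState: Equatable { case connected, connecting, connectionFailed }

/// Offline with an error reads as a failure rather than a bland "not connected".
func relayState(_ status: RelayStatus) -> RelayState {
    if status.online { return .connected }
    return status.lastError == nil ? .connecting : .connectionFailed
}
```

With a dead hub configured first and a working LAN node second, the card would name only the LAN node.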
First-launch background-delivery gate — fixes the "Connecting to network…" hang
The bug
The first real Model-B TestFlight build hung on "Connecting to network…" for minutes on first open, then limped to the UI with no working node.
Under Model B the Network Extension is the node, so `ProxyRnsBackend.start()` round-trips to the NE over the VPN tunnel session. But on a fresh install there's no VPN config, and nothing on the init path ever installed/started one: the only `install()`+`start()` call sites were the Settings / Background-Transport toggles, which are unreachable while the app is stuck on the loading screen behind that very `start()`. So `backend.start()` spun ~30×8s (`connectedSession`) on a dead session, then failed → no node. Bootstrap deadlock; every clean install hit it. (It only "worked" in dev because the tunnel was already installed+approved on the device.)
The fix: explicit first-run gate
A VPN-approval prompt is unavoidable for any NetworkExtension app; this makes it a deliberate step rather than a silent hang.
- `AppServices.ensureBackgroundDeliveryTunnel()` runs right before `backend.start()` (Model B only):
  - Returning users (approval persisted in the App-Group flag): silent `install → start → waitUntilConnected`, no prompt.
  - First run, or if the silent reconnect can't connect (e.g. VPN revoked in iOS Settings): suspends init on a continuation and sets `needsBackgroundDeliveryApproval`.
- `RootView` shows the new `BackgroundDeliveryGateView` while suspended (instead of an indefinite spinner). Its Enable button → `approveBackgroundDelivery()`: `install()`+`start()` (fires the iOS VPN prompt) → `waitUntilConnected` → persist approval → resume init, so `backend.start()` connects in seconds. Denied/timeout → error + Try Again (init stays suspended; never spins).
- `TunnelManager.waitUntilConnected(timeoutMs:)` added.
- All NE-only paths are `#if ENABLE_NETWORK_EXTENSION`; the Python flavor is untouched.
Verification
On-device (fresh `background_delivery_enabled` flag), the app now logs `[TUNNEL-GATE] awaiting background-delivery approval (showing gate)`: it suspends at the gate and never calls `backend.start()` until the tunnel is up. The multi-minute hang is gone and the gate is shown immediately. (The VPN "Allow" tap + connect is the user's manual step, as for any NE app.)
Model-B build (`ColumbaNetworkExtension` scheme, `Debug-Swift`) is green; the Python `Columba` scheme is covered by CI.
🤖 Generated with Claude Code