fix(model-b): first-launch onboarding + background-delivery gate (fixes the Connecting-to-network hang) #92

Merged
torlando-tech merged 9 commits into main from fix/model-b-first-launch-tunnel-gate on Jun 19, 2026

Conversation

@torlando-tech

Owner

First-launch background-delivery gate — fixes the "Connecting to network…" hang

The bug

The first real Model-B TestFlight build hung on "Connecting to network…" for minutes on first open, then limped to the UI with no working node.

Under Model B the Network Extension is the node, so ProxyRnsBackend.start() round-trips to the NE over the VPN tunnel session. But on a fresh install there's no VPN config, and nothing on the init path ever installed/started one — the only install()+start() call sites were the Settings / Background-Transport toggles, which are unreachable while the app is stuck on the loading screen behind that very start(). So backend.start() spun ~30×8s (connectedSession) on a dead session, then failed → no node. Bootstrap deadlock; every clean install hit it. (It only "worked" in dev because the tunnel was already installed+approved on the device.)

The fix — explicit first-run gate

A VPN-approval prompt is unavoidable for any NetworkExtension app; this makes it a deliberate step rather than a silent hang.

  • AppServices.ensureBackgroundDeliveryTunnel() runs right before backend.start() (Model B only):
    • Returning users (approval persisted in the App-Group flag) → silent install → start → waitUntilConnected, no prompt.
    • First run (or if the silent reconnect can't connect, e.g. the user revoked the VPN in iOS Settings) → suspend init on a continuation and set needsBackgroundDeliveryApproval.
  • RootView shows the new BackgroundDeliveryGateView while suspended (instead of an indefinite spinner). Its Enable button → approveBackgroundDelivery(): install()+start() (fires the iOS VPN prompt) → waitUntilConnected → persist approval → resume init, so backend.start() connects in seconds. Denied/timeout → error + Try Again (init stays suspended; never spins).
  • TunnelManager.waitUntilConnected(timeoutMs:) added.
  • Everything NE-specific is under #if ENABLE_NETWORK_EXTENSION; the Python flavor is untouched.
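
A minimal sketch of the suspend-and-resume shape described above, not the code in this PR. `TunnelGate`, `waitForApproval(silentReconnectSucceeded:)` and `approve(tunnelConnected:)` are hypothetical names, an `actor` stands in for whatever isolation AppServices actually uses, and the real install()/start()/waitUntilConnected calls are reduced to Bool inputs.

```swift
import Foundation

/// Hypothetical stand-in for the gate state; not the real AppServices type.
actor TunnelGate {
    /// Mirrors the role of needsBackgroundDeliveryApproval: true while init is parked.
    private(set) var needsApproval = false
    private var pending: CheckedContinuation<Void, Never>?

    /// Called on the init path right before the backend would start.
    /// Returns immediately when the silent reconnect worked; otherwise parks
    /// the caller on a continuation until `approve` succeeds.
    func waitForApproval(silentReconnectSucceeded: Bool) async {
        if silentReconnectSucceeded { return }
        await withCheckedContinuation { (continuation: CheckedContinuation<Void, Never>) in
            pending = continuation
            needsApproval = true
        }
    }

    /// Called from the gate's Enable action once the tunnel attempt has finished.
    /// A failed attempt leaves init suspended so the UI can offer Try Again.
    func approve(tunnelConnected: Bool) -> Bool {
        guard tunnelConnected else { return false }
        needsApproval = false
        if let continuation = pending {
            pending = nil            // clear first so a repeat call cannot resume twice
            continuation.resume()
        }
        return true
    }
}
```

The point of the shape is that the init path never polls: it either returns at once or stays parked until the approval path resumes it exactly once.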

Verification

On-device (fresh background_delivery_enabled flag), the app now logs:

[RNS] identityBytes=64
[TUNNEL-GATE] awaiting background-delivery approval (showing gate)

— it suspends at the gate and never calls backend.start() until the tunnel is up. The multi-minute hang is gone and the gate is shown immediately. (The VPN "Allow" tap + connect is the user's manual step, as for any NE app.)

Model-B build (ColumbaNetworkExtension scheme, Debug-Swift) is green; the Python Columba scheme is covered by CI.

🤖 Generated with Claude Code

… don't hang

Under Model B the Network Extension IS the node, so `ProxyRnsBackend.start()`
round-trips to the NE over the VPN tunnel session. On a fresh install there is no
VPN config at all, and nothing on the init path ever installed/started one — the
only install()+start() call sites were the Settings/Background-Transport toggles,
which are unreachable while the app is stuck on the loading screen behind that very
start(). Result: `backend.start()` spun ~30×8s (connectedSession) on a dead session
("Connecting to network…" for minutes), then failed → no node. Bootstrap deadlock;
every clean TestFlight install hit it. (It only "worked" in dev because the tunnel
was already installed+approved on the device.)

Fix — explicit first-run gate (the NE/VPN approval is unavoidable for any
NetworkExtension app; make it a deliberate step, not a silent hang):
- AppServices.ensureBackgroundDeliveryTunnel() runs right before backend.start()
  (Model B only). Returning users (approval persisted in the App-Group flag) get a
  SILENT install→start→waitUntilConnected; first run (or if the silent reconnect
  can't connect, e.g. VPN revoked in iOS Settings) suspends init on a continuation
  and sets needsBackgroundDeliveryApproval.
- RootView shows BackgroundDeliveryGateView while suspended (instead of an indefinite
  spinner). Its Enable button → approveBackgroundDelivery(): install()+start()
  (fires the iOS VPN prompt) → waitUntilConnected → persist approval → resume init,
  so backend.start() then connects in seconds. Denied/timeout → error + Try Again
  (init stays suspended; no spin).
- TunnelManager.waitUntilConnected(timeoutMs:) added.
- All NE-only paths are #if ENABLE_NETWORK_EXTENSION; the Python flavor is untouched.

Verified on device (fresh flag): init now logs `[TUNNEL-GATE] awaiting
background-delivery approval (showing gate)` and never calls backend.start() until
the tunnel is up — the multi-minute hang is gone and the gate is shown immediately.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Contributor

Greptile Summary

This PR fixes the Model-B first-launch hang ("Connecting to network…") by introducing an explicit VPN-approval gate (BackgroundDeliveryGateView). The gate suspends AppServices.initialize() on a CheckedContinuation until the Network Extension tunnel is confirmed connected, so backend.start() is never called against a dead session.

  • AppServices.ensureBackgroundDeliveryTunnel() (called inside startPythonBackend for Model B) checks a persisted approval flag: returning users get a silent reconnect; first-run (or revoked VPN) suspends init and exposes needsBackgroundDeliveryApproval for RootView to swap in BackgroundDeliveryGateView.
  • TunnelManager.waitUntilConnected now delegates to the existing connectedSession polling loop, eliminating the near-duplicate logic flagged in a previous review.
  • BackgroundDeliveryPage is added as onboarding step 5, letting users enable the VPN tunnel in-flow so that post-onboarding init takes the silent reconnect path immediately.

Confidence Score: 5/5

Safe to merge. The gate logic is sound and the CheckedContinuation is used correctly: resume() is always preceded by a nil-check and the optional is cleared immediately after, so a double resume on the @MainActor is not possible.

The gate mechanism is architecturally clean: the continuation is stored and resumed correctly, waitUntilConnected now delegates to connectedSession rather than duplicating the polling loop, and the @Observable/@MainActor observation chain that drives the RootView swap is correct. The only findings are a stale comment and a latent compile-guard gap that the current build configuration prevents from manifesting.

No files require special attention beyond the two minor items in OnboardingView.swift.

Important Files Changed

| Filename | Overview |
| --- | --- |
| Sources/ColumbaApp/Services/AppServices.swift | Core gate logic: ensureBackgroundDeliveryTunnel, approveBackgroundDelivery, and enableBackgroundDeliveryForOnboarding added. The first initialize(tcpServerAddress:) overload still lacks a call to ensureTunnelManager() (pre-existing latent issue already flagged in a prior review). |
| Sources/ColumbaApp/Services/TunnelManager.swift | waitUntilConnected added and now correctly delegates to connectedSession, fixing the near-duplicate polling loop flagged in a previous review. |
| Sources/ColumbaApp/Views/Onboarding/BackgroundDeliveryGateView.swift | New gate view; uses a plain let for appServices (addressing the prior @Bindable comment), a disable-while-working guard, and an error/retry path on denial. |
| Sources/ColumbaApp/Views/Onboarding/OnboardingView.swift | BackgroundDeliveryPage added as page 4; the onEnable closure calls enableBackgroundDeliveryForOnboarding (ENABLE_NETWORK_EXTENSION-only) without a compile-time guard, and the Skip button comment is now stale. |
| Sources/ColumbaApp/Views/Onboarding/BackgroundDeliveryPage.swift | New onboarding step for Model B; seeds interfaces before NE start, idempotent identity preparation, error/retry UI. |
| Sources/ColumbaApp/App/ColumbaApp.swift | RootView now swaps in BackgroundDeliveryGateView while needsBackgroundDeliveryApproval is true (under #if ENABLE_NETWORK_EXTENSION), replacing the indefinite spinner. |
| Sources/ColumbaApp/ViewModels/OnboardingViewModel.swift | seedInterfaces() added as a public idempotent helper; pageCount bumped to 6 for the new BackgroundDeliveryPage step. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant App as ColumbaApp (RootView)
    participant AS as AppServices
    participant TM as TunnelManager
    participant Gate as BackgroundDeliveryGateView
    participant iOS as iOS VPN Subsystem

    App->>AS: initialize() [second overload]
    AS->>AS: ensureTunnelManager()
    AS->>AS: startPythonBackend()
    AS->>AS: ensureBackgroundDeliveryTunnel()

    alt "backgroundDeliveryEnabledKey == true (returning user)"
        AS->>TM: install() + start()
        AS->>TM: waitUntilConnected(20s)
        TM-->>AS: connected
        AS->>AS: return (silent reconnect)
    else first run or VPN revoked
        AS->>AS: withCheckedContinuation suspend
        AS->>AS: "needsBackgroundDeliveryApproval = true"
        App->>Gate: show BackgroundDeliveryGateView
        Gate->>AS: approveBackgroundDelivery()
        AS->>TM: install() + start()
        TM->>iOS: VPN Allow prompt
        iOS-->>TM: approved
        TM-->>AS: connected
        AS->>AS: persist backgroundDeliveryEnabledKey
        AS->>AS: "needsBackgroundDeliveryApproval = false"
        AS->>AS: continuation.resume()
        AS-->>Gate: return true
        Gate->>App: "isWorking = false, view swapped by RootView"
    end

    AS->>AS: backend.start()
    AS->>App: "isInitialized = true"
    App->>App: show MainTabView

Reviews (10): Last reviewed commit: "Model B: Settings network card lists act..."

Comment thread Sources/ColumbaApp/Views/Onboarding/BackgroundDeliveryGateView.swift Outdated
Comment thread Sources/ColumbaApp/Services/TunnelManager.swift Outdated
Comment thread Sources/ColumbaApp/Services/AppServices.swift
@codecov

codecov Bot commented Jun 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


…ean up)

The 5-page onboarding (Welcome → Identity → Connectivity → Permissions → Complete)
existed but had NEVER shipped — every page is wrapped in #if COLUMBA_ONBOARDING_ENABLED
and that flag was defined in no build config, so RootView always took the "bypass"
branch. It also predated Model B. Enable it (Model-B configs only) and make it correct:

- pbxproj: define COLUMBA_ONBOARDING_ENABLED on the ColumbaApp Release-Swift +
  Debug-Swift configs only (NOT project-level — would leak to the NE/Tests targets —
  and NOT the Python flavor, which stays bypassed: onboarding ⇒ NE present).
- Relay seeding (correctness): under Model B the NE delivers over the first enabled
  tcpClient relay and ignores auto/multipeer/ble entities. createInterfaces() now
  always seeds exactly one enabled TCP relay (the pick, or the default community
  server); skipOnboarding() seeds the default relay instead of an AutoInterface-only
  (relay-less → unreachable) config.
- ConnectivityPage stripped to a single relay picker — removed the multi-interface
  multi-select and the in-app CoreBluetooth/Bonjour permission probes (those
  interfaces live in the NE's own process; prompting here was misleading and the
  entities were no-ops). Default server preselected so tap-through stays reachable.
- PermissionsPage: dropped the stale "Incoming voice calls" row (CallKit was removed
  in the Python-RNS migration).
- WelcomePage/OnboardingView: gate the restore-from-backup path behind
  COLUMBA_MIGRATION_ENABLED (MigrationViewModel + OnboardingRestoreSheet are under
  that flag) so onboarding compiles with migration off.

The NE/VPN step is the EXISTING BackgroundDeliveryGateView, shown by RootView right
after onboarding completes while init suspends in ensureBackgroundDeliveryTunnel() —
single tunnel owner, no double-gate, no in-pager NE page. Identity-before-NE holds
automatically: completeOnboarding() switches to the identity before onComplete fires,
and initialize() writes the shared keychain (shareIdentityForModelB) long before the
tunnel gate. Existing/returning users skip the pager (migrateExistingUsers back-fills
has_completed_onboarding) but still hit the gate once.

Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@torlando-tech torlando-tech changed the title from "fix(model-b): first-launch background-delivery gate (fixes the Connecting-to-network hang)" to "fix(model-b): first-launch onboarding + background-delivery gate (fixes the Connecting-to-network hang)" on Jun 18, 2026
torlando-agent Bot and others added 2 commits June 18, 2026 20:17
Per Tyler: the NE/VPN enable should be a real onboarding step (before the Complete
page), not a separate gate shown after Finish.

New page order: Welcome → Identity → Relay → Permissions → Background Delivery → Complete.

The Background Delivery page (step 5/6) brings the node up in-flow: on Enable it
creates + activates the identity (idempotent prepareIdentity + switchToIdentity),
shares it to the NE-readable keychain, and installs/starts the VPN tunnel (the iOS
"Allow" prompt fires here), waiting for it to connect before advancing. Identity is
created HERE rather than on Complete so the NE can load it when the tunnel starts.

- AppServices: extracted ensureTunnelManager() (idempotent create+wire+load, used by
  both initialize() and the onboarding step) and added
  enableBackgroundDeliveryForOnboarding(identity:) which shares the identity, brings
  the tunnel up, and persists `background_delivery_enabled`. So the post-onboarding
  init takes the silent-reconnect path through ensureBackgroundDeliveryTunnel() —
  reusing the already-up tunnel, NO second gate.
- The standalone BackgroundDeliveryGateView remains for users who SKIP onboarding or
  migrated/returning users (they don't pass through the in-flow page, so the flag is
  unset and the gate still fires once).
- OnboardingViewModel.pageCount 5 → 6; OnboardingView gains an appServices param.

Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ions page

The BLE permission prompt was appearing AFTER onboarding: under Model B the app runs
the CoreBluetooth host (ModelBBLEService — the NE can't run CoreBluetooth), and it
inits CB on start during post-onboarding init, firing the iOS prompt un-guided.

Surface it inside the flow on the Permissions page:
- Added a Bluetooth permission card (mirrors the notification card) — explains it's
  optional (relay works without it; BLE adds nearby/offline mesh) and lets the user
  grant it there. OnboardingViewModel gains bluetoothAuthorization/bluetoothGranted +
  requestBluetoothPermission()/checkBluetoothStatus() via a CBCentralManager probe.
- Guarantee in-flow: on leaving the Permissions step, if BLE is still .notDetermined,
  trigger the prompt then — so the unconditional ModelBBLEService prompt is relocated
  into onboarding for every user, not just those who tap the card.

Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@torlando-tech

Owner Author

@greptile review

torlando-agent Bot and others added 5 commits June 18, 2026 22:17
- BackgroundDeliveryGateView: @Bindable → let appServices (no $binding is used; a
  plain let removes the implicit writable-binding surface).
- TunnelManager.waitUntilConnected: delegate to connectedSession(timeoutMs:) so the
  polling logic AND status source are shared — reads the live
  manager.connection.status instead of the cached self.status (which lags one
  main-actor hop behind the NEVPNStatusDidChange observer).
- AppServices.approveBackgroundDelivery: public → internal. It's the gate-only entry
  point (resumes the suspended-init continuation), so keeping it out of the public
  API prevents a future caller from invoking it outside the gate where no continuation
  is live. The non-gate enable path is the separate enableBackgroundDeliveryForOnboarding.
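
For the waitUntilConnected item above, a rough sketch of a timeout-bounded poll that re-reads the status on every pass rather than trusting a cached copy. This is an illustration only: the free function, the `pollMs` parameter and the `isConnected` closure are assumptions, and the real method delegates to connectedSession(timeoutMs:) on TunnelManager.

```swift
import Foundation

/// Polls `isConnected` until it reports true or `timeoutMs` elapses.
/// The closure is evaluated fresh on each pass, standing in for a read of
/// the live connection status (assumption: the real source is the VPN manager).
func waitUntilConnected(
    timeoutMs: Int,
    pollMs: Int = 50,
    isConnected: @Sendable () -> Bool
) async -> Bool {
    let deadline = Date().addingTimeInterval(Double(timeoutMs) / 1000)
    while Date() < deadline {
        if isConnected() { return true }
        // Sleep between polls; a cancelled sleep just falls through to the next check.
        try? await Task.sleep(nanoseconds: UInt64(pollMs) * 1_000_000)
    }
    // One last read so a connection that lands right at the deadline still counts.
    return isConnected()
}
```

Keeping a single polling helper means the timeout behavior and the status source cannot drift apart between two near-identical loops.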

Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…gression)

The in-flow Background-Delivery step starts the Network Extension during onboarding,
but the TCP relay was only seeded later in completeOnboarding (on Finish). The in-NE
node reads its relay ONCE at start (loadTCPRelayConfig) and has no observer to pick up
a later write, so it booted "no TCP relay configured — AppGroupBridge only" and the
device had no TCP path (device log: NE start at 02:07:06, relay written at 02:07:08).

Seed the chosen relay into the shared interface store in the Background-Delivery
onEnable, BEFORE enableBackgroundDeliveryForOnboarding installs/starts the tunnel, so
loadTCPRelayConfig finds it. createInterfaces() is now idempotent (skips a duplicate
relay) since completeOnboarding still calls it. Exposed via OnboardingViewModel.seedInterfaces().
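
A small sketch of what idempotent seeding could look like, assuming a relay is identified by host and port. `RelayEntry`, `seedRelay` and the free-function `seedInterfaces` are hypothetical and the store is a plain array here; the actual shared interface store and its duplicate check may differ.

```swift
/// Hypothetical relay record; the real interface entity has more fields.
struct RelayEntry: Equatable {
    let host: String
    let port: Int
    var enabled: Bool
}

/// Appends `relay` unless an entry with the same host and port already exists.
func seedRelay(_ relay: RelayEntry, into store: inout [RelayEntry]) {
    let alreadyPresent = store.contains { $0.host == relay.host && $0.port == relay.port }
    if !alreadyPresent {
        store.append(relay)
    }
}

/// Seeds the user's pick, or the default relay when nothing was picked.
/// Safe to call from both the Background-Delivery step and the Finish step.
func seedInterfaces(picked: RelayEntry?, defaultRelay: RelayEntry, store: inout [RelayEntry]) {
    seedRelay(picked ?? defaultRelay, into: &store)
}
```

With this shape, calling the seeding step before the tunnel starts and again on Finish leaves exactly one relay entry in the store.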

(The NE still won't hot-pick-up relay edits made AFTER it's up — that's the deferred
TCP-interface observer in issue #91; this fixes the first-run path.)

Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…#91)

The NE registered a single ne-tcp-relay from loadTCPRelayConfig()'s FIRST enabled
tcpClient. With the onboarding-seeded community default (e.g. Beleth) first and dead,
the node dialed only that, reported ne-tcp-relay=down, and never tried a second,
reachable relay the user had configured (e.g. their own LAN transport node) — leaving
the device with no TCP path despite a working relay being right there.

Register one TCPInterface per enabled tcpClient (ids ne-tcp-relay-<n>); the node
delivers over whichever connects. loadTCPRelayConfig() → loadTCPRelayConfigs()
(returns all). The reconnect hook (setOnInterfaceConnected) and the relay-connected
wait now match the "ne-tcp-relay" prefix instead of the exact id.
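
To illustrate the id scheme and the prefix match, here is a hedged sketch. Only the "ne-tcp-relay" prefix comes from the commit message; the helper names and the `[String: Bool]` status map are assumptions made for the example.

```swift
/// Shared id prefix for NE TCP relay interfaces (taken from the commit message).
let relayIdPrefix = "ne-tcp-relay"

/// Builds a per-relay interface id such as "ne-tcp-relay-0".
func relayInterfaceId(for suffix: String) -> String {
    "\(relayIdPrefix)-\(suffix)"
}

/// Prefix match, so both the old single id and the new per-relay ids qualify.
func isRelayInterface(_ interfaceId: String) -> Bool {
    interfaceId.hasPrefix(relayIdPrefix)
}

/// True when at least one relay interface is reported online.
func anyRelayOnline(_ onlineById: [String: Bool]) -> Bool {
    onlineById.contains { isRelayInterface($0.key) && $0.value }
}
```

A bare prefix match would also accept an unrelated id that merely starts with the same characters, so matching on the prefix plus the "-" separator is the stricter option if such ids can exist.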

(Live add/remove of relays while the NE is already running is still a follow-up —
the node reads the relay set once at start; restart picks up changes.)

Builds clean (ColumbaNetworkExtension scheme, Debug-Swift, device).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…atus

Two interface-management gaps under Model B (NE owns the RNS node):

1. Changing TCP interfaces required a manual VPN restart. The app's
   `applyInterfaceChanges` ran the python-shaped hot-add path, whose
   `ProxyRnsBackend.addInterface` throws `unsupportedInProxy`, so a
   relay add/edit/remove never reached the NE without bouncing the
   tunnel. The NE now observes the existing `configChanged` Darwin
   notification (posted at the single `InterfaceRepository.saveInterfaces`
   chokepoint on every add/edit/delete/toggle) and live-reconciles its
   `ne-tcp-relay-<entityId>` sockets — add new, drop removed, remove+re-add
   on an edited host/port — with no tunnel restart. `applyInterfaceChanges`
   now early-returns under Model B (the seam handles it).

2. The relay-status UI was stuck "not connected". The multi-relay change
   renamed interface ids `ne-tcp-relay` -> `ne-tcp-relay-<entityId>`, but
   `neTcpRelayOnline()` still exact-matched the old id so it never matched.
   Now prefix-matches, and a new per-relay `neTcpRelayStatuses()` maps each
   relay back to its entity id with online + lastError. The Manage
   Interfaces card shows per-relay status (connected / connecting /
   "Unreachable"), event-driven off the NE push (no per-second NE
   round-trip), and Network Status maps an offline-with-error relay to
   `.connectionFailed` rather than a bland `.disconnected`.

Also drop the bootstrap flag from the (currently unreachable) Beleth hub so
`TcpCommunityServer.defaultServer` no longer seeds a dead relay as the sole
TCP path on a skipped/empty onboarding.

NE changes: registerTCPRelay / startTCPRelayConfigObserver / reconcileTCPRelays
(mirror the RNode/propagation config-observer pattern), endpoint-tracking map
for edit-vs-unchanged diffing, observer flag reset + map clear in stop().
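
The edit-vs-unchanged diffing can be pictured roughly as follows. This is a sketch under assumptions: `Endpoint`, `RelayDiff` and `diffRelays` are hypothetical names, and the real reconcileTCPRelays acts on sockets rather than returning a value.

```swift
/// Hypothetical endpoint record tracked per relay entity id.
struct Endpoint: Equatable {
    let host: String
    let port: Int
}

/// Entity ids grouped by the action the reconcile step would take.
struct RelayDiff: Equatable {
    var toAdd: [String] = []
    var toRemove: [String] = []
    var toReplace: [String] = []   // edited host/port: remove, then re-add
}

/// Compares the currently registered relays with the freshly loaded config.
func diffRelays(current: [String: Endpoint], desired: [String: Endpoint]) -> RelayDiff {
    var diff = RelayDiff()
    for (entityId, endpoint) in desired {
        if let existing = current[entityId] {
            if existing != endpoint { diff.toReplace.append(entityId) }
        } else {
            diff.toAdd.append(entityId)
        }
    }
    for entityId in current.keys where desired[entityId] == nil {
        diff.toRemove.append(entityId)
    }
    // Dictionary iteration order is unspecified, so sort for stable results.
    diff.toAdd.sort()
    diff.toRemove.sort()
    diff.toReplace.sort()
    return diff
}
```

Relays whose endpoint is unchanged appear in none of the three lists, which is what lets a config notification pass without touching healthy sockets.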

Verified on device (iPhone 14, Model B): all configured relays register by
entity id; one relay online and pulling announces; no `unsupportedInProxy`
spam; app-side TCP correctly skipped under Model B.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…irst

The Settings connection card showed "Connected — TCP (<first relay>)" by
picking `getEnabledInterfaces().first(tcpClient)` whenever the coarse
any-relay-online bool was true. With multiple relays that mislabels: a down
first-configured relay (e.g. a dead community hub) was named as the connected
interface while a different relay actually carried traffic.

`refreshConnectionState()` now lists only the relays whose entity id is online
per the per-relay `neTcpRelayStatuses()` snapshot, so the card names exactly
the relay(s) carrying traffic. Removed the now-unused coarse `neTcpRelayOnline()`
(the footgun that enabled the mislabel); per-relay status is the single source
of truth. Model A path unchanged.
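
As an illustration of labelling from per-relay status instead of `.first`, a sketch with assumed types: `RelayConfig`, `RelayStatus` and `connectedRelayNames` are hypothetical, and the real snapshot returned by `neTcpRelayStatuses()` may be shaped differently.

```swift
/// Hypothetical configured relay, in the order the user configured it.
struct RelayConfig {
    let entityId: String
    let name: String
}

/// Hypothetical per-relay status entry keyed by entity id.
struct RelayStatus {
    let entityId: String
    let online: Bool
    let lastError: String?
}

/// Names only the relays that are actually online, preserving configured order.
func connectedRelayNames(configured: [RelayConfig], statuses: [RelayStatus]) -> [String] {
    let onlineIds = Set(statuses.filter(\.online).map(\.entityId))
    return configured.filter { onlineIds.contains($0.entityId) }.map(\.name)
}
```

With a down first-configured relay and a second relay online, this returns only the second relay's name, which is the mislabel case the commit describes.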

A codebase sweep for the same bug class (coarse any-online used to label/count
a specific interface, or `.first` picked as "the connected one") found no other
instances — the Messaging status dot uses the aggregate `isConnected` but names
no specific relay, so it's correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@torlando-tech torlando-tech merged commit 48ccb2d into main Jun 19, 2026
3 checks passed
@torlando-tech torlando-tech deleted the fix/model-b-first-launch-tunnel-gate branch June 19, 2026 22:01