feat: Doze-resistant location sharing with boot persistence#744
feat: Doze-resistant location sharing with boot persistence#744MatthieuTexier wants to merge 18 commits into
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Greptile SummaryThis PR adds Doze-resistant location sharing via a new Confidence Score: 4/5Mergeable with low risk; previously flagged P1s (BootReceiver exported, persistSessions/release ordering, DB version mismatch) remain open and should be tracked, but no new P0/P1 issues were found in this pass. No new P0 or P1 bugs introduced. The single new finding is P2 (MicroG detection reliability on Android 11+). The unresolved items from prior review rounds keep the ceiling at 4/5 rather than 5/5. LocationSharingManager.kt (persistSessions/release ordering), AndroidManifest.xml (BootReceiver exported), DatabaseModule.kt (version number discrepancy), LocationCompat.kt (MicroG detection on API 30+) Important Files Changed
Sequence DiagramsequenceDiagram
participant App as ColumbaApplication
participant LSM as LocationSharingManager
participant TCM as TelemetryCollectorManager
participant Coord as LocationServiceCoordinator
participant LFS as LocationForegroundService
participant DS as DataStore
App->>LSM: restoreIfActive()
LSM->>DS: getLocationSharingSessions()
DS-->>LSM: sessionsJson
LSM->>Coord: acquire(REASON_SHARING)
Coord->>LFS: start(text, gen=1)
LFS-->>Coord: onStartCommand OK
App->>TCM: start()
TCM->>Coord: acquire(REASON_TELEMETRY)
Coord->>LFS: start(updated text, gen=1)
Note over Coord,LFS: notification text updates to sharing & telemetry
TCM->>Coord: release(REASON_TELEMETRY)
Coord->>LFS: start(text=sharing active, gen=1)
LSM->>Coord: release(REASON_SHARING)
Coord->>LFS: stop()
LFS-->>Coord: onDestroy clearReasonsForGeneration(1)
Prompt To Fix All With AIThis is a comment left during a code review.
Path: app/src/main/java/network/columba/app/util/LocationCompat.kt
Line: 88-104
Comment:
**MicroG detection silently fails on Android 11+ with user-space installs**
`getPackageInfo(..., GET_ACTIVITIES)` for a non-system package is subject to Android 11+ package visibility restrictions. On custom ROMs where MicroG is installed as a user-level package (not a privileged system app), this call throws `NameNotFoundException` and the `catch` block returns `false` — so `isMicroG` is `false`. Since `isGooglePlayServicesAvailable()` succeeds on MicroG, the final expression becomes `true && !false = true`, meaning the app falls back to MicroG's FusedLocationProvider and the inaccurate cell-tower fixes you're explicitly trying to avoid.
Worse, `isPlayServicesAvailable` caches the result on first call (`checked = true`), so the wrong decision persists for the entire app lifetime.
The `<queries>` manifest element for `com.google.android.gms` would guarantee package visibility on Android 11+; alternatively, checking for `org.microg.gms` or `org.microg.nlp` as a top-level package (which has no visibility ambiguity) is a more reliable signal.
How can I resolve this? If you propose a fix, please make it concise.Reviews (18): Last reviewed commit: "fix: detect MicroG and bypass FusedLocat..." | Re-trigger Greptile |
|
@greptile |
|
@greptile |
|
Failed test is a flaky one. |
sendLocationTelemetry was using DeliveryMethod.DIRECT which forces LXMRouter to establish a Link to the collector before delivering. When the Link cannot reach ACTIVE state — which is common right after a reboot while path discovery is still warming up, or when the peer's LRPROOF is dropped due to upstream peering bugs — the telemetry message sits in the outbound queue indefinitely. The receiver's map never refreshes and the user sees stuck positions. Telemetry payloads fit in a single MTU and are inherently fire-and-forget (no delivery proof needed; new positions arrive frequently). OPPORTUNISTIC delivery sends a single encrypted packet to the destination identity without requiring any prior Link handshake, which matches how Sideband and other reference LXMF clients handle short messages. Restores end-to-end telemetry flow after PR torlando-tech#744's boot persistence brings the components back up but the Link can't be established yet. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
8d7d085 to
7cb034c
Compare
sendLocationTelemetry was using DeliveryMethod.DIRECT which forces LXMRouter to establish a Link to the collector before delivering. When the Link cannot reach ACTIVE state — which is common right after a reboot while path discovery is still warming up, or when the peer's LRPROOF is dropped due to upstream peering bugs — the telemetry message sits in the outbound queue indefinitely. The receiver's map never refreshes and the user sees stuck positions. Telemetry payloads fit in a single MTU and are inherently fire-and-forget (no delivery proof needed; new positions arrive frequently). OPPORTUNISTIC delivery sends a single encrypted packet to the destination identity without requiring any prior Link handshake, which matches how Sideband and other reference LXMF clients handle short messages. Restores end-to-end telemetry flow after PR torlando-tech#744's boot persistence brings the components back up but the Link can't be established yet. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
7cb034c to
5aa6728
Compare
Location sharing and telemetry collection stopped updating during Android Doze because the main process had no foreground service. Sessions were also lost on restart. Changes: - Add LocationForegroundService (foregroundServiceType="location") to keep GPS callbacks alive during Doze - Add LocationServiceCoordinator (reference-counted singleton) to manage service lifecycle across LocationSharingManager and TelemetryCollectorManager (send + request-only modes) - Switch GPS priority from BALANCED_POWER_ACCURACY to HIGH_ACCURACY - Persist sharing sessions to DataStore, restore on app startup - Add BootReceiver to start Reticulum at boot so sharing resumes without opening the UI - Dynamic notification text (sharing, telemetry, or both) - DB migration 44→45: add source column + index to received_locations - Adversarial review: START_NOT_STICKY, SecurityException handling, idempotent release, restore race guards, stop() state poisoning Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…afe intent - Add content title to LocationForegroundService notification - Restore collector address truncation in log (privacy) - Use type-safe class reference in BootReceiver instead of string Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LocationServiceCoordinator: - Add serviceFailed flag set by clearAll() to prevent TelemetryCollectorManager observers from retrying acquire() in a loop after SecurityException kills the service. LocationSharingManager: - Move _activeSessions/_isSharing mutation after acquire() so state stays clean if the coordinator rolls back. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… column LocationServiceCoordinator: - Add resetFailedState() method - acquire() checks permission when serviceFailed is true and resets automatically if re-granted (no process restart needed) ReceivedLocationEntity: - Document that source column is scaffolding for future SOS trail entries (PR torlando-tech#713) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LocationServiceCoordinator: - Re-check serviceFailed inside synchronized block to prevent one extra service-start attempt racing with clearAll() LocationSharingManager: - Use AtomicBoolean isRestoring to prevent concurrent restoreIfActive() calls from double-starting GPS subscriptions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LocationForegroundService: - onDestroy() calls clearReasons() so coordinator doesn't think GPS is active after OS kills the service LocationServiceCoordinator: - Add clearReasons() (clears without poisoning serviceFailed) - Wrap notification update start() in try/catch in the else branch of acquire() to match the wasEmpty branch Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A race existed where onDestroy() of a dying service instance could clear activeReasons that belonged to a freshly started instance. Now each service start increments a generation counter. The generation is passed via intent extra. onDestroy() only clears reasons if its generation matches the coordinator's current generation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ation - stopSharing/checkExpiredSessions: call persistSessions() before release() so foreground service keeps process alive for DataStore write - acquire: increment generation only after successful start() to prevent a dying pre-failure instance from matching the generation - deserializeSessions: use optString for displayName to gracefully handle partial entries instead of wiping all sessions - BootReceiver: log exception class name for diagnosability Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HIGH_ACCURACY was introduced inadvertently — it keeps the GPS hardware active 100% of the time (~29 mAh/h) while BALANCED_POWER_ACCURACY uses WiFi/cell positioning which is sufficient for location sharing. SOS emergency has its own one-shot location path that is unaffected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ession parsing - BootReceiver: set exported=true with RECEIVE_BOOT_COMPLETED permission guard — exported=false silently blocks BOOT_COMPLETED on Android 14+ - LocationServiceCoordinator: roll back reason on notification update failure to keep activeReasons consistent with actual notification state - LocationSharingManager: use mapNotNull in deserializeSessions to skip individual corrupt entries instead of losing all persisted sessions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BootReceiver was duplicated in AndroidManifest.xml (merge artifact). Also set exported=true with RECEIVE_BOOT_COMPLETED permission guard — exported=false silently blocks BOOT_COMPLETED on Android 14+, preventing SOS and Reticulum services from starting at boot. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ackage declarations These files were added at the new path (network.columba.app.service) during the rebase onto the Kotlin migration, but their internal package declarations and intra-app references still used the old com.lxmf.messenger namespace, which broke compilation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adding the `source` column + index to ReceivedLocationEntity changed the Room schema identity hash. Without a version bump and a migration, Room crashes at startup with IllegalStateException when the app is installed over an older build that already created the v1 schema. Reproduces by: 1. Installing vanilla upstream (creates v1 DB without `source` column) 2. Installing this PR build over it 3. App crashes on first DB open The migration adds the column with the same default value the entity uses (`location_sharing`) and recreates the matching index, so existing rows keep working. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sendLocationTelemetry was using DeliveryMethod.DIRECT which forces LXMRouter to establish a Link to the collector before delivering. When the Link cannot reach ACTIVE state — which is common right after a reboot while path discovery is still warming up, or when the peer's LRPROOF is dropped due to upstream peering bugs — the telemetry message sits in the outbound queue indefinitely. The receiver's map never refreshes and the user sees stuck positions. Telemetry payloads fit in a single MTU and are inherently fire-and-forget (no delivery proof needed; new positions arrive frequently). OPPORTUNISTIC delivery sends a single encrypted packet to the destination identity without requiring any prior Link handshake, which matches how Sideband and other reference LXMF clients handle short messages. Restores end-to-end telemetry flow after PR torlando-tech#744's boot persistence brings the components back up but the Link can't be established yet. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Sideband/Python reference (`time.time()` convention) stores `received_at` in Unix epoch seconds and filters incoming requests with `received_at >= timebase`. Our Kotlin client was sending `timebase` in milliseconds (from `System.currentTimeMillis()`), so a Sideband-style collector compared seconds-vs-millis and the filter was always false → the collector returned 0 entries even though it had data for the requester. Symptom: BraX3 saw zero positions on its map even though the OnePlus collector held all group members' locations and other group members (running Python clients) received data normally. Fix verified on device: `Telemetry stream received with 3 entries` after restart. Also align our own collector storage (StoredTelemetryEntry) and request filter on seconds, so peers running Python clients can pull from us correctly too. The prior `receivedAtMillis` field is renamed to `receivedAtSeconds`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three accuracy bugs were combining to make peers see a stable but wildly wrong position (~2 km off in field testing): 1. `getCurrentLocation()` accepted any OS-cached fix up to ~10 s old, which on indoor restarts was a network-derived fix labelled with a plausible accuracy. Switched to `CurrentLocationRequest` with `maxUpdateAgeMillis=0` and `Granularity.GRANULARITY_FINE` to force a fresh GPS attempt. 2. The tracker fallback was returning the BALANCED_POWER cache unconditionally. When GPS times out indoors, that cache is often 1–5 km accurate. Now we drop it if accuracy > 200 m — better to send nothing than a position several km off. 3. `LocationSharingManager` continuous tracker was BALANCED_POWER too, so user-initiated sharing sessions delivered low-accuracy WiFi/cell fixes to recipients. Switched to HIGH_ACCURACY: sharing sessions are user-triggered and time-bounded, so the GPS draw is acceptable here (unlike the always-on telemetry collector path which keeps the BALANCED tracker for cache + on-demand HIGH_ACCURACY at send time). Also bumped the one-shot timeout from 20 s to 30 s — cold-start GPS locks routinely take 20–30 s when the device has been indoors for a while. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 200 m fallback threshold added in the previous commit was too strict — when GPS is unavailable indoors and the BALANCED tracker only has a coarse cell-only fix, the user could not get any telemetry out even by hitting "send now" manually. Two adjustments: - Bump the periodic-cycle threshold from 200 m → 500 m. 200 m rejects most indoor WiFi fixes (often 100–500 m) where the position is still useful; 500 m still rejects the 1–5 km cell-only worst case that motivated the threshold in the first place. - Add `allowAnyAccuracy` parameter to `getTelemetryLocation()` and `sendTelemetryToCollector()`. `sendTelemetryNow()` (user-initiated) passes `true` so the threshold is bypassed entirely — when the user explicitly hits "send", they want SOMETHING, even an approximate position. Periodic background cycles still pass `false` to avoid spamming peers with bad fixes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On MicroG-based ROMs, `com.google.android.gms` advertises itself as GMS-available but its FusedLocationProviderClient routes location requests through MicroG's NLP. We observed it returning stale cell-tower fixes labelled `provider=fused acc=18m` while the device was actually ~2 km away — Waze on the same device queried `LocationManager.GPS_PROVIDER` directly and had no problem. Detect MicroG by inspecting the `com.google.android.gms` package's activity classes — MicroG ships everything under `org.microg.*`, real Google GMS keeps it under `com.google.android.gms.*`. When detected, treat it as "GMS not available" so the LocationCompat fallback uses the platform LocationManager (GPS_PROVIDER first, NETWORK_PROVIDER fallback) — the same path Waze uses. Also bump the platform single-location timeout from 10 s → 30 s for the API-R-and-below path; cold-start GPS lock indoors routinely needs 20–30 s before a real fix is available. Signature-based detection was tried first but the cert hash differed between dumpsys and PackageManager (signing scheme version mismatch), making it unreliable across MicroG builds. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
91998b0 to
9fa0008
Compare
Summary
Location sharing and telemetry collection stopped updating during Android Doze because the main process had no foreground service. Sessions were also lost on app/device restart.
This PR adds:
foregroundServiceType="location") to keep GPS callbacks alive during DozeBALANCED_POWER_ACCURACYtoHIGH_ACCURACY(reliable in Doze with a location foreground service)sourcecolumn + index onreceived_locationsAddresses user request for location sharing persistence across restarts (PR #713 comment by bd.) and partially addresses #385 (Live Tracking reliability).
Changes
LocationForegroundService.ktSTART_NOT_STICKY,SecurityExceptionhandlingLocationServiceCoordinator.ktBootReceiver.ktBOOT_COMPLETEDLocationSharingManager.ktHIGH_ACCURACY, acquire/release coordinationTelemetryCollectorManager.ktstop()TelemetryLocationTracker.ktHIGH_ACCURACYGPS prioritySettingsRepository.ktColumbaApplication.ktrestoreIfActive()in all 4 init pathsAndroidManifest.xmlFOREGROUND_SERVICE_LOCATION,RECEIVE_BOOT_COMPLETED, service + receiver declarationsReceivedLocationEntity.ktsourcecolumn with@ColumnInfo(defaultValue)+ indexDatabaseModule.ktColumbaDatabase.ktTest plan