Skip to content

feat: Doze-resistant location sharing with boot persistence#744

Draft
MatthieuTexier wants to merge 18 commits into
torlando-tech:mainfrom
MatthieuTexier:feature/location-foreground-service-clean
Draft

feat: Doze-resistant location sharing with boot persistence#744
MatthieuTexier wants to merge 18 commits into
torlando-tech:mainfrom
MatthieuTexier:feature/location-foreground-service-clean

Conversation

@MatthieuTexier

Copy link
Copy Markdown
Contributor

Summary

Location sharing and telemetry collection stopped updating during Android Doze because the main process had no foreground service. Sessions were also lost on app/device restart.

This PR adds:

  • LocationForegroundService (foregroundServiceType="location") to keep GPS callbacks alive during Doze
  • LocationServiceCoordinator (reference-counted singleton) managing the service lifecycle across both location sharing modes (peer-to-peer and group/telemetry)
  • Session persistence to DataStore with automatic restore on app startup
  • BootReceiver to start Reticulum at device boot so sharing resumes without opening the UI
  • GPS priority upgrade from BALANCED_POWER_ACCURACY to HIGH_ACCURACY (reliable in Doze with a location foreground service)
  • Dynamic notification text reflecting active reasons (sharing, telemetry, or both)
  • DB migration 44→45: source column + index on received_locations

Addresses user request for location sharing persistence across restarts (PR #713 comment by bd.) and partially addresses #385 (Live Tracking reliability).

Changes

File Change
LocationForegroundService.kt New — lightweight foreground service with START_NOT_STICKY, SecurityException handling
LocationServiceCoordinator.kt New — thread-safe ref-counted acquire/release with rollback on failure
BootReceiver.kt New — starts Reticulum service at BOOT_COMPLETED
LocationSharingManager.kt Session persistence, restore on startup, HIGH_ACCURACY, acquire/release coordination
TelemetryCollectorManager.kt Foreground service for send + request-only modes, state poisoning in stop()
TelemetryLocationTracker.kt HIGH_ACCURACY GPS priority
SettingsRepository.kt Session persistence methods
ColumbaApplication.kt restoreIfActive() in all 4 init paths
AndroidManifest.xml FOREGROUND_SERVICE_LOCATION, RECEIVE_BOOT_COMPLETED, service + receiver declarations
ReceivedLocationEntity.kt source column with @ColumnInfo(defaultValue) + index
DatabaseModule.kt Migration 44→45
ColumbaDatabase.kt Version bump to 45

Test plan

  • Doze mode (peer-to-peer): 24+ min with positions updating every 30-60s
  • Doze mode (group client): 20+ min with positions from host
  • Doze mode (group host): 30+ min collecting and redistributing positions
  • Cold boot (peer-to-peer): sessions restored, sharing resumes without UI
  • Cold boot (group sharing): telemetry send/request resume from DataStore settings
  • Cold boot (host mode): host mode re-synced with Python after Reticulum READY
  • Service stops when all sharing/telemetry disabled
  • Notification text updates dynamically
  • Battery optimization whitelist required for full Doze resistance (existing BatteryOptimizationCard handles this)
  • Kotlin tests: 6045 passed (2 pre-existing failures on upstream/main)
  • Python tests: 1657 passed
  • ktlint + detekt: passed
  • Adversarial code review completed

@sentry

sentry Bot commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@greptile-apps

greptile-apps Bot commented Apr 1, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds Doze-resistant location sharing via a new LocationForegroundService managed by a generation-counter–aware LocationServiceCoordinator, wires session persistence to DataStore with boot-time restore via BootReceiver, and fixes a Python-interop unit mismatch (receivedAtMillisreceivedAtSeconds) that was silently returning zero telemetry entries on every incremental request. The serviceFailed flag in the coordinator correctly breaks the retry loop that prior reviews flagged, and the generation counter addresses the onDestroy race.

Confidence Score: 4/5

Mergeable with low risk; previously flagged P1s (BootReceiver exported, persistSessions/release ordering, DB version mismatch) remain open and should be tracked, but no new P0/P1 issues were found in this pass.

No new P0 or P1 bugs introduced. The single new finding is P2 (MicroG detection reliability on Android 11+). The unresolved items from prior review rounds keep the ceiling at 4/5 rather than 5/5.

LocationSharingManager.kt (persistSessions/release ordering), AndroidManifest.xml (BootReceiver exported), DatabaseModule.kt (version number discrepancy), LocationCompat.kt (MicroG detection on API 30+)

Important Files Changed

Filename Overview
app/src/main/java/network/columba/app/service/LocationForegroundService.kt New foreground service with generation-counter–based onDestroy guard; previously flagged icon and title issues addressed in code with TODO comment.
app/src/main/java/network/columba/app/service/LocationServiceCoordinator.kt Thread-safe ref-counted coordinator with serviceFailed guard (breaks retry loop) and generation counter (fixes onDestroy race); logic is sound given prior review iteration.
app/src/main/java/network/columba/app/service/LocationSharingManager.kt Session persistence and restore added; persistSessions()/release() ordering race previously flagged remains unresolved — release() fires synchronously before DataStore write completes.
app/src/main/java/network/columba/app/service/TelemetryCollectorManager.kt Foreground service lifecycle integrated via updateLocationTracking(); stop() correctly poisons all flags covering needsForegroundService() before cancelling observers; timebase millis→seconds fix included.
app/src/main/java/network/columba/app/util/LocationCompat.kt MicroG detection via GET_ACTIVITIES may silently fail on Android 11+ with user-space MicroG installs; cached result means the wrong GMS decision persists for the app lifetime.
app/src/main/java/network/columba/app/service/TelemetryLocationTracker.kt One-shot HIGH_ACCURACY preferred over cache; allowAnyAccuracy param correctly threads through; CurrentLocationRequest with maxUpdateAgeMillis=0 prevents stale OS-cache fixes.
app/src/main/java/network/columba/app/receiver/BootReceiver.kt Starts Reticulum at boot; android:exported="true" vulnerability previously flagged remains in the manifest.
data/src/main/java/network/columba/app/data/di/DatabaseModule.kt MIGRATION_1_2 adds source column and index correctly; PR description says "44→45" but code is 1→2 — previously flagged mismatch remains unresolved.
reticulum/src/main/java/network/columba/app/reticulum/protocol/NativeTelemetryHandler.kt receivedAtMillis→receivedAtSeconds rename with /1000L division fixes the Python sideband interop bug; timebase filtering is now unit-consistent.
reticulum/src/main/java/network/columba/app/reticulum/protocol/NativeReticulumProtocol.kt DIRECT→OPPORTUNISTIC delivery for telemetry correctly avoids Link establishment stalls on boot before path discovery warms up.
app/src/main/AndroidManifest.xml FOREGROUND_SERVICE_LOCATION and RECEIVE_BOOT_COMPLETED permissions added; LocationForegroundService declared with foregroundServiceType="location"; BootReceiver remains exported with normal-protection permission.
data/src/main/java/network/columba/app/data/db/entity/ReceivedLocationEntity.kt source column added with matching @ColumnInfo defaultValue and SQL DEFAULT 'location_sharing'; companion constants provided for future SOS trail use.

Sequence Diagram

sequenceDiagram
    participant App as ColumbaApplication
    participant LSM as LocationSharingManager
    participant TCM as TelemetryCollectorManager
    participant Coord as LocationServiceCoordinator
    participant LFS as LocationForegroundService
    participant DS as DataStore

    App->>LSM: restoreIfActive()
    LSM->>DS: getLocationSharingSessions()
    DS-->>LSM: sessionsJson
    LSM->>Coord: acquire(REASON_SHARING)
    Coord->>LFS: start(text, gen=1)
    LFS-->>Coord: onStartCommand OK

    App->>TCM: start()
    TCM->>Coord: acquire(REASON_TELEMETRY)
    Coord->>LFS: start(updated text, gen=1)
    Note over Coord,LFS: notification text updates to sharing & telemetry

    TCM->>Coord: release(REASON_TELEMETRY)
    Coord->>LFS: start(text=sharing active, gen=1)

    LSM->>Coord: release(REASON_SHARING)
    Coord->>LFS: stop()
    LFS-->>Coord: onDestroy clearReasonsForGeneration(1)
Loading
Prompt To Fix All With AI
This is a comment left during a code review.
Path: app/src/main/java/network/columba/app/util/LocationCompat.kt
Line: 88-104

Comment:
**MicroG detection silently fails on Android 11+ with user-space installs**

`getPackageInfo(..., GET_ACTIVITIES)` for a non-system package is subject to Android 11+ package visibility restrictions. On custom ROMs where MicroG is installed as a user-level package (not a privileged system app), this call throws `NameNotFoundException` and the `catch` block returns `false` — so `isMicroG` is `false`. Since `isGooglePlayServicesAvailable()` succeeds on MicroG, the final expression becomes `true && !false = true`, meaning the app falls back to MicroG's FusedLocationProvider and the inaccurate cell-tower fixes you're explicitly trying to avoid.

Worse, `isPlayServicesAvailable` caches the result on first call (`checked = true`), so the wrong decision persists for the entire app lifetime.

The `<queries>` manifest element for `com.google.android.gms` would guarantee package visibility on Android 11+; alternatively, checking for `org.microg.gms` or `org.microg.nlp` as a top-level package (which has no visibility ambiguity) is a more reliable signal.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (18): Last reviewed commit: "fix: detect MicroG and bypass FusedLocat..." | Re-trigger Greptile

Comment thread app/src/main/java/com/lxmf/messenger/service/LocationForegroundService.kt Outdated
Comment thread app/src/main/java/com/lxmf/messenger/service/TelemetryCollectorManager.kt Outdated
Comment thread app/src/main/java/com/lxmf/messenger/service/LocationSharingManager.kt Outdated
@MatthieuTexier

Copy link
Copy Markdown
Contributor Author

@greptile

@MatthieuTexier

Copy link
Copy Markdown
Contributor Author

@greptile

@MatthieuTexier

Copy link
Copy Markdown
Contributor Author

Failed test is a flaky one.

MatthieuTexier added a commit to MatthieuTexier/columba that referenced this pull request Apr 24, 2026
sendLocationTelemetry was using DeliveryMethod.DIRECT which forces
LXMRouter to establish a Link to the collector before delivering. When
the Link cannot reach ACTIVE state — which is common right after a
reboot while path discovery is still warming up, or when the peer's
LRPROOF is dropped due to upstream peering bugs — the telemetry message
sits in the outbound queue indefinitely. The receiver's map never
refreshes and the user sees stuck positions.

Telemetry payloads fit in a single MTU and are inherently fire-and-forget
(no delivery proof needed; new positions arrive frequently). OPPORTUNISTIC
delivery sends a single encrypted packet to the destination identity
without requiring any prior Link handshake, which matches how Sideband
and other reference LXMF clients handle short messages.

Restores end-to-end telemetry flow after PR torlando-tech#744's boot persistence
brings the components back up but the Link can't be established yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatthieuTexier MatthieuTexier force-pushed the feature/location-foreground-service-clean branch from 8d7d085 to 7cb034c Compare April 24, 2026 14:42
MatthieuTexier added a commit to MatthieuTexier/columba that referenced this pull request Apr 24, 2026
sendLocationTelemetry was using DeliveryMethod.DIRECT which forces
LXMRouter to establish a Link to the collector before delivering. When
the Link cannot reach ACTIVE state — which is common right after a
reboot while path discovery is still warming up, or when the peer's
LRPROOF is dropped due to upstream peering bugs — the telemetry message
sits in the outbound queue indefinitely. The receiver's map never
refreshes and the user sees stuck positions.

Telemetry payloads fit in a single MTU and are inherently fire-and-forget
(no delivery proof needed; new positions arrive frequently). OPPORTUNISTIC
delivery sends a single encrypted packet to the destination identity
without requiring any prior Link handshake, which matches how Sideband
and other reference LXMF clients handle short messages.

Restores end-to-end telemetry flow after PR torlando-tech#744's boot persistence
brings the components back up but the Link can't be established yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@MatthieuTexier MatthieuTexier force-pushed the feature/location-foreground-service-clean branch from 7cb034c to 5aa6728 Compare April 24, 2026 14:55
@MatthieuTexier MatthieuTexier marked this pull request as draft April 25, 2026 06:05
MatthieuTexier and others added 16 commits May 10, 2026 15:52
Location sharing and telemetry collection stopped updating during
Android Doze because the main process had no foreground service.
Sessions were also lost on restart.

Changes:
- Add LocationForegroundService (foregroundServiceType="location")
  to keep GPS callbacks alive during Doze
- Add LocationServiceCoordinator (reference-counted singleton) to
  manage service lifecycle across LocationSharingManager and
  TelemetryCollectorManager (send + request-only modes)
- Switch GPS priority from BALANCED_POWER_ACCURACY to HIGH_ACCURACY
- Persist sharing sessions to DataStore, restore on app startup
- Add BootReceiver to start Reticulum at boot so sharing resumes
  without opening the UI
- Dynamic notification text (sharing, telemetry, or both)
- DB migration 44→45: add source column + index to received_locations
- Adversarial review: START_NOT_STICKY, SecurityException handling,
  idempotent release, restore race guards, stop() state poisoning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…afe intent

- Add content title to LocationForegroundService notification
- Restore collector address truncation in log (privacy)
- Use type-safe class reference in BootReceiver instead of string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LocationServiceCoordinator:
- Add serviceFailed flag set by clearAll() to prevent
  TelemetryCollectorManager observers from retrying acquire()
  in a loop after SecurityException kills the service.

LocationSharingManager:
- Move _activeSessions/_isSharing mutation after acquire() so
  state stays clean if the coordinator rolls back.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… column

LocationServiceCoordinator:
- Add resetFailedState() method
- acquire() checks permission when serviceFailed is true and resets
  automatically if re-granted (no process restart needed)

ReceivedLocationEntity:
- Document that source column is scaffolding for future SOS trail
  entries (PR torlando-tech#713)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LocationServiceCoordinator:
- Re-check serviceFailed inside synchronized block to prevent
  one extra service-start attempt racing with clearAll()

LocationSharingManager:
- Use AtomicBoolean isRestoring to prevent concurrent
  restoreIfActive() calls from double-starting GPS subscriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LocationForegroundService:
- onDestroy() calls clearReasons() so coordinator doesn't think
  GPS is active after OS kills the service

LocationServiceCoordinator:
- Add clearReasons() (clears without poisoning serviceFailed)
- Wrap notification update start() in try/catch in the else branch
  of acquire() to match the wasEmpty branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A race existed where onDestroy() of a dying service instance could
clear activeReasons that belonged to a freshly started instance.

Now each service start increments a generation counter. The generation
is passed via intent extra. onDestroy() only clears reasons if its
generation matches the coordinator's current generation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ation

- stopSharing/checkExpiredSessions: call persistSessions() before
  release() so foreground service keeps process alive for DataStore write
- acquire: increment generation only after successful start() to prevent
  a dying pre-failure instance from matching the generation
- deserializeSessions: use optString for displayName to gracefully
  handle partial entries instead of wiping all sessions
- BootReceiver: log exception class name for diagnosability

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
HIGH_ACCURACY was introduced inadvertently — it keeps the GPS hardware
active 100% of the time (~29 mAh/h) while BALANCED_POWER_ACCURACY uses
WiFi/cell positioning which is sufficient for location sharing.

SOS emergency has its own one-shot location path that is unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ession parsing

- BootReceiver: set exported=true with RECEIVE_BOOT_COMPLETED permission
  guard — exported=false silently blocks BOOT_COMPLETED on Android 14+
- LocationServiceCoordinator: roll back reason on notification update
  failure to keep activeReasons consistent with actual notification state
- LocationSharingManager: use mapNotNull in deserializeSessions to skip
  individual corrupt entries instead of losing all persisted sessions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BootReceiver was duplicated in AndroidManifest.xml (merge artifact).
Also set exported=true with RECEIVE_BOOT_COMPLETED permission guard —
exported=false silently blocks BOOT_COMPLETED on Android 14+, preventing
SOS and Reticulum services from starting at boot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ackage declarations

These files were added at the new path (network.columba.app.service) during
the rebase onto the Kotlin migration, but their internal package declarations
and intra-app references still used the old com.lxmf.messenger namespace,
which broke compilation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adding the `source` column + index to ReceivedLocationEntity changed the
Room schema identity hash. Without a version bump and a migration, Room
crashes at startup with IllegalStateException when the app is installed
over an older build that already created the v1 schema.

Reproduces by:
1. Installing vanilla upstream (creates v1 DB without `source` column)
2. Installing this PR build over it
3. App crashes on first DB open

The migration adds the column with the same default value the entity
uses (`location_sharing`) and recreates the matching index, so existing
rows keep working.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sendLocationTelemetry was using DeliveryMethod.DIRECT which forces
LXMRouter to establish a Link to the collector before delivering. When
the Link cannot reach ACTIVE state — which is common right after a
reboot while path discovery is still warming up, or when the peer's
LRPROOF is dropped due to upstream peering bugs — the telemetry message
sits in the outbound queue indefinitely. The receiver's map never
refreshes and the user sees stuck positions.

Telemetry payloads fit in a single MTU and are inherently fire-and-forget
(no delivery proof needed; new positions arrive frequently). OPPORTUNISTIC
delivery sends a single encrypted packet to the destination identity
without requiring any prior Link handshake, which matches how Sideband
and other reference LXMF clients handle short messages.

Restores end-to-end telemetry flow after PR torlando-tech#744's boot persistence
brings the components back up but the Link can't be established yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Sideband/Python reference (`time.time()` convention) stores
`received_at` in Unix epoch seconds and filters incoming requests with
`received_at >= timebase`. Our Kotlin client was sending `timebase` in
milliseconds (from `System.currentTimeMillis()`), so a Sideband-style
collector compared seconds-vs-millis and the filter was always false →
the collector returned 0 entries even though it had data for the
requester.

Symptom: BraX3 saw zero positions on its map even though the OnePlus
collector held all group members' locations and other group members
(running Python clients) received data normally. Fix verified on
device: `Telemetry stream received with 3 entries` after restart.

Also align our own collector storage (StoredTelemetryEntry) and request
filter on seconds, so peers running Python clients can pull from us
correctly too. The prior `receivedAtMillis` field is renamed to
`receivedAtSeconds`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three accuracy bugs were combining to make peers see a stable but
wildly wrong position (~2 km off in field testing):

1. `getCurrentLocation()` accepted any OS-cached fix up to ~10 s old,
   which on indoor restarts was a network-derived fix labelled with a
   plausible accuracy. Switched to `CurrentLocationRequest` with
   `maxUpdateAgeMillis=0` and `Granularity.GRANULARITY_FINE` to force
   a fresh GPS attempt.

2. The tracker fallback was returning the BALANCED_POWER cache
   unconditionally. When GPS times out indoors, that cache is often
   1–5 km accurate. Now we drop it if accuracy > 200 m — better to
   send nothing than a position several km off.

3. `LocationSharingManager` continuous tracker was BALANCED_POWER too,
   so user-initiated sharing sessions delivered low-accuracy WiFi/cell
   fixes to recipients. Switched to HIGH_ACCURACY: sharing sessions
   are user-triggered and time-bounded, so the GPS draw is acceptable
   here (unlike the always-on telemetry collector path which keeps the
   BALANCED tracker for cache + on-demand HIGH_ACCURACY at send time).

Also bumped the one-shot timeout from 20 s to 30 s — cold-start GPS
locks routinely take 20–30 s when the device has been indoors for a
while.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MatthieuTexier and others added 2 commits May 10, 2026 15:52
The 200 m fallback threshold added in the previous commit was too
strict — when GPS is unavailable indoors and the BALANCED tracker only
has a coarse cell-only fix, the user could not get any telemetry out
even by hitting "send now" manually.

Two adjustments:

- Bump the periodic-cycle threshold from 200 m → 500 m. 200 m rejects
  most indoor WiFi fixes (often 100–500 m) where the position is still
  useful; 500 m still rejects the 1–5 km cell-only worst case that
  motivated the threshold in the first place.

- Add `allowAnyAccuracy` parameter to `getTelemetryLocation()` and
  `sendTelemetryToCollector()`. `sendTelemetryNow()` (user-initiated)
  passes `true` so the threshold is bypassed entirely — when the user
  explicitly hits "send", they want SOMETHING, even an approximate
  position. Periodic background cycles still pass `false` to avoid
  spamming peers with bad fixes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On MicroG-based ROMs, `com.google.android.gms` advertises itself as
GMS-available but its FusedLocationProviderClient routes location
requests through MicroG's NLP. We observed it returning stale
cell-tower fixes labelled `provider=fused acc=18m` while the device
was actually ~2 km away — Waze on the same device queried
`LocationManager.GPS_PROVIDER` directly and had no problem.

Detect MicroG by inspecting the `com.google.android.gms` package's
activity classes — MicroG ships everything under `org.microg.*`,
real Google GMS keeps it under `com.google.android.gms.*`. When
detected, treat it as "GMS not available" so the LocationCompat
fallback uses the platform LocationManager (GPS_PROVIDER first,
NETWORK_PROVIDER fallback) — the same path Waze uses.

Also bump the platform single-location timeout from 10 s → 30 s for
the API-R-and-below path; cold-start GPS lock indoors routinely needs
20–30 s before a real fix is available.

Signature-based detection was tried first but the cert hash differed
between dumpsys and PackageManager (signing scheme version mismatch),
making it unreliable across MicroG builds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MatthieuTexier MatthieuTexier force-pushed the feature/location-foreground-service-clean branch from 91998b0 to 9fa0008 Compare May 10, 2026 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant