diff --git a/BLE_PROTOCOL_v0.4.0.md b/BLE_PROTOCOL_v0.4.0.md
new file mode 100644
index 0000000..5e7d027
--- /dev/null
+++ b/BLE_PROTOCOL_v0.4.0.md
@@ -0,0 +1,182 @@
+# BLE-Reticulum Protocol Specification v0.4.0
+
+**Version**: 0.4.0
+**Date**: June 2026
+**Status**: Draft
+**Backwards Compatible With**: v0.3.0, v2.2
+
+## 1. Overview
+
+This document specifies the v0.4.0 extension to the BLE-Reticulum protocol. This
+version adds a **data-path liveness probe** to detect and recover from "connected
+but data-dead" BLE links.
+
+### 1.1 Problem Statement
+
+A BLE connection can remain established at the link layer while application data
+silently stops flowing:
+
+- The Bluetooth **link layer keeps an idle connection alive** indefinitely (empty
+ PDUs at each connection event). It only drops on supervision timeout — i.e. radio
+ loss — not on application silence.
+- Under RF degradation a link can pass small writes (such as the 1-byte keepalive
+ used to defeat Android's app-inactivity timeout) while **larger data fragments
+ fail**: keepalives succeed, real data does not.
+
+In this state the link is genuinely "up", so every existing liveness mechanism
+misses it:
+
+- the reactive zombie check (`_last_real_data`) is only consulted when a *new*
+ connection arrives — it is never swept;
+- `_validate_spawned_interfaces` reconciles against the driver's connected-peer
+ set, which still lists the peer;
+- the keepalive-write-failure reaper never fires, because keepalive writes still
+ succeed.
+
+The peer therefore stays "connected" forever while no data flows, with no detection
+and no recovery — a permanent deadlock. (Empirically reproduced between two
+Linux/BlueZ nodes.)
+
+### 1.2 Solution
+
+v0.4.0 introduces an **active round-trip probe over the real data path**. Each node
+periodically sends a small `PING` that the peer echoes as a `PONG`. Because the
+probe traverses the same data path as real fragments, it fails exactly when real
+data fails. A link that round-trips the probe is proven alive; a link that stops
+round-tripping it while still connected at the link layer is data-dead and is torn
+down so it re-establishes.
+
+Crucially the probe **is** the keep-fresh traffic: a genuinely idle-but-healthy
+link is kept alive by the probe's own round-trips, so idle links are never falsely
+reaped.
+
+## 2. Frame Format
+
+v0.4.0 defines two new 2-byte control frames, sent on the same RX characteristic /
+notification path as data fragments:
+
+| Frame | Byte 0 (type) | Byte 1 | Meaning |
+|-------|---------------|---------|----------------------------------------|
+| PING | `0x04` | nonce | Liveness request |
+| PONG | `0x05` | nonce | Liveness reply (echoes the PING nonce) |
+
+The `nonce` is an opaque 1-byte value chosen by the sender; the responder copies it
+verbatim into the PONG. It exists for future round-trip correlation and is not
+currently interpreted.
+
+These type bytes do not collide with the fragment header (`0x01`=START,
+`0x02`=CONTINUE, `0x03`=END) or the 1-byte `0x00` keepalive.
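
The frame layout above can be sketched in a few lines of Python. This is a minimal illustration that assumes only the byte layout in the table; the helper names (`build_ping`, `build_pong`, `parse_probe`) are hypothetical and are not taken from `BLEInterface.py`.

```python
PING = 0x04  # liveness request type byte (from the table above)
PONG = 0x05  # liveness reply type byte


def build_ping(nonce: int) -> bytes:
    """Return a 2-byte PING frame carrying a 1-byte nonce."""
    return bytes([PING, nonce & 0xFF])


def build_pong(ping_frame: bytes) -> bytes:
    """Return the PONG that echoes the nonce of a received PING."""
    return bytes([PONG, ping_frame[1]])


def parse_probe(frame: bytes):
    """Return ("PING" | "PONG", nonce) for a probe frame, else None.

    Only exact 2-byte frames with type 0x04/0x05 match, so the 1-byte
    0x00 keepalive and data fragments (types 0x01-0x03) are not claimed.
    """
    if len(frame) != 2:
        return None
    if frame[0] == PING:
        return ("PING", frame[1])
    if frame[0] == PONG:
        return ("PONG", frame[1])
    return None
```

Matching on both length and type byte keeps the probe check independent of the fragment parser, which is one plausible way to intercept probe frames before reassembly.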
+
+## 3. Probe State Machine
+
+State is tracked **per peer, keyed by stable identity** (not by BLE address, which
+rotates).
+
+### 3.1 Capability Negotiation
+
+A peer is considered **probe-capable** once a PING or PONG has been received from
+it. No handshake change is required — capability is inferred from observed probe
+traffic. Peers that never emit probe frames (pre-v0.4.0) are never marked capable.
+
+### 3.2 Liveness Tracking
+
+Receiving any inbound traffic that proves the data path — a real data fragment, a
+PING, or a PONG — updates the peer's `last_real_data` timestamp. The 1-byte
+keepalive does **not**, by design: it proves only the link, not the data path.
+
+### 3.3 Periodic Sweep
+
+Every `data_path_probe_poll_interval`, for each established peer:
+
+1. If the link has had no real data for longer than `data_path_probe_interval`,
+ send a PING. A healthy peer echoes a PONG, refreshing `last_real_data`.
+2. If the peer is probe-capable **and** `last_real_data` is older than
+ `data_path_timeout`, the data path is dead: disconnect the peer
+ (`driver.disconnect`) so the connection re-establishes and re-handshakes.
+
+A non-probe-capable peer is never reaped by this mechanism; it falls through to the
+existing reactive checks.
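
The two sweep rules can be expressed as a pure function, which makes them easy to test in isolation. This is a sketch under stated assumptions: `PeerState` and `sweep` are hypothetical names, and the real `_run_data_path_probes` may structure the work differently. The function follows the two steps literally, so a peer that is reaped in step 2 also appears in the PING list from step 1.

```python
from dataclasses import dataclass


@dataclass
class PeerState:
    last_real_data: float        # time of the last data-path proof, in seconds
    probe_capable: bool = False  # set once a PING or PONG has been received


def sweep(peers, now, probe_interval=15.0, timeout=45.0):
    """Return (to_ping, to_disconnect) lists of peer identities.

    Step 1: PING any peer idle longer than probe_interval.
    Step 2: reap only probe-capable peers silent longer than timeout.
    """
    to_ping, to_disconnect = [], []
    for identity, state in peers.items():
        idle = now - state.last_real_data
        if idle > probe_interval:
            to_ping.append(identity)
        if state.probe_capable and idle > timeout:
            to_disconnect.append(identity)
    return to_ping, to_disconnect
```

A caller would then send a PING to each identity in the first list and call `driver.disconnect` for each identity in the second.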
+
+### 3.4 PING Handling
+
+On receiving a PING, a node immediately replies with a PONG echoing the nonce, then
+treats the inbound PING itself as proof of data-path liveness.
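
Sections 3.1, 3.2 and 3.4 together describe what happens to per-peer state for each inbound frame. The sketch below is illustrative only: `handle_inbound`, the `peer` dictionary and the `send` callable are hypothetical stand-ins, not the reference implementation's API.

```python
import time

PING, PONG, KEEPALIVE = 0x04, 0x05, 0x00


def handle_inbound(frame, peer, send, now=None):
    """Update per-peer probe state for one inbound frame.

    `peer` is a dict with "last_real_data" and "probe_capable" keys;
    `send` is a callable that writes bytes back to the same peer.
    Returns True if the frame was consumed here (keepalive or probe).
    """
    now = time.monotonic() if now is None else now
    if frame == bytes([KEEPALIVE]):
        return True                        # proves the link only: no refresh
    if len(frame) == 2 and frame[0] in (PING, PONG):
        if frame[0] == PING:
            send(bytes([PONG, frame[1]]))  # reply first, echoing the nonce
        peer["probe_capable"] = True       # capability inferred from traffic
        peer["last_real_data"] = now       # probe frames prove the data path
        return True
    peer["last_real_data"] = now           # a real fragment is also proof
    return False                           # caller passes it to reassembly
```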
+
+### 3.5 Asymmetric Failures
+
+Because both peers probe independently, each detects the death of its own **inbound**
+direction (it stops receiving the other's PINGs/PONGs). If only A→B fails, B sees no
+inbound from A, declares the path dead, and reconnects — re-establishing both
+directions. One side detecting is sufficient.
+
+## 4. Configuration
+
+| Key | Default | Meaning |
+|----------------------------------|---------|--------------------------------------------------|
+| `data_path_probe_interval` | 15 s | PING a link that has had no real data this long |
+| `data_path_timeout` | 45 s | Reconnect a probe-capable peer silent this long |
+| `data_path_probe_poll_interval` | 10 s | How often the sweep runs |
+
+The defaults give roughly three probe attempts before a reconnect and keep an idle
+link refreshed well inside the timeout.
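
The "roughly three probe attempts" figure can be checked with a small simulation. It assumes the last real data arrived at t=0 and that the first sweep runs one poll interval later; the actual sweep phase will vary, so treat the result as an estimate. The function name is hypothetical.

```python
def pings_before_reap(probe_interval=15, timeout=45, poll_interval=10):
    """Count PINGs sent to a silent, probe-capable peer before it is reaped.

    Returns (pings, reap_time), where reap_time is the first sweep at which
    the idle time exceeds the timeout.
    """
    pings, t = 0, poll_interval
    while t <= timeout:          # idle must exceed timeout to reap
        if t > probe_interval:   # idle must exceed probe interval to PING
            pings += 1
        t += poll_interval
    return pings, t


print(pings_before_reap())  # (3, 50)
```

With the defaults, PINGs go out at the 20 s, 30 s and 40 s sweeps and the peer is reaped at the 50 s sweep.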
+
+## 5. Backwards Compatibility
+
+### 5.1 Compatibility Matrix
+
+| Peers | Behavior |
+|--------------------|--------------------------------------------------------------------------------|
+| v0.4.0 ↔ v0.4.0 | Full probe + data-dead recovery, both directions. |
+| v0.4.0 ↔ older | v0.4.0 still PINGs; the 2-byte frame is shorter than the 5-byte fragment header, so the older peer's reassembler rejects it as "too short" and ignores it. The older peer never replies, never becomes probe-capable, and is never reaped by the probe — it retains pre-v0.4.0 behavior. |
+
+The probe is therefore safe to deploy incrementally.
+
+### 5.2 Address Normalization
+
+On a dual-role (connection-collision) link a peer may deliver a frame under its
+`dev:`-prefixed peripheral address while its identity was learned under the plain
+MAC via the central-path handshake. Implementations **MUST** normalize (strip the
+`dev:` prefix, and try both forms) when resolving a probe frame's identity, or the
+frame will fail to attribute and capability will never be established.
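
One way to satisfy the normalization requirement is to try the plain MAC first and the `dev:`-prefixed form second. The sketch assumes the prefixed form is exactly `dev:` followed by the MAC; the helper names are hypothetical, while `address_to_identity` is the mapping described in `docs/ble-architecture.md`.

```python
def candidate_addresses(address: str):
    """Return the address forms to try: plain MAC first, then the dev: form."""
    plain = address[4:] if address.startswith("dev:") else address
    return [plain, "dev:" + plain]


def resolve_identity(address: str, address_to_identity: dict):
    """Look up a frame's identity under either address form, else None."""
    for candidate in candidate_addresses(address):
        identity = address_to_identity.get(candidate)
        if identity is not None:
            return identity
    return None
```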
+
+## 6. GATT Service (Unchanged from v2.2)
+
+The probe reuses the existing RX characteristic (central → peripheral write) and the
+notification path (peripheral → central). No new characteristics are added.
+
+## 7. Implementation Notes
+
+### 7.1 Python (BlueZ/Bleak) — reference
+
+The probe lives in `BLEInterface.py`: `_send_probe` and `_handle_probe_frame` send and
+handle probe frames, and `_run_data_path_probes` runs the sweep on a `threading.Timer`.
+Probe frames are intercepted immediately after the keepalive filter in both the central
+(`_handle_ble_data`) and peripheral receive paths, before reassembly.
+
+### 7.2 Android (Kotlin driver)
+
+Android Columba bundles this `BLEInterface.py` via Chaquopy, so it inherits the probe
+unchanged. The Kotlin driver must deliver 2-byte writes/notifications unfragmented
+(it already does for the 1-byte keepalive).
+
+### 7.3 Swift (CoreBluetooth) — TODO
+
+reticulum-swift's `BLEInterface` must mirror the probe, plus handle the
+CoreBluetooth-specific case of a probe-driven disconnect of a **peripheral-role**
+peer: CoreBluetooth cannot force-disconnect a subscribed central, so the app layer
+must drop the central and let it reconnect.
+
+## 8. Version History
+
+| Version | Date | Change |
+|---------|----------|--------------------------------------------------------------------|
+| v2.2 | Nov 2025 | Base protocol (MAC sorting, identity handshake, fragmentation, keepalive) |
+| v0.3.0 | Dec 2025 | Capability advertisement (peripheral-only devices) |
+| v0.4.0 | Jun 2026 | Data-path liveness probe (this document) |
+
+## 9. References
+
+- `BLE_PROTOCOL_v2.2.md` — base protocol
+- `BLE_PROTOCOL_v0.3.0.md` — capability advertisement
+- `docs/ble-architecture.md` — architecture explainer
+- `CHANGELOG.md` — 0.3.0 release entry
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 881a6be..dba6d54 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
+## [0.3.0] - 2026-06-10
+
+### Added
+- **Data-path liveness probe (protocol v0.4.0)** — detects and recovers from "connected
+ but data-dead" BLE links. A link can stay up at the link layer (which keeps idle
+ connections alive) and keep passing 1-byte keepalives while larger real data silently
+ fails; the existing reactive zombie check, `_validate_spawned_interfaces`, and the
+ keepalive-write-fail reaper all miss this because the link is genuinely up. The probe
+ sends a 2-byte `PING`(0x04)/`PONG`(0x05) round-trip over the real data path: a healthy
+ idle link is kept fresh by the probe itself (no churn), while a probe-capable peer
+ whose data path goes silent past `data_path_timeout` is torn down so it reconnects.
+ Capability is auto-negotiated (a peer becomes probe-capable on its first PING/PONG);
+ the 2-byte frames are shorter than the fragment header so older peers reject them
+ harmlessly. New config keys: `data_path_probe_interval` (default 15s),
+ `data_path_timeout` (default 45s), `data_path_probe_poll_interval` (default 10s).
+ Validated end-to-end on two Linux/BlueZ nodes.
+
## [0.2.2] - 2025-11-15
### Added
diff --git a/docs/ble-architecture.md b/docs/ble-architecture.md
new file mode 100644
index 0000000..4d828ee
--- /dev/null
+++ b/docs/ble-architecture.md
@@ -0,0 +1,803 @@
+# BLE-Reticulum Architecture
+
+This document describes the Bluetooth Low Energy (BLE) architecture of `ble-reticulum` — the
+`RNS.Interface` that carries Reticulum traffic over BLE. The protocol logic and the
+`BLEInterface` / `BLEPeerInterface` Python layer are **platform-agnostic**; the native
+**driver** beneath them is pluggable (the `BLEDriverInterface` contract, see
+`REFACTORING_GUIDE.md`).
+
+This document uses the **Android driver** (Columba's Chaquopy → Kotlin bridge) as the
+reference for the native layer, because it is the most fully featured. The Linux reference
+driver (`linux_bluetooth_driver.py`, BlueZ/Bleak) and an iOS/Swift driver implement the same
+contract. Where you see Android class names below (`KotlinBLEBridge`, `BleGattClient`,
+`BleScanner`, …), read them as "the native driver's component".
+
+> The normative wire protocol lives in `BLE_PROTOCOL_v2.2.md` (base),
+> `BLE_PROTOCOL_v0.3.0.md` (capability advertisement) and `BLE_PROTOCOL_v0.4.0.md`
+> (data-path liveness probe). This file is the architectural companion to those specs.
+
+## Architecture Overview
+
+The BLE implementation follows a layered architecture with clear separation of concerns:
+
+```mermaid
+flowchart TB
+ subgraph Python["Python Layer (ble-reticulum)"]
+ BLEInterface["BLEInterface<br/>Protocol handler, fragmentation,<br/>peer lifecycle"]
+ BLEPeerInterface["BLEPeerInterface<br/>Per-peer Reticulum interface"]
+ AndroidDriver["AndroidBLEDriver<br/>Chaquopy bridge to Kotlin"]
+ end
+
+ subgraph Kotlin["Kotlin Native Layer"]
+ Bridge["KotlinBLEBridge<br/>Main entry point,<br/>PeerInfo tracking,<br/>deduplication"]
+ Scanner["BleScanner<br/>Adaptive intervals,<br/>service filtering"]
+ Advertiser["BleAdvertiser<br/>Identity naming,<br/>proactive refresh"]
+ GattClient["BleGattClient<br/>Central mode,<br/>4-step handshake"]
+ GattServer["BleGattServer<br/>Peripheral mode,<br/>GATT service"]
+ OpQueue["BleOperationQueue<br/>Serialized GATT ops"]
+ end
+
+ subgraph Android["Android BLE Stack"]
+ BluetoothAdapter["BluetoothAdapter"]
+ BluetoothLeScanner["BluetoothLeScanner"]
+ BluetoothLeAdvertiser["BluetoothLeAdvertiser"]
+ BluetoothGatt["BluetoothGatt"]
+ BluetoothGattServer["BluetoothGattServer"]
+ end
+
+ BLEInterface --> BLEPeerInterface
+ BLEInterface --> AndroidDriver
+ AndroidDriver -->|Chaquopy| Bridge
+ Bridge --> Scanner
+ Bridge --> Advertiser
+ Bridge --> GattClient
+ Bridge --> GattServer
+ GattClient --> OpQueue
+ Scanner --> BluetoothLeScanner
+ Advertiser --> BluetoothLeAdvertiser
+ GattClient --> BluetoothGatt
+ GattServer --> BluetoothGattServer
+```
+
+### Layer Responsibilities
+
+| Layer | Component | Responsibility |
+|-------|-----------|----------------|
+| Python | `BLEInterface` | Reticulum interface, packet fragmentation/reassembly, peer lifecycle |
+| Python | `BLEPeerInterface` | Per-peer Reticulum routing interface |
+| Python | `AndroidBLEDriver` | Bridge to Kotlin, callback routing |
+| Kotlin | `KotlinBLEBridge` | Entry point for Python, connection tracking, deduplication |
+| Kotlin | `BleScanner` | Device discovery with adaptive intervals |
+| Kotlin | `BleAdvertiser` | Peripheral advertising with identity |
+| Kotlin | `BleGattClient` | Central mode GATT operations |
+| Kotlin | `BleGattServer` | Peripheral mode GATT service |
+| Kotlin | `BleOperationQueue` | Serialized GATT operations (Android limitation) |
+
+---
+
+## GATT Service Structure
+
+The Reticulum BLE service follows the Protocol v2.2 specification:
+
+```mermaid
+classDiagram
+ class ReticulumService {
+ UUID: 37145b00-442d-4a94-917f-8f42c5da28e3
+ Type: PRIMARY
+ }
+
+ class RXCharacteristic {
+ UUID: 37145b00-442d-4a94-917f-8f42c5da28e5
+ Properties: WRITE, WRITE_NO_RESPONSE
+ Permissions: WRITE
+ Purpose: Centrals write data here
+ }
+
+ class TXCharacteristic {
+ UUID: 37145b00-442d-4a94-917f-8f42c5da28e4
+ Properties: READ, NOTIFY, INDICATE
+ Permissions: READ
+ Purpose: Peripherals notify data here
+ }
+
+ class IdentityCharacteristic {
+ UUID: 37145b00-442d-4a94-917f-8f42c5da28e6
+ Properties: READ
+ Permissions: READ
+ Purpose: 16-byte transport identity
+ }
+
+ class CCCDDescriptor {
+ UUID: 00002902-0000-1000-8000-00805f9b34fb
+ Purpose: Enable/disable notifications
+ }
+
+ ReticulumService --> RXCharacteristic
+ ReticulumService --> TXCharacteristic
+ ReticulumService --> IdentityCharacteristic
+ TXCharacteristic --> CCCDDescriptor
+```
+
+### Characteristic Details
+
+| Characteristic | UUID Suffix | Direction | Purpose |
+|----------------|-------------|-----------|---------|
+| RX | `...28e5` | Central → Peripheral | Data and identity handshake writes |
+| TX | `...28e4` | Peripheral → Central | Notifications for outbound data |
+| Identity | `...28e6` | Read-only | Provides 16-byte transport identity hash |
+
+---
+
+## Connection Flows
+
+### Central Mode Connection Sequence
+
+When this device discovers and connects to a peripheral:
+
+```mermaid
+sequenceDiagram
+ participant Scan as BleScanner
+ participant Bridge as KotlinBLEBridge
+ participant Client as BleGattClient
+ participant Peer as Remote Peripheral
+ participant Python as AndroidBLEDriver
+
+ Scan->>Bridge: onDeviceDiscovered(address, rssi)
+ Bridge->>Bridge: shouldConnect(address)?
+ Note over Bridge: MAC comparison:<br/>our MAC < peer MAC = connect
+ Bridge->>Client: connect(address)
+
+ rect rgb(230, 245, 255)
+ Note over Client,Peer: 4-Step GATT Handshake
+ Client->>Peer: 1. connectGatt()
+ Peer-->>Client: onConnectionStateChange(CONNECTED)
+ Client->>Peer: 2. discoverServices()
+ Peer-->>Client: onServicesDiscovered()
+
+ Client->>Peer: Read Identity Characteristic
+ Peer-->>Client: 16-byte identity hash
+ Client->>Bridge: onIdentityReceived(address, hash)
+
+ Client->>Peer: 3. requestMtu(517)
+ Peer-->>Client: onMtuChanged(negotiated_mtu)
+
+ Client->>Peer: 4. Enable CCCD notifications
+ Peer-->>Client: onDescriptorWrite(success)
+
+ Client->>Peer: Write our identity to RX
+ Peer-->>Client: onCharacteristicWrite(success)
+ end
+
+ Client->>Bridge: onConnected(address, mtu, identity)
+ Bridge->>Python: onConnected callback
+ Python->>Python: Spawn BLEPeerInterface
+```
+
+### Peripheral Mode Connection Sequence
+
+When a remote central connects to us:
+
+```mermaid
+sequenceDiagram
+ participant Central as Remote Central
+ participant Server as BleGattServer
+ participant Bridge as KotlinBLEBridge
+ participant Python as AndroidBLEDriver
+
+ Central->>Server: connectGatt()
+ Server->>Server: onConnectionStateChange(CONNECTED)
+ Server->>Bridge: onCentralConnected(address, MIN_MTU)
+ Note over Bridge: Track as pending connection<br/>(identity not yet received)
+
+ Central->>Server: discoverServices()
+ Central->>Server: Read Identity Characteristic
+ Server-->>Central: Our 16-byte identity
+
+ Central->>Server: requestMtu()
+ Server->>Server: onMtuChanged()
+ Server->>Bridge: onMtuChanged(address, mtu)
+
+ Central->>Server: Enable CCCD notifications
+
+ rect rgb(255, 245, 230)
+ Note over Central,Server: Identity Handshake
+ Central->>Server: Write 16 bytes to RX
+ Server->>Server: Detect: len=16, no existing identity
+ Server->>Bridge: onIdentityReceived(address, hash)
+ Server->>Bridge: onDataReceived(address, identity_bytes)
+ end
+
+ Bridge->>Bridge: Complete connection with identity
+ Bridge->>Python: onConnected(address, mtu, "peripheral", identity)
+ Bridge->>Python: onIdentityReceived(address, hash)
+ Python->>Python: Spawn BLEPeerInterface
+```
+
+### Defensive Recovery for Missed onConnectionStateChange
+
+Android's `onConnectionStateChange` callback is unreliable: it sometimes fails to fire even though a BLE connection has been established. When that happens the connection is "orphaned": data arrives, but replies cannot be sent because the address was never registered.
+
+The fix: when `handleCharacteristicWriteRequest` receives data from an address that is not in `connectedCentrals`, it retroactively registers the connection:
+
+```mermaid
+sequenceDiagram
+ participant Central as Remote Central
+ participant Server as BleGattServer
+ participant Bridge as KotlinBLEBridge
+ participant Python as AndroidBLEDriver
+
+ Central->>Server: connectGatt()
+ Note over Server: ⚠️ onConnectionStateChange NOT called<br/>(Android BLE bug)
+ Note over Server: connectedCentrals is empty
+
+ Central->>Server: Write data to RX characteristic
+ Server->>Server: handleCharacteristicWriteRequest
+ Server->>Server: Check: address in connectedCentrals?
+
+ rect rgb(255, 230, 230)
+ Note over Server: DEFENSIVE RECOVERY
+ Server->>Server: Address NOT found!<br/>Log warning
+ Server->>Server: Add to connectedCentrals
+ Server->>Server: Set MTU = MIN_MTU
+ Server->>Bridge: onCentralConnected(address, mtu)
+ Bridge->>Bridge: Add to connectedPeers
+ end
+
+ Server->>Bridge: onDataReceived(address, data)
+ Note over Server,Python: Connection now properly tracked
+```
+
+**Key log message**: `"DEFENSIVE RECOVERY: Data received from {address} but onConnectionStateChange was never called!"`
+
+---
+
+## Identity Protocol (v2.2)
+
+### Purpose
+
+Android randomizes MAC addresses for privacy. The identity protocol provides stable peer identification across MAC rotations.
+
+### Handshake Sequence (Central → Peripheral)
+
+```mermaid
+sequenceDiagram
+ participant C as Central
+ participant P as Peripheral
+
+ Note over C: Connect as GATT client
+ C->>P: Read Identity Characteristic
+ P-->>C: Peripheral's 16-byte identity
+ Note over C: Store: address → identity
+
+ C->>P: Write 16 bytes to RX Characteristic
+ Note over P: Detect identity handshake:<br/>exactly 16 bytes, no existing identity
+ Note over P: Store: address → identity
+
+ Note over C,P: Both sides now have<br/>identity ↔ address mapping
+```
+
+### Identity Tracking Data Structures
+
+```mermaid
+flowchart LR
+ subgraph Python["Python (BLEInterface)"]
+ P_A2I["address_to_identity<br/>MAC → 16-byte identity"]
+ P_I2A["identity_to_address<br/>hash → MAC"]
+ P_SI["spawned_interfaces<br/>hash → BLEPeerInterface"]
+ P_Cache["_identity_cache<br/>MAC → (identity, timestamp)<br/>TTL: 60s"]
+ end
+
+ subgraph Kotlin["Kotlin (KotlinBLEBridge)"]
+ K_A2I["addressToIdentity<br/>MAC → 32-char hex"]
+ K_I2A["identityToAddress<br/>hex → MAC"]
+ K_Peers["connectedPeers<br/>MAC → PeerConnection"]
+ K_Pending["pendingConnections<br/>MAC → PendingConnection"]
+ end
+
+ P_A2I -.->|sync| K_A2I
+ P_I2A -.->|sync| K_I2A
+```
+
+### MAC Rotation Handling
+
+When a peer reconnects with a new MAC address, the handling differs by connection mode:
+
+#### Overview
+
+```mermaid
+flowchart TD
+ A[New connection from MAC_NEW] --> B{Identity received?}
+ B -->|Yes| C[Compute identity_hash]
+ C --> D{identity_hash in identityToAddress?}
+ D -->|Yes, points to MAC_OLD| E[MAC Rotation Detected]
+ E --> F{Is MAC_OLD still connected?}
+ F -->|No| G[Clean up stale mappings]
+ G --> H[Update: identity → MAC_NEW]
+ F -->|Yes| I[Dual connection - deduplicate]
+ D -->|No| J[New identity - normal flow]
+ B -->|No, peripheral| K[Wait for handshake]
+```
+
+#### Central Mode Flow (We Connect to Them)
+
+Identity is received via a GATT read of the Identity Characteristic, then processed in Kotlin's `handleIdentityReceived`:
+
+```mermaid
+flowchart TD
+ A[We connect to MAC_NEW<br/>Read Identity Characteristic] --> B[Kotlin: handleIdentityReceived<br/>Gets 16-byte identity from GATT read]
+ B --> C{Kotlin: onDuplicateIdentityDetected?<br/>Calls Python callback if set}
+ C -->|Callback returns True<br/>identity already at different MAC| D[Reject: disconnect MAC_NEW<br/>Log: Duplicate identity rejected]
+ C -->|Callback returns False<br/>new identity or same MAC| E[Allow connection to proceed]
+ C -->|No callback set| E
+
+ E --> F[Kotlin: Store addressToIdentity‹MAC_NEW›]
+ F --> G{Kotlin: identityToAddress‹hash› exists?}
+ G -->|"No (new identity)"| H[Store identityToAddress‹hash› = MAC_NEW<br/>Notify Python: onConnected]
+ G -->|"Yes, = MAC_OLD"| I[Keep MAC_OLD as primary in identityToAddress<br/>Still store addressToIdentity‹MAC_NEW›<br/>Notify Python: onConnected]
+```
+
+**Key code reference**: `KotlinBLEBridge.handleIdentityReceived()` (duplicate identity check requires `onDuplicateIdentityDetected` callback)
+
+#### Peripheral Mode Flow (They Connect to Us)
+
+Identity is received via a 16-byte write to the RX characteristic and detected in Python's `_handle_identity_handshake`:
+
+```mermaid
+flowchart TD
+ A[MAC_NEW connects to us<br/>Writes 16-byte identity to RX] --> B{Python: _handle_identity_handshake<br/>Entry check: len=16 AND<br/>no address_to_identity‹MAC_NEW›}
+ B -->|Check fails| Z[Not a handshake, pass to data handler]
+ B -->|Check passes| C{Python: _check_duplicate_identity<br/>Returns: True if duplicate, False otherwise}
+
+ C -->|"Returns True<br/>(identity_to_address‹hash› = MAC_OLD<br/>AND MAC_OLD ≠ MAC_NEW)"| D[Reject: driver.disconnect‹MAC_NEW›<br/>Log: duplicate identity rejected<br/>Return True: handshake consumed]
+ C -->|"Returns False<br/>(new identity OR same MAC)"| E[Allow: continue processing]
+
+ E --> F[Store address_to_identity‹MAC_NEW› = identity<br/>Store identity_to_address‹hash› = MAC_NEW]
+ F --> G{spawned_interfaces‹hash› exists?}
+ G -->|No| H[Create new BLEPeerInterface<br/>Store in spawned_interfaces‹hash›]
+ G -->|Yes| I{existing.peer_address ≠ MAC_NEW?}
+ I -->|Yes| J[Update existing interface:<br/>peer_address = MAC_NEW<br/>address_to_interface‹MAC_NEW› = interface]
+```
+
+**Key code reference**: `BLEInterface._handle_identity_handshake()` at lines 1108-1200
+
+#### Return Value Clarification
+
+The `_check_duplicate_identity` function returns a **boolean**, not a MAC address:
+
+| Condition | Return Value | Meaning |
+|-----------|--------------|---------|
+| `identity_to_address[hash]` not found | `False` | New identity, allow |
+| `identity_to_address[hash]` = MAC_NEW | `False` | Same MAC, allow |
+| `identity_to_address[hash]` = MAC_OLD (≠ MAC_NEW) | `True` | Duplicate, reject |
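
The three rows of the table reduce to a single boolean expression. The function below is a sketch of that logic only; the signature is an assumption and may not match the real `_check_duplicate_identity` method.

```python
def check_duplicate_identity(identity_to_address: dict, identity_hash: str, address: str) -> bool:
    """Return True only when the identity is already mapped to a different MAC."""
    known = identity_to_address.get(identity_hash)
    return known is not None and known != address
```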
+
+---
+
+## Deduplication State Machine
+
+When the same identity is connected via both central and peripheral paths:
+
+```mermaid
+stateDiagram-v2
+ [*] --> NONE: Initial state
+
+ NONE --> DualDetected: Same identity on both paths
+
+ DualDetected --> DecisionPoint: Determine which to keep
+
+ DecisionPoint --> CLOSING_CENTRAL: Keep peripheral<br/>(our MAC > peer MAC)
+ DecisionPoint --> CLOSING_PERIPHERAL: Keep central<br/>(our MAC < peer MAC)
+
+ CLOSING_CENTRAL --> NONE: Central disconnected
+ CLOSING_PERIPHERAL --> NONE: Peripheral disconnected
+
+ note right of DecisionPoint
+ Decision based on MAC comparison:
+ - Lower MAC = central role
+ - Higher MAC = peripheral role
+ end note
+```
+
+### DeduplicationState Enum
+
+```kotlin
+enum class DeduplicationState {
+ NONE, // Normal - use actual isCentral/isPeripheral
+ CLOSING_CENTRAL, // Keeping peripheral, central disconnect pending
+ CLOSING_PERIPHERAL // Keeping central, peripheral disconnect pending
+}
+```
+
+### Deduplication Flow
+
+```mermaid
+sequenceDiagram
+ participant Bridge as KotlinBLEBridge
+ participant Client as BleGattClient
+ participant Server as BleGattServer
+ participant Python as AndroidBLEDriver
+
+ Note over Bridge: Dual connection detected<br/>Same identity on both paths
+
+ Bridge->>Bridge: Compare MAC addresses
+ alt Our MAC < Peer MAC (we should be central)
+ Bridge->>Bridge: Set state = CLOSING_PERIPHERAL
+ Bridge->>Server: disconnectCentral(address)
+ Bridge->>Python: onAddressChanged(peripheral_addr, central_addr, identity)
+ else Our MAC > Peer MAC (we should be peripheral)
+ Bridge->>Bridge: Set state = CLOSING_CENTRAL
+ Bridge->>Client: disconnect(address)
+ Bridge->>Python: onAddressChanged(central_addr, peripheral_addr, identity)
+ end
+
+ Note over Python: Update address mappings<br/>Migrate fragmenter keys
+
+ Bridge->>Bridge: Set state = NONE
+```
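
The role decision in the flow above can be sketched in Python. How the Kotlin bridge actually compares MAC addresses is not stated here, so comparing them as integers is an assumption, as are the helper names and the choice to return `NONE` when both addresses are equal.

```python
from enum import Enum


class DeduplicationState(Enum):
    NONE = "none"
    CLOSING_CENTRAL = "closing_central"        # keeping the peripheral path
    CLOSING_PERIPHERAL = "closing_peripheral"  # keeping the central path


def mac_to_int(mac: str) -> int:
    """Parse a colon-separated MAC into an integer (case-insensitive)."""
    return int(mac.replace(":", ""), 16)


def dedup_decision(our_mac: str, peer_mac: str) -> DeduplicationState:
    """Pick which of two duplicate connections to close."""
    ours, theirs = mac_to_int(our_mac), mac_to_int(peer_mac)
    if ours < theirs:
        return DeduplicationState.CLOSING_PERIPHERAL  # lower MAC keeps central role
    if ours > theirs:
        return DeduplicationState.CLOSING_CENTRAL     # higher MAC keeps peripheral role
    return DeduplicationState.NONE
```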
+
+---
+
+## Data Flow
+
+### Sending Data (Python → BLE)
+
+```mermaid
+flowchart TB
+ subgraph Python["Python Layer"]
+ A[BLEPeerInterface.process_outgoing] --> B[Get fragmenter by identity_key]
+ B --> C[BLEFragmenter.fragment]
+ C --> D["Fragments with header:<br/>type(1) + seq(2) + total(2)"]
+ D --> E[AndroidBLEDriver.send]
+ end
+
+ subgraph Kotlin["Kotlin Layer"]
+ E --> F[KotlinBLEBridge.sendAsync]
+ F --> G{Check deduplicationState}
+ G -->|NONE| H{isCentral?}
+ G -->|CLOSING_*| I[Block send - in transition]
+ H -->|Yes| J[GattClient.sendData]
+ H -->|No| K[GattServer.notifyCentrals]
+ J --> L[Write to RX characteristic]
+ K --> M[Notify via TX characteristic]
+ end
+
+ L --> N[Remote peripheral receives]
+ M --> O[Remote central receives]
+```
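
The 5-byte fragment header shown in the diagram, `type(1) + seq(2) + total(2)`, can be packed with Python's `struct` module. Big-endian byte order is an assumption here, since the byte order is not stated in this document, and the function names are hypothetical. The type values follow section 2 of `BLE_PROTOCOL_v0.4.0.md`.

```python
import struct

# type(1) + seq(2) + total(2); the ">" (big-endian) byte order is an assumption
HEADER = struct.Struct(">BHH")
START, CONTINUE, END = 0x01, 0x02, 0x03


def pack_fragment(frag_type: int, seq: int, total: int, payload: bytes) -> bytes:
    """Prefix a payload with the 5-byte fragment header."""
    return HEADER.pack(frag_type, seq, total) + payload


def unpack_fragment(frame: bytes):
    """Split a frame into (type, seq, total, payload); reject short frames."""
    if len(frame) < HEADER.size:
        raise ValueError("too short")
    frag_type, seq, total = HEADER.unpack_from(frame)
    return frag_type, seq, total, frame[HEADER.size:]
```

The length check is what makes a 2-byte probe frame harmless to a parser of this shape: it is rejected before any header field is read.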
+
+### Receiving Data (BLE → Python)
+
+```mermaid
+flowchart TB
+ subgraph BLE["BLE Stack"]
+ A[Notification/Write received]
+ end
+
+ subgraph Kotlin["Kotlin Layer"]
+ A --> B{Is central or peripheral?}
+ B -->|Central| C[onCharacteristicChanged]
+ B -->|Peripheral| D[onCharacteristicWriteRequest]
+ C --> E[Bridge.handleDataReceived]
+ D --> E
+ E --> F{Exactly 16 bytes, no identity?}
+ F -->|Yes| G[Identity handshake - store]
+ F -->|No| H[Forward to Python]
+ end
+
+ subgraph Python["Python Layer"]
+ H --> I[AndroidBLEDriver._handle_data_received]
+ I --> J{Check identity handshake}
+ J -->|Yes, 16 bytes| K[_handle_identity_handshake]
+ J -->|No| L[_handle_ble_data]
+ L --> M[Get reassembler by identity_key]
+ M --> N[BLEReassembler.add_fragment]
+ N --> O{Complete packet?}
+ O -->|Yes| P[BLEPeerInterface.process_incoming]
+ O -->|No| Q[Wait for more fragments]
+ end
+```
+
+---
+
+## Keepalive Mechanism
+
+Android BLE connections time out after 20-30 seconds of inactivity. Both layers implement keepalives:
+
+```mermaid
+sequenceDiagram
+ participant Client as BleGattClient
+ participant Timer as Keepalive Timer<br/>(15s interval)
+ participant Peer as Remote Peripheral
+
+ Note over Client: Connection established
+ Client->>Timer: Start keepalive job
+
+ loop Every 15 seconds
+ Timer->>Client: Send keepalive
+ Client->>Peer: Write 0x00 to RX
+ alt Success
+ Peer-->>Client: Write confirmed
+ Client->>Timer: Reset failure counter
+ else Failure
+ Client->>Timer: Increment failures
+ alt failures >= 3
+ Timer->>Client: Connection dead
+ Client->>Client: disconnect()
+ end
+ end
+ end
+```
+
+### Keepalive Configuration
+
+| Parameter | Value | Source |
+|-----------|-------|--------|
+| Interval | 15 seconds | `BleConstants.CONNECTION_KEEPALIVE_INTERVAL_MS` |
+| Max failures | 3 | `BleConstants.MAX_CONNECTION_FAILURES` |
+| Packet | `0x00` (1 byte) | Minimal overhead |
+
+Both `BleGattClient` (central) and `BleGattServer` (peripheral) maintain independent keepalive mechanisms.
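
The failure-counting rule from the diagram and table above is small enough to sketch directly. The native implementation is in Kotlin; this Python class is only an illustration, and its name and method are hypothetical.

```python
class KeepaliveMonitor:
    """Track consecutive keepalive write failures for one connection."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, write_succeeded: bool) -> bool:
        """Record one keepalive result; return True if the link should be dropped."""
        if write_succeeded:
            self.failures = 0  # any success resets the counter
            return False
        self.failures += 1
        return self.failures >= self.max_failures
```

Because a success resets the counter, only three consecutive failures trigger a disconnect.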
+
+> **Keepalives prove the *link*, not the *data path*.** A 1-byte keepalive write succeeds as
+> long as the connection exists at the BLE link layer — even when larger data fragments are
+> silently failing (RF degradation) and even when the application has stopped sending. Detecting
+> a *data-dead* link is the job of the data-path liveness probe (below), not the keepalive.
+
+---
+
+## Data-Path Liveness Probe (protocol v0.4.0)
+
+A BLE link can be connected at the link layer (which keeps idle connections alive with empty
+PDUs and only drops on radio-loss supervision timeout) while real data silently stops flowing —
+keepalives still succeed, but larger fragments fail. Every *reactive* liveness check misses
+this because the link is genuinely "up": the peer looks "connected" forever while no data moves,
+with no detection and no recovery.
+
+`BLEInterface` adds an **active round-trip probe over the real data path**:
+
+| Frame | Bytes | Meaning |
+|-------|----------------|-----------------------------------------------|
+| PING | `0x04` + nonce | Liveness request (sent via `driver.send`) |
+| PONG | `0x05` + nonce | Echoed reply |
+
+```mermaid
+sequenceDiagram
+ participant A as Local
+ participant B as Peer (probe-capable)
+ Note over A: idle > data_path_probe_interval (15s)
+ A->>B: PING (0x04, nonce) [real data path]
+ alt data path alive
+ B-->>A: PONG (0x05, nonce)
+ Note over A: last_real_data refreshed → link stays fresh
+ else data path dead (PING never arrives)
+ Note over A: no PONG; last_real_data goes stale
+ Note over A: stale > data_path_timeout (45s) →<br/>driver.disconnect() → reconnect + re-handshake
+ end
+```
+
+Key properties:
+
+- **The probe is the keep-fresh traffic.** On a genuinely idle-but-healthy link the PING/PONG
+ round-trip refreshes `last_real_data`, so idle links are *never* falsely reaped.
+- **Capability is auto-negotiated.** A peer becomes "probe-capable" on the first PING/PONG seen;
+ only probe-capable peers are reaped on a dead path. The 2-byte frames are shorter than the
+ 5-byte fragment header, so peers that predate the probe reject them as "too short" and are
+ unaffected (and never reaped by the probe).
+- **Asymmetric failures** are covered: each side detects death of its own *inbound* direction;
+ one side reconnecting re-establishes both.
+- **Tunable** via `data_path_probe_interval` (15s), `data_path_timeout` (45s),
+ `data_path_probe_poll_interval` (10s).
+
+See `BLE_PROTOCOL_v0.4.0.md` for the normative spec.
+
+---
+
+## Scanning and Advertising
+
+### Adaptive Scanning
+
+```mermaid
+stateDiagram-v2
+ [*] --> Active: Start scanning
+
+ Active --> Active: New device discovered
+ Active --> Idle: 3 scans without new devices
+
+ Idle --> Active: New device discovered
+ Idle --> Idle: No new devices
+
+ note right of Active
+ Interval: 5s
+ Mode: BALANCED or LOW_LATENCY
+ end note
+
+ note right of Idle
+ Interval: 30s
+ Mode: LOW_POWER
+ end note
+```
+
+### Scan Configuration
+
+| Parameter | Active | Idle |
+|-----------|--------|------|
+| Interval | 5 seconds | 30 seconds |
+| Duration | 10 seconds | 10 seconds |
+| Mode | `SCAN_MODE_BALANCED` | `SCAN_MODE_LOW_POWER` |
+| Threshold | 3 devices | 3 empty scans |
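
The Active/Idle transitions can be modeled as a small state holder that returns the delay before the next scan. This sketch follows the state diagram (three consecutive empty scans move to Idle, any new device moves back to Active); the class and method names are hypothetical and the Kotlin `BleScanner` may differ in detail.

```python
class AdaptiveScanner:
    ACTIVE_INTERVAL_S = 5    # interval while new devices keep appearing
    IDLE_INTERVAL_S = 30     # interval after repeated empty scans
    EMPTY_SCANS_TO_IDLE = 3  # consecutive empty scans before going idle

    def __init__(self):
        self.idle = False
        self.empty_scans = 0

    def on_scan_complete(self, new_devices: int) -> int:
        """Update state after one scan; return seconds until the next scan."""
        if new_devices > 0:
            self.idle = False
            self.empty_scans = 0
        else:
            self.empty_scans += 1
            if self.empty_scans >= self.EMPTY_SCANS_TO_IDLE:
                self.idle = True
        return self.IDLE_INTERVAL_S if self.idle else self.ACTIVE_INTERVAL_S
```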
+
+### Advertising with Proactive Refresh
+
+```mermaid
+sequenceDiagram
+ participant Adv as BleAdvertiser
+ participant Timer as Refresh Timer<br/>(60s interval)
+ participant Android as Android BLE
+
+ Adv->>Android: startAdvertising()
+ Android-->>Adv: onStartSuccess()
+ Adv->>Timer: Start refresh job
+
+ loop Every 60 seconds
+ Timer->>Adv: Proactive refresh
+ Adv->>Android: stopAdvertising()
+ Adv->>Android: startAdvertising()
+ Note over Adv: Ensures advertising persists<br/>after screen off/background
+ end
+```
+
+### Advertisement Data Structure
+
+```
+Advertising Data (31 bytes max):
+├── Flags (3 bytes)
+└── Service UUID (18 bytes: 2-byte AD header + 16-byte 128-bit UUID)
+
+Scan Response (31 bytes separate budget):
+└── (empty — device name not advertised)
+```
+
+---
+
+## Address/Identity Mapping Summary
+
+### Python Layer (`BLEInterface`)
+
+| Dictionary | Key | Value | Purpose |
+|------------|-----|-------|---------|
+| `address_to_identity` | MAC address | 16-byte identity | MAC → identity lookup |
+| `identity_to_address` | 16-char hash | MAC address | Identity → current MAC |
+| `spawned_interfaces` | 16-char hash | BLEPeerInterface | Identity → interface |
+| `address_to_interface` | MAC address | BLEPeerInterface | Fallback cleanup |
+| `_identity_cache` | MAC address | (identity, timestamp) | Reconnection cache (60s TTL) |
+| `_pending_identity_connections` | MAC address | timestamp | Timeout tracking |
+| `_pending_detach` | 16-char hash | timestamp | Grace period detach |
+| `pending_mtu` | MAC address | MTU value | MTU/identity race handling |
+| `fragmenters` | identity_key | BLEFragmenter | Per-identity fragmentation |
+| `reassemblers` | identity_key | BLEReassembler | Per-identity reassembly |
+
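The reason several of these dictionaries are keyed by identity rather than MAC is that the MAC can rotate while the peer stays the same. The sketch below illustrates that idea only. `PeerMaps` and `on_identity` are hypothetical names, the identity hash is passed in directly instead of being derived from the 16-byte identity, and dropping the old MAC entry on rotation is an assumption, not confirmed behaviour of `BLEInterface`.

```python
class PeerMaps:
    """Toy model of the address/identity maps from the table above."""

    def __init__(self):
        self.address_to_identity = {}  # MAC address -> identity hash
        self.identity_to_address = {}  # identity hash -> current MAC
        self.fragmenters = {}          # identity hash -> per-peer state

    def on_identity(self, address, identity_hash):
        """Record that `address` currently belongs to `identity_hash`."""
        old = self.identity_to_address.get(identity_hash)
        if old is not None and old != address:
            # Assumed behaviour: forget the rotated-away MAC.
            self.address_to_identity.pop(old, None)
        self.address_to_identity[address] = identity_hash
        self.identity_to_address[identity_hash] = address
        # Identity-keyed state survives the MAC change untouched.
        return self.fragmenters.setdefault(identity_hash, object())
```

Calling `on_identity` twice with the same hash and two different MACs returns the same per-peer object, which is the property the identity-based fragmenter key is meant to provide.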
+### Kotlin Layer (`KotlinBLEBridge`)
+
+| Map | Key | Value | Purpose |
+|-----|-----|-------|---------|
+| `addressToIdentity` | MAC address | 32-char hex | MAC → identity |
+| `identityToAddress` | 32-char hex | MAC address | Identity → MAC |
+| `connectedPeers` | MAC address | PeerConnection | Active connections |
+| `pendingConnections` | MAC address | PendingConnection | Awaiting identity |
+| `pendingCentralConnections` | Set | - | In-progress central connects |
+| `processedIdentityCallbacks` | Set | - | Prevent duplicate notifications |
+
+---
+
+## Potential Issues & Recommendations
+
+### 1. GATT Operation Timeout (5s default)
+
+**Issue**: The default 5-second timeout in `BleOperationQueue` may be too short for slow or congested BLE environments.
+
+**Impact**: GATT operations may fail prematurely on:
+- Older devices with slower BLE stacks
+- Environments with high 2.4GHz interference
+- During rapid connection/disconnection cycles
+
+**Recommendation**: Consider adaptive timeouts based on operation type and historical success rates.
+
+### 2. Advertising Refresh Interval (60s)
+
+**Issue**: With a 60-second refresh, advertising that Android stops silently is not restored until the next refresh fires.
+
+**Impact**: If Android silently stops advertising immediately after screen-off, devices may be undiscoverable for up to 60 seconds.
+
+**Recommendation**:
+- Reduce to 30 seconds when battery is not a concern
+- Add `BroadcastReceiver` for `ACTION_SCREEN_OFF` to trigger immediate refresh
+
+### 3. Identity Cache Coherence
+
+**Issue**: The 60-second identity cache in Python may become stale if not properly synchronized with Kotlin state.
+
+**Impact**: Race conditions during rapid reconnection cycles could cause identity mismatches.
+
+**Recommendation**: Add explicit cache invalidation when Kotlin detects MAC rotation or deduplication.
+
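One way to picture the recommendation is a TTL cache with an explicit invalidation hook. This is a hedged sketch only: `IdentityCache` and its methods are hypothetical names, the injectable `clock` exists purely to make the example testable, and how Kotlin would signal a MAC rotation to Python is left open.

```python
import time


class IdentityCache:
    """MAC -> identity cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # MAC address -> (identity, timestamp)

    def put(self, address, identity):
        self._entries[address] = (identity, self._clock())

    def get(self, address):
        """Return the cached identity, or None if missing or expired."""
        entry = self._entries.get(address)
        if entry is None:
            return None
        identity, stamp = entry
        if self._clock() - stamp > self._ttl:
            del self._entries[address]
            return None
        return identity

    def invalidate(self, address):
        """Drop an entry immediately, e.g. when a MAC rotation is reported."""
        self._entries.pop(address, None)
```

With `invalidate` wired to a rotation or deduplication event, a stale mapping is removed at once instead of lingering for the rest of the 60-second TTL.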
+### 4. Fragmenter Key Complexity
+
+**Issue**: Fragmenter keys use `_get_fragmenter_key(identity, address)` but the address parameter is unused.
+
+**Current code**:
+```python
+def _get_fragmenter_key(self, peer_identity, address):
+ # Address unused - key is identity-based for MAC rotation immunity
+ return self._compute_identity_hash(peer_identity)
+```
+
+**Recommendation**: Remove unused `address` parameter to avoid confusion.
+
+### 5. Double Identity Callback Processing
+
+**Issue**: Both Kotlin (`onIdentityReceived`) and Python (`_handle_identity_handshake`) detect and process identity handshakes.
+
+**Impact**: Additional complexity and potential for desynchronization.
+
+**Recommendation**: Single point of identity detection (Kotlin) with Python purely as a consumer.
+
+### 6. Grace Period Timing
+
+**Issue**: The 2-second detach grace period (`_pending_detach_grace_period`) is hardcoded.
+
+**Impact**: May not be sufficient for slow network conditions or concurrent reconnection attempts.
+
+**Recommendation**: Make configurable via interface parameters, with a suggested default of 3-5 seconds.
+
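A configurable grace period could follow the same pattern this change uses for the probe intervals, reading a float from the interface configuration with a default. The key name `detach_grace_period` and the helper `read_grace_period` are hypothetical, and the positive-value check is an added assumption.

```python
def read_grace_period(config, default=2.0):
    """Read the detach grace period (seconds) from a config mapping."""
    value = float(config.get("detach_grace_period", default))
    if value <= 0:
        raise ValueError("detach_grace_period must be positive")
    return value
```

Keeping 2.0 as the default preserves current behaviour for existing configurations, and a deployment on slow links can opt into a longer value.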
+---
+
+## Key Constants Reference
+
+### UUIDs (BleConstants.kt)
+
+| Constant | Value |
+|----------|-------|
+| `SERVICE_UUID` | `37145b00-442d-4a94-917f-8f42c5da28e3` |
+| `CHARACTERISTIC_RX_UUID` | `37145b00-442d-4a94-917f-8f42c5da28e5` |
+| `CHARACTERISTIC_TX_UUID` | `37145b00-442d-4a94-917f-8f42c5da28e4` |
+| `CHARACTERISTIC_IDENTITY_UUID` | `37145b00-442d-4a94-917f-8f42c5da28e6` |
+| `CCCD_UUID` | `00002902-0000-1000-8000-00805f9b34fb` |
+
+### Timing Constants
+
+| Constant | Value | Location |
+|----------|-------|----------|
+| `CONNECTION_TIMEOUT_MS` | 30,000 ms | BleConstants |
+| `CONNECTION_KEEPALIVE_INTERVAL_MS` | 15,000 ms | BleConstants |
+| `DISCOVERY_INTERVAL_MS` | 5,000 ms | BleConstants |
+| `DISCOVERY_INTERVAL_IDLE_MS` | 30,000 ms | BleConstants |
+| `SCAN_DURATION_MS` | 10,000 ms | BleConstants |
+| `ADVERTISING_REFRESH_INTERVAL_MS` | 60,000 ms | BleAdvertiser |
+| `_identity_cache_ttl` | 60 s | BLEInterface |
+| `_pending_detach_grace_period` | 2.0 s | BLEInterface |
+
+
+### MTU Constants
+
+| Constant | Value | Meaning |
+|----------|-------|---------|
+| `MIN_MTU` | 23 | BLE 4.0 minimum |
+| `DEFAULT_MTU` | 185 | Reasonable default |
+| `MAX_MTU` | 517 | Maximum ATT MTU Android will negotiate |
+| `HW_MTU` | 500 | Reticulum standard |
+
+---
+
+## File Locations
+
+| Component | Path |
+|-----------|------|
+| BLEInterface.py | `app/build/python/pip/release/common/ble_reticulum/BLEInterface.py` |
+| AndroidBLEDriver | `python/ble_modules/android_ble_driver.py` |
+| KotlinBLEBridge | `reticulum/src/main/java/network.columba.app/reticulum/ble/bridge/KotlinBLEBridge.kt` |
+| BleGattClient | `reticulum/src/main/java/network.columba.app/reticulum/ble/client/BleGattClient.kt` |
+| BleGattServer | `reticulum/src/main/java/network.columba.app/reticulum/ble/server/BleGattServer.kt` |
+| BleScanner | `reticulum/src/main/java/network.columba.app/reticulum/ble/client/BleScanner.kt` |
+| BleAdvertiser | `reticulum/src/main/java/network.columba.app/reticulum/ble/server/BleAdvertiser.kt` |
+| BleOperationQueue | `reticulum/src/main/java/network.columba.app/reticulum/ble/util/BleOperationQueue.kt` |
+| BleConstants | `reticulum/src/main/java/network.columba.app/reticulum/ble/model/BleConstants.kt` |
diff --git a/pyproject.toml b/pyproject.toml
index aa24bc4..93663b5 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "ble-reticulum"
-version = "0.2.2"
+version = "0.3.0"
description = "Bluetooth Low Energy (BLE) interface for Reticulum Network Stack"
readme = "README.md"
requires-python = ">=3.8"
diff --git a/src/ble_reticulum/BLEInterface.py b/src/ble_reticulum/BLEInterface.py
index 49ed348..4973fcb 100644
--- a/src/ble_reticulum/BLEInterface.py
+++ b/src/ble_reticulum/BLEInterface.py
@@ -399,6 +399,27 @@ def __init__(self, owner, configuration):
self._last_real_data = {}
self._zombie_timeout = 30.0 # seconds - connection is zombie if no real data for this long
+ # Data-path liveness probe (protocol v0.4.0). A small PING(0x04)/PONG(0x05)
+ # round-trip over the REAL data path detects a "connected but data-dead" link
+ # that neither the link layer (it keeps idle links up) nor keepalives (1-byte
+ # writes succeed while larger data fails) can catch -- then forces a reconnect.
+ # The probe IS the traffic, so a healthy IDLE link is kept fresh (no churn)
+ # while a genuinely dead data path goes stale and is reaped. Capability is
+ # auto-negotiated: a peer is marked probe-capable on the first PING/PONG seen,
+ # and only probe-capable peers are reaped on a dead path. The 2-byte frames are
+ # < the 5-byte fragment header, so peers that predate the probe reject them as
+ # "too short" and are never falsely reaped. Intervals are config-tunable.
+ self._probe_ping = 0x04
+ self._probe_pong = 0x05
+ # PING a link that has had no real data for this many seconds.
+ self._probe_interval = float(c.get("data_path_probe_interval", 15.0))
+ # Reconnect a probe-capable peer whose data path has been silent this long.
+ self._data_path_timeout = float(c.get("data_path_timeout", 45.0))
+ # How often the probe/detect loop runs.
+ self._probe_poll_interval = float(c.get("data_path_probe_poll_interval", 10.0))
+ self._probe_capable = {} # identity_hash -> True (peer speaks the probe)
+ self._probe_timer = None
+
# Fragmentation
self.fragmenters = {} # address -> BLEFragmenter (per MTU)
self.reassemblers = {} # address -> BLEReassembler
@@ -473,6 +494,9 @@ def __init__(self, owner, configuration):
self.cleanup_timer = None
self._start_cleanup_timer()
+ # Start the data-path liveness probe loop (PING/PONG -> detect data-dead -> reconnect)
+ self._start_probe_timer()
+
# Start the interface
self.start()
@@ -683,6 +707,91 @@ def _clear_stale_ble_paths(self):
except Exception as e:
RNS.log(f"{self} Error during stale path cleanup (non-fatal): {e}", RNS.LOG_WARNING)
+ def _send_probe(self, address, ptype, nonce):
+ """Send a 2-byte data-path probe frame (PING/PONG) over the real data path."""
+ try:
+ self.driver.send(address, bytes([ptype, nonce & 0xFF]))
+ except Exception as e:
+ RNS.log(f"{self} data-path probe send to {address} failed: {e}", RNS.LOG_DEBUG)
+
+ def _handle_probe_frame(self, address, data):
+ """
+ Handle an inbound data-path liveness frame. Returns True if `data` was a
+ probe frame (and is now consumed), False otherwise.
+
+ Receiving ANY probe frame proves the inbound data path is alive and that the
+ peer speaks the probe (so it is marked probe-capable). A PING is echoed as a
+ PONG so the sender's round-trip completes.
+ """
+ if len(data) != 2 or data[0] not in (self._probe_ping, self._probe_pong):
+ return False
+ # A peer can deliver a frame under its "dev:"-prefixed peripheral address
+ # while the central-path handshake stored its identity under the plain MAC
+ # (dual-role connection). Normalize so the identity resolves either way.
+ plain = address[4:] if address.startswith("dev:") else address
+ peer_identity = (self.address_to_identity.get(address)
+ or self.address_to_identity.get(plain)
+ or self.address_to_identity.get("dev:" + plain))
+ if peer_identity:
+ identity_hash = self._compute_identity_hash(peer_identity)
+ self._last_real_data[identity_hash] = time.time()
+ self._probe_capable[identity_hash] = True
+ else:
+ RNS.log(f"{self} data-path probe from unmapped address {address}, dropping", RNS.LOG_EXTREME)
+ if data[0] == self._probe_ping:
+ self._send_probe(address, self._probe_pong, data[1])
+ RNS.log(f"{self} data-path PING from {address}, replied PONG", RNS.LOG_EXTREME)
+ else:
+ RNS.log(f"{self} data-path PONG from {address}", RNS.LOG_EXTREME)
+ return True
+
+ def _start_probe_timer(self):
+ """Start/restart the periodic data-path probe + dead-path detection loop."""
+ if self._probe_timer:
+ self._probe_timer.cancel()
+ self._probe_timer = threading.Timer(self._probe_poll_interval, self._run_data_path_probes)
+ self._probe_timer.daemon = True
+ self._probe_timer.start()
+
+ def _run_data_path_probes(self):
+ """
+ Periodic data-path liveness sweep over the spawned peers.
+
+ For each peer:
+ - If the link has been idle longer than _probe_interval, send a PING. On a
+ healthy link the peer echoes a PONG, which refreshes _last_real_data -- so
+ the probe is itself the traffic that keeps a genuinely idle-but-healthy link
+ from ever looking dead. Idle links are therefore never reaped.
+ - If a probe-capable peer's data path has been silent past _data_path_timeout,
+ the link is "connected but data-dead" (the link layer keeps the connection
+ up while real data silently fails); tear it down so it re-establishes.
+
+ Peers that have never spoken the probe are not probe-capable and are left to
+ the existing reactive checks, so older peers are never falsely reaped.
+ """
+ try:
+ now = time.time()
+ for identity_hash in list(self.spawned_interfaces.keys()):
+ address = self.identity_to_address.get(identity_hash)
+ if not address:
+ continue
+ idle = now - self._last_real_data.get(identity_hash, now)
+ if idle > self._probe_interval:
+ self._send_probe(address, self._probe_ping, int(now))
+ if self._probe_capable.get(identity_hash) and idle > self._data_path_timeout:
+ RNS.log(f"{self} data-path dead for {identity_hash[:8]} "
+ f"(no real data for {idle:.0f}s > {self._data_path_timeout:.0f}s) -- reconnecting",
+ RNS.LOG_WARNING)
+ self._probe_capable.pop(identity_hash, None)
+ try:
+ self.driver.disconnect(address)
+ except Exception as e:
+ RNS.log(f"{self} probe-driven disconnect of {address} failed: {e}", RNS.LOG_DEBUG)
+ except Exception as e:
+ RNS.log(f"{self} data-path probe loop error: {e}", RNS.LOG_ERROR)
+ finally:
+ self._start_probe_timer()
+
def _start_cleanup_timer(self):
"""
Start the periodic cleanup timer.
@@ -814,6 +923,7 @@ def _process_pending_detaches(self):
# Clean up zombie detection tracking
if identity_hash in self._last_real_data:
del self._last_real_data[identity_hash]
+ self._probe_capable.pop(identity_hash, None)
# Clean up fragmenter/reassembler now that interface is fully detached
if peer_identity:
frag_key = self._get_fragmenter_key(peer_identity, "") # Address unused in key computation
@@ -1970,6 +2080,10 @@ def _handle_ble_data(self, peer_address, data):
RNS.log(f"{self} received keep-alive from peer {peer_address}, ignoring", RNS.LOG_EXTREME)
return
+ # Data-path liveness probe (PING/PONG round-trip over the real data path)
+ if self._handle_probe_frame(peer_address, data):
+ return
+
# Look up peer identity to compute fragmenter key
peer_identity = self.address_to_identity.get(peer_address)
if not peer_identity:
@@ -2100,6 +2214,10 @@ def handle_peripheral_data(self, data, sender_address):
RNS.log(f"{self} received keep-alive from central {sender_address}, ignoring", RNS.LOG_EXTREME)
return
+ # Data-path liveness probe (PING/PONG round-trip over the real data path)
+ if self._handle_probe_frame(sender_address, data):
+ return
+
# Check if we have peer identity
peer_identity = self.address_to_identity.get(sender_address)
@@ -2342,6 +2460,11 @@ def detach(self):
self.cleanup_timer.cancel()
self.cleanup_timer = None
+ # Cancel data-path probe timer
+ if self._probe_timer:
+ self._probe_timer.cancel()
+ self._probe_timer = None
+
# Detach spawned interfaces
for peer_if in list(self.spawned_interfaces.values()):
peer_if.detach()