fix: verify connection is still alive before rejecting duplicate identity#38
Conversation
…tity When a peer disconnects, identity_to_address is NOT cleaned up immediately - it's only removed after a 2-second grace period. However, _check_duplicate_identity was not checking if the existing address is still connected before rejecting. This caused legitimate reconnections from the same identity (after MAC rotation or reconnection) to be incorrectly rejected as "duplicates" during the grace period or when cleanup was delayed. The fix adds two checks before rejecting: 1. If pending_detach exists for this identity (old connection already gone) 2. If existing address is not in connected_peers or peers dict Also adds TDD tests that demonstrate the bug and verify the fix. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Greptile SummaryFixed a critical bug where legitimate reconnections from the same identity were incorrectly rejected as duplicates. The issue occurred because Key Changes
Issues Found
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Peer as Peer Device
participant Driver as BLE Driver
participant IF as BLEInterface
participant Maps as State Maps
Note over Peer,Maps: Scenario 1: Stale Entry After Disconnect
Peer->>Driver: Connect (MAC_OLD)
Driver->>IF: _connected_callback(MAC_OLD, identity_A)
IF->>Maps: identity_to_address[hash_A] = MAC_OLD
IF->>Maps: address_to_identity[MAC_OLD] = identity_A
Note over Peer,Maps: Connection established
Peer->>Driver: Disconnect
Driver->>IF: _disconnected_callback(MAC_OLD)
IF->>Maps: _pending_detach[hash_A] = timestamp
Note over Maps: identity_to_address[hash_A] still exists<br/>(2s grace period)
Peer->>Driver: Reconnect (MAC_NEW, identity_A)
Driver->>IF: _check_duplicate_identity(MAC_NEW, identity_A)
alt Check 1: pending_detach exists
IF->>Maps: Check _pending_detach[hash_A]
Maps-->>IF: Found (connection gone)
IF->>IF: _cleanup_stale_address(hash_A, MAC_OLD)
IF-->>Driver: False (allow connection)
else Check 2: not in connected_peers
IF->>Maps: Check driver.connected_peers[MAC_OLD]
Maps-->>IF: Not found
IF->>Maps: Check peers[MAC_OLD]
Maps-->>IF: Not found
IF->>IF: _cleanup_stale_address(hash_A, MAC_OLD)
IF-->>Driver: False (allow connection)
end
Driver->>IF: Connection proceeds with MAC_NEW
Note over Peer,Maps: Scenario 2: Zombie Connection Detection
Peer->>IF: Keepalive packets only (1 byte)
Note over IF,Maps: No real data for 30+ seconds<br/>_last_real_data[hash_A] = old_timestamp
Peer->>Driver: Reconnect attempt (MAC_NEW, identity_A)
Driver->>IF: _check_duplicate_identity(MAC_NEW, identity_A)
IF->>Maps: Check driver.connected_peers[MAC_OLD]
Maps-->>IF: Still connected (keepalives work)
alt Check 3: zombie connection
IF->>Maps: Check _last_real_data[hash_A]
Maps-->>IF: 60 seconds ago
IF->>IF: time_since_data > _zombie_timeout (30s)
IF->>IF: _cleanup_stale_address(hash_A, MAC_OLD)
IF->>Driver: disconnect(MAC_OLD)
IF-->>Driver: False (allow connection)
end
Driver->>IF: Connection proceeds with MAC_NEW
Note over Peer,Maps: Scenario 3: Healthy Connection (Reject Duplicate)
Peer->>IF: Regular data packets
IF->>Maps: _last_real_data[hash_A] = current_time
Peer->>Driver: Duplicate attempt (MAC_NEW, identity_A)
Driver->>IF: _check_duplicate_identity(MAC_NEW, identity_A)
IF->>Maps: Check _pending_detach[hash_A]
Maps-->>IF: Not found
IF->>Maps: Check driver.connected_peers[MAC_OLD]
Maps-->>IF: Connected
IF->>Maps: Check _last_real_data[hash_A]
Maps-->>IF: Recent (< 30s)
IF-->>Driver: True (reject duplicate)
Driver->>Peer: Connection rejected
|
When BLE link degrades, 1-byte keepalives may still work while larger data packets fail. Both sides think the connection is "alive" based on keepalives, but data can't flow. This causes a deadlock where new connections are rejected as "duplicates" even though the existing connection is non-functional. This change adds zombie detection by tracking when real data (not keepalives) was last received. If an existing connection has only exchanged keepalives for > 30 seconds (configurable via _zombie_timeout), new connections from the same identity are allowed and the zombie connection is disconnected. Changes: - Add _last_real_data dict to track last real data timestamp per identity - Add _zombie_timeout (default 30s) for configurable zombie threshold - Update _check_duplicate_identity with Check 3: zombie detection - Update _handle_ble_data to track real data activity after keepalive filter - Initialize tracking in _handle_identity_handshake and _spawn_peer_interface - Clean up tracking in _process_pending_detaches - Add comprehensive test suite for zombie detection Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
… via Kotlin callback When Kotlin provides the identity via callback (from the identity characteristic read), the address_to_identity mapping gets set BEFORE the 16-byte handshake data arrives through _data_received_callback. Previously, _handle_identity_handshake would see the identity already exists and return False, causing the 16-byte handshake data to be passed to the reassembler where it fails with "Invalid fragment type 0xXX". The fix checks if received 16-byte data matches the known identity and consumes it silently if so. This prevents the handshake data from being misinterpreted as a fragment. Symptoms fixed: - BLEReassembler: Invalid fragment type 0xc9 (first byte of peer identity) - Messages not flowing even though connections appear established Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests verify that: - Duplicate 16-byte handshake matching known identity is consumed - Different 16-byte data is also consumed to prevent reassembler errors - Non-16-byte data is not incorrectly consumed as handshake - Normal handshake processing works when identity not yet known Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…_identity Add tests covering previously uncovered code paths: - Pending detach check (Check 1) allowing reconnection - Not-connected check (Check 2) allowing reconnection - Exception handling when zombie disconnect fails Improves patch coverage for PR #38 from 48.57% to full coverage of the _check_duplicate_identity changes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add test for _pending_identity_connections cleanup during successful identity handshake (lines 1272-1275), achieving 100% patch coverage for PR #38 changes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace Mock-based fixtures with real BLEInterface instances in stale identity check tests. This ensures coverage.py properly tracks execution of production code paths. The Mock approach with method binding executed the production code but coverage tracking was inconsistent. Using real instances guarantees proper coverage attribution. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…tests Add tests to test_zombie_connection_detection.py (which CI runs) to cover: - _handle_identity_handshake: non-16-byte rejection, duplicate handling - _pending_identity_connections cleanup after handshake - _spawn_peer_interface zombie tracking initialization These tests cover the same code paths as test_v2_2_identity_handshake.py but are in a file that CI includes, achieving 100% patch coverage. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
Fixes a bug where legitimate reconnections from the same identity were incorrectly rejected as "duplicates".
Problem: When a peer disconnects,
identity_to_addressis NOT cleaned up immediately - it's only removed after a 2-second grace period in_process_pending_detaches(). However,_check_duplicate_identity()was not checking if the existing address is still connected before rejecting.This caused:
Solution: Before rejecting as duplicate, verify that the old connection is still alive:
pending_detachexists for this identity (meaning old connection is already gone)connected_peersorpeersdictIf either check fails, the old connection is dead → allow the new connection and clean up stale mappings.
Test plan
tests/test_ble_duplicate_identity_stale.pytest_mac_rotation_blacklist_bug.pyto set up active connectionsChanges
src/ble_reticulum/BLEInterface.py: Added connection liveness checks to_check_duplicate_identity()tests/test_ble_duplicate_identity_stale.py: New TDD test file demonstrating the bug and fixtests/test_mac_rotation_blacklist_bug.py: Updated mock fixtures to includepeers,_pending_detach, andconnected_peers🤖 Generated with Claude Code