Skip to content

Conversation

@rsnodgrass
Copy link
Contributor

@rsnodgrass rsnodgrass commented Jan 23, 2026

Summary

  • Add WAL checkpoint retry logic with exponential backoff
  • Prevent silent data loss when checkpoint fails due to concurrent readers

Changes

  • store.go: Add 5-retry exponential backoff for WAL checkpoint in Close()
  • flush_manager_race_test.go: Concurrency tests for flush manager
  • wal_test.go: Tests for checkpoint with concurrent readers

Security Fixes

Test Plan

  • Run go test -race ./cmd/bd/... -run FlushManager for flush tests
  • Run go test -race ./internal/storage/sqlite/... -run WAL for WAL tests

Related Issues

sequenceDiagram
    participant S as Store.Close()
    participant W as WAL

    S->>W: Checkpoint (attempt 1)
    W-->>S: SQLITE_BUSY (reader lock)
    S->>S: Sleep 100ms
    S->>W: Checkpoint (attempt 2)
    W-->>S: SQLITE_BUSY
    S->>S: Sleep 200ms
    S->>W: Checkpoint (attempt 3)
    W-->>S: Success
    S->>S: Close DB
Loading

@rsnodgrass rsnodgrass marked this pull request as ready for review January 23, 2026 08:39
Add WAL checkpoint retry logic with exponential backoff to prevent data
loss when concurrent readers hold locks during database close.

Security fixes from SECURITY_AUDIT.md:
- Issue steveyegge#8: WAL checkpoint failure silently losing data

Changes:
- Add 5-retry exponential backoff for WAL checkpoint in Close()
- Warn user if checkpoint fails after retries (potential data loss)
- Add flush_manager_race_test.go for concurrency testing
- Add wal_test.go for checkpoint with concurrent readers

Co-Authored-By: SageOx <ox@sageox.ai>
@rsnodgrass rsnodgrass force-pushed the pr/sync-flush-robustness branch from efcef23 to 63c8da9 Compare January 23, 2026 08:47
@rsnodgrass rsnodgrass changed the title fix: improve flush manager timeout and WAL checkpoint robustness fix: add WAL checkpoint robustness with retry logic Jan 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant