perf: optimize sync with mtime-based change detection and configurable interval #268

Open
troshab wants to merge 1 commit into zilliztech:master from troshab:perf/sync-optimization

troshab commented Feb 12, 2026

Summary

Two improvements to the background file sync mechanism:

1. mtime+size based change detection

Instead of reading and SHA-256 hashing every file on each sync cycle, the synchronizer now:

  1. Calls stat() on each file to get its mtime and size (~100x faster than read+hash)
  2. Compares them with the cached mtime and size from the previous snapshot
  3. Reads and hashes only files whose mtime or size actually changed
  4. Reuses the cached content hash for unchanged files
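
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `FileStat`, `FileEntry`, `statsMatch`, and `syncEntry` are hypothetical, and the sketch uses the synchronous Node.js `fs` calls for brevity.

```typescript
import { createHash } from "crypto";
import { readFileSync, statSync } from "fs";

// Hypothetical types: the stat fields cached per file, plus the content hash.
interface FileStat {
  mtimeMs: number; // last modification time in milliseconds
  size: number;    // file size in bytes
}
interface FileEntry extends FileStat {
  hash: string;    // SHA-256 of the file content, hex-encoded
}

// A file is treated as unchanged only when both mtime and size are identical.
function statsMatch(cached: FileStat, current: FileStat): boolean {
  return cached.mtimeMs === current.mtimeMs && cached.size === current.size;
}

// stat() first; read and hash only when there is no cached entry
// or the cached mtime/size differ from the current ones.
function syncEntry(
  path: string,
  cached: FileEntry | undefined,
): { entry: FileEntry; rehashed: boolean } {
  const { mtimeMs, size } = statSync(path);
  if (cached !== undefined && statsMatch(cached, { mtimeMs, size })) {
    return { entry: cached, rehashed: false }; // reuse the cached hash
  }
  const hash = createHash("sha256").update(readFileSync(path)).digest("hex");
  return { entry: { hash, mtimeMs, size }, rehashed: true };
}
```

One trade-off of this kind of check: an edit that preserves both size and mtime (for example, a tool that restores the timestamp after writing) would go unnoticed until the stats change again.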

Impact for a 17k-file codebase with 0 changes: ~200ms (stat only) vs ~10s (read+hash all).

File stats are persisted in the snapshot alongside content hashes. Old snapshots without stats are handled gracefully: the first sync after upgrading finds no cached stats and falls back to a full hash of every file.
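
A sketch of how loading could tolerate the old format, assuming a JSON snapshot. Only the `fileStats` field name appears in this PR's text; the `fileHashes` field, the `Snapshot` shape, and `loadSnapshot` are assumptions made for illustration.

```typescript
interface FileStat {
  mtimeMs: number;
  size: number;
}

// Hypothetical on-disk shape. `fileStats` is optional so that
// snapshots written before this change still parse.
interface Snapshot {
  fileHashes: Record<string, string>;
  fileStats?: Record<string, FileStat>;
}

interface LoadedSnapshot {
  hashes: Map<string, string>;
  stats: Map<string, FileStat>;
}

// Parses a snapshot; a missing `fileStats` field yields an empty stats map,
// so every file looks "not cached" and gets a full hash on the next sync.
function loadSnapshot(json: string): LoadedSnapshot {
  const raw = JSON.parse(json) as Snapshot;
  return {
    hashes: new Map(Object.entries(raw.fileHashes ?? {})),
    stats: new Map(Object.entries(raw.fileStats ?? {})),
  };
}
```

With this shape, no explicit migration step is needed: the first sync after upgrade repopulates the stats, and later syncs take the fast path.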

2. Configurable sync interval via SYNC_INTERVAL_SECONDS

  • Default changed from 300s (5 min) to 60s (1 min)
  • Configurable via SYNC_INTERVAL_SECONDS environment variable
  • Floor of 10s to prevent excessive polling
  • The shorter default is practical now that a no-change cycle costs ~200ms instead of ~10s on a 17k-file codebase
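
The interval rules above might be resolved along these lines. The function and constant names are hypothetical, and the handling of unparsable values (falling back to the default) is an assumption; the PR text only states the default and the floor.

```typescript
const DEFAULT_SYNC_INTERVAL_SECONDS = 60; // new default stated in this PR
const MIN_SYNC_INTERVAL_SECONDS = 10;     // floor stated in this PR

// Converts the raw environment value to milliseconds.
// Assumption: unset or non-numeric input falls back to the default;
// anything below the floor is raised to the floor.
function resolveSyncIntervalMs(raw: string | undefined): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  const seconds = Number.isNaN(parsed) ? DEFAULT_SYNC_INTERVAL_SECONDS : parsed;
  return Math.max(seconds, MIN_SYNC_INTERVAL_SECONDS) * 1000;
}
```

A caller would typically pass `process.env.SYNC_INTERVAL_SECONDS` and hand the result to `setInterval`.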

Test plan

  • pnpm build compiles without errors
  • Backward compatible with old snapshot format (no fileStats field)
  • Verify sync detects file changes correctly (add/modify/delete)
  • Verify unchanged files are not re-hashed (check logs for hash skip count)
  • Verify SYNC_INTERVAL_SECONDS=30 changes the sync interval

Closes #267

…e interval

Two improvements to the background sync mechanism:

1. Use mtime+size to skip unchanged files during sync. Instead of
   reading and SHA-256 hashing every file on each sync cycle, stat()
   each file first and only hash files where mtime or size changed.
   For a 17k-file codebase with 0 changes: ~200ms (stat only) vs
   ~10s (read+hash all). File stats are persisted in the snapshot
   alongside content hashes, with backward compatibility for old
   snapshots (falls back to full hash on first run).

2. Make sync interval configurable via SYNC_INTERVAL_SECONDS env var
   (default: 60s, was hardcoded 300s). Minimum 10s to prevent
   excessive polling. The shorter default is practical now that
   per-cycle cost is dramatically lower with mtime optimization.

Closes zilliztech#267

Development

Successfully merging this pull request may close these issues.

perf: optimize sync interval and use mtime-based change detection