fix: persist event-watcher state for clean validator warm restarts #339
Open
LandynDev wants to merge 3 commits into
Adds active_events, busy_events, event_watcher_meta (cursor) and bootstrapped_swaps tables plus the methods that read, write, prune and reset them. Anchor-preserving prune mirrors the existing rate_events rule: latest row per hotkey is kept past cutoff so window-start reconstruction stays correct after pruning.
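The anchor-preserving prune can be sketched roughly as below. This is a minimal illustration, not the PR's code: the column layout (hotkey, block, active) and the function names are assumptions, and it reads "anchor" as the newest row per hotkey that is older than the cutoff.

```python
import sqlite3


def create_active_events(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: the real column layout in state.db is not shown in this PR text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS active_events ("
        " hotkey TEXT NOT NULL,"
        " block INTEGER NOT NULL,"
        " active INTEGER NOT NULL,"
        " PRIMARY KEY (hotkey, block))"
    )
    conn.commit()


def prune_active_events(conn: sqlite3.Connection, cutoff_block: int) -> int:
    """Delete rows older than cutoff_block, keeping one anchor row per hotkey.

    The anchor is the newest row below the cutoff, so the state at the start
    of the scoring window can still be reconstructed after pruning.
    Returns the number of rows deleted.
    """
    cur = conn.execute(
        """
        DELETE FROM active_events
        WHERE block < :cutoff
          AND (hotkey, block) NOT IN (
              SELECT hotkey, MAX(block)
              FROM active_events
              WHERE block < :cutoff
              GROUP BY hotkey
          )
        """,
        {"cutoff": cutoff_block},
    )
    conn.commit()
    return cur.rowcount
```

With rows at blocks 10, 20, 30 and 120 for one hotkey and a cutoff of 100, only the rows at 10 and 20 are deleted; the row at 30 survives as the anchor.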
initialize() now branches on the persisted cursor: a fresh DB still cold-bootstraps from the contract; a cursor within one scoring window of head hydrates the in-memory active/busy mirrors from state.db without touching the contract; a cursor further back wipes persistence and falls back to cold. Transitions write through on every record_active_transition and apply_busy_delta. Cursor advances per block at the tail of process_block, so a crash mid-chunk re-replays at most one block. bootstrapped_swap_ids is persisted so warm restarts preserve the skip-list that prevents double-counting the seeded +1 against the SwapInitiated replay. Pruned-block exceptions during get_block_hash/get_events on a public finney node (which keeps only ~240 blocks of state) collapse into one INFO summary per sync_to instead of ~360 per-block warnings during the catch-up after restart.
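The three-way branch in initialize() can be pictured as a small pure function. The sketch below is illustrative only: RestartMode, choose_restart_mode and the scoring_window parameter are hypothetical names, and treating "within one scoring window" as inclusive is an assumption.

```python
from enum import Enum
from typing import Optional


class RestartMode(Enum):
    COLD = "cold"    # fresh DB: bootstrap from the contract
    WARM = "warm"    # recent cursor: hydrate mirrors from state.db
    STALE = "stale"  # old cursor: wipe persistence, then cold bootstrap


def choose_restart_mode(
    cursor: Optional[int], head: int, scoring_window: int
) -> RestartMode:
    """Pick a restart path from the persisted cursor and the chain head."""
    if cursor is None:
        # No cursor was ever persisted, so there is nothing to hydrate.
        return RestartMode.COLD
    if head - cursor <= scoring_window:
        # Assumption: a cursor exactly one window behind still counts as warm.
        return RestartMode.WARM
    return RestartMode.STALE
```

For example, with a hypothetical 600-block window and head at 5000, a cursor at 4400 takes the warm path while a cursor at 4399 is treated as stale.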
Adds five test classes:
- TestStateStoreEventTables: round-trip + anchor-preserving prune for the four new tables.
- TestEventWatcherWarmRestart: cold writes anchors, warm hydrates without contract reads, long outage falls back to cold.
- TestEventWatcherWriteThrough: transitions persist; cursor advances per block.
- TestEventWatcherLogHygiene: pruned-block error collapses into a single summary, unrelated exceptions still log per-block, counter resets between sync_to calls.
- TestSwapOutcomesIdempotency: re-applying SwapCompleted does not duplicate swap_outcomes rows.
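The behavior that TestEventWatcherLogHygiene exercises can be sketched as follows. This is a hedged illustration rather than the PR's implementation: sync_range, fetch_block and the logger name are hypothetical, and detecting pruned blocks by the "State already discarded" substring is an assumption.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("event_watcher_sketch")  # hypothetical logger name

# Assumption: pruned-block errors are recognized by this substring.
PRUNED_MARKER = "State already discarded"


def sync_range(fetch_block: Callable[[int], None], start: int, end: int) -> int:
    """Process blocks start..end inclusive, collapsing pruned-block errors.

    Returns the number of pruned blocks skipped in this call.
    """
    pruned: List[int] = []  # local to the call, so the count resets every sync
    for block in range(start, end + 1):
        try:
            fetch_block(block)
        except Exception as exc:
            if PRUNED_MARKER in str(exc):
                pruned.append(block)  # remember it, but stay quiet for now
            else:
                # Unrelated failures are still reported one block at a time.
                logger.warning("block %d failed: %s", block, exc)
    if pruned:
        logger.info(
            "%d pruned blocks skipped (blocks %d..%d)",
            len(pruned), pruned[0], pruned[-1],
        )
    return len(pruned)
```

Keeping the list local to the call is what makes the counter reset between syncs; a field on the watcher would need an explicit reset instead.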
Summary
Validator restarts on lena triggered four observed pains:
- sync_to walks back 600 blocks from the cold-start cursor; the public finney RPC drops state past ~240 blocks, so ~360 "State already discarded" warnings flood the log on every restart.
- active_events/busy_events timelines are partial, scoring sees most miners as inactive, and the pool routes to RECYCLE_UID.
- get_miner_active_flag RPCs + 1 get_active_swaps before the first forward step.
Fix
Persist the event-watcher's reconstructed timeline to state.db and hydrate it back on restart. Warm restart trusts the DB as source of truth and skips contract reads entirely. A persisted cursor more than one scoring window behind head wipes persistence and falls back to cold bootstrap: the chain has moved past replayable history, so the contract is the only authority left.
Per-block cursor advance means a crash mid-chunk re-replays at most one block instead of an entire 50-block chunk.
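A rough sketch of why a per-block cursor bounds the replay after a crash. The key/value layout of event_watcher_meta and the helper names here are assumptions made for illustration; only the table name and the idea of a cursor come from this PR.

```python
import sqlite3
from typing import Callable, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Hypothetical layout: one key/value row holds the cursor.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event_watcher_meta ("
        " key TEXT PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def load_cursor(conn: sqlite3.Connection) -> Optional[int]:
    row = conn.execute(
        "SELECT value FROM event_watcher_meta WHERE key = 'cursor'"
    ).fetchone()
    return None if row is None else row[0]


def save_cursor(conn: sqlite3.Connection, block: int) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO event_watcher_meta (key, value) VALUES ('cursor', ?)",
        (block,),
    )
    conn.commit()


def sync_blocks(
    conn: sqlite3.Connection,
    start: int,
    head: int,
    apply_block: Callable[[int], None],
) -> None:
    """Apply blocks start..head, persisting the cursor after each block."""
    for block in range(start, head + 1):
        apply_block(block)        # if this raises, the cursor stays at block - 1
        save_cursor(conn, block)  # advance only once the block is fully handled
```

If apply_block fails at block 107, the stored cursor is 106, so a restart resumes at 107 and repeats only that block. Saving the cursor once per chunk would instead repeat every block processed since the chunk began.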
bootstrapped_swap_ids is persisted alongside the cursor so warm restarts keep the skip-list that prevents double-counting the cold-seeded +1 against its SwapInitiated replay.
Pruned-block exceptions during the post-restart catch-up collapse into a single INFO summary per sync_to ("360 pruned blocks skipped (blocks 4123..4482)") instead of one warning per block.
Test plan
- pytest tests/test_event_watcher.py: 43 passed (29 new)
- pytest tests/: 496 passed
- ruff format + ruff check clean