fix: retry event watcher blocks after RPC failures #349

Open
victortran0904 wants to merge 3 commits into entrius:test from victortran0904:codex/retry-event-watcher-failed-blocks

@victortran0904
Contributor

Summary

  • make process_block() report whether block event retrieval succeeded
  • keep the event watcher cursor at the last successfully processed block when a transient RPC failure occurs
  • add regression coverage for clean sync, mid-window failure, and retry after recovery
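
The cursor rule described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `SketchWatcher`, `fetch_events`, `seen`, and `flaky` are hypothetical names, and the real watcher talks to a substrate client rather than a plain callable.

```python
from typing import Callable, List, Tuple


class SketchWatcher:
    """Hypothetical stand-in for the event watcher; not the project's class."""

    def __init__(self, fetch_events: Callable[[int], list], cursor: int = 0):
        self.fetch_events = fetch_events  # assumed RPC wrapper; may raise
        self.cursor = cursor
        self.seen: List[Tuple[int, str]] = []

    def process_block(self, block_num: int) -> bool:
        """Return True if the block's events were retrieved, False otherwise."""
        try:
            events = self.fetch_events(block_num)
        except Exception:
            return False
        self.seen.extend((block_num, event) for event in events)
        return True

    def sync_to(self, end: int) -> None:
        """Advance the cursor only past blocks that were processed successfully."""
        last_processed = self.cursor
        for block_num in range(self.cursor + 1, end + 1):
            if not self.process_block(block_num):
                break  # leave the failed block for the next sync
            last_processed = block_num
        self.cursor = last_processed


def flaky(block_num: int) -> list:
    """Hypothetical event source that always fails on block 12."""
    if block_num == 12:
        raise RuntimeError('rpc timeout')
    return [f'ev-{block_num}']


watcher = SketchWatcher(flaky, cursor=10)
watcher.sync_to(15)
print(watcher.cursor)  # 11: the cursor stops just before the failed block
```

Because the cursor stays at 11, the next `sync_to` call starts again at block 12 instead of skipping it.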

Fixes #201

Note: #339 also touches event_watcher.py for state persistence, so this may need a small rebase if that lands first.

Tests

  • uv run pytest tests/test_event_watcher.py -q
  • uv run ruff check allways/validator/event_watcher.py tests/test_event_watcher.py
  • uv run ruff format --check allways/validator/event_watcher.py tests/test_event_watcher.py
  • git diff --check

Copilot AI review requested due to automatic review settings May 20, 2026 06:12
@xiao-xiao-mao xiao-xiao-mao Bot added the bug Something isn't working label May 20, 2026

Copilot AI left a comment

Pull request overview

This PR fixes ContractEventWatcher.sync_to() so it only advances the cursor past blocks whose event retrieval succeeded, preventing silent event loss when transient RPC failures occur (Fixes #201).

Changes:

  • Make process_block() return a success flag and stop the sync loop on retrieval failures.
  • Advance cursor to the last successfully processed block instead of unconditionally to the end of the window.
  • Add regression tests covering clean sync, mid-window failure, and retry after recovery.
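
To illustrate why the unconditional cursor update loses events, the sketch below runs both strategies against the same kind of flaky source. All names here (`make_flaky_fetch`, `sync_unconditional`, `sync_stop_on_failure`) are hypothetical, and the functions are simplified stand-ins for the behaviour the review describes rather than the repository's implementation.

```python
def make_flaky_fetch(fail_once):
    """Hypothetical event source: blocks in fail_once raise on their first request only."""
    pending = set(fail_once)

    def fetch(block_num):
        if block_num in pending:
            pending.discard(block_num)
            raise RuntimeError('rpc timeout')
        return [f'event-{block_num}']

    return fetch


def sync_unconditional(fetch, cursor, end, out):
    """Pre-fix behaviour: failures are swallowed and the cursor jumps to end."""
    for block_num in range(cursor + 1, end + 1):
        try:
            out.extend(fetch(block_num))
        except Exception:
            pass  # the failed block is never revisited
    return end


def sync_stop_on_failure(fetch, cursor, end, out):
    """Post-fix behaviour: stop at the first failure and keep the cursor before it."""
    last_processed = cursor
    for block_num in range(cursor + 1, end + 1):
        try:
            events = fetch(block_num)
        except Exception:
            break
        out.extend(events)
        last_processed = block_num
    return last_processed


# Two sync passes over blocks 11..14, with block 12 failing once.
old_events, new_events = [], []

old_fetch = make_flaky_fetch({12})
old_cursor = sync_unconditional(old_fetch, 10, 14, old_events)
old_cursor = sync_unconditional(old_fetch, old_cursor, 14, old_events)

new_fetch = make_flaky_fetch({12})
new_cursor = sync_stop_on_failure(new_fetch, 10, 14, new_events)
new_cursor = sync_stop_on_failure(new_fetch, new_cursor, 14, new_events)

print(old_events)  # event-12 is missing
print(new_events)  # all four events are present
```

Both variants end with the cursor at 14, but only the second one delivers the events from block 12.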

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File | Description
allways/validator/event_watcher.py | Track last successfully processed block and halt cursor advancement when block/event retrieval fails.
tests/test_event_watcher.py | Add tests to ensure cursor stops before a failed block and retries successfully on a subsequent sync.

Comment on lines 351 to +358
     try:
         block_hash = self.substrate.get_block_hash(block_num)
         if not block_hash:
-            return
+            return True
         events = self.substrate.get_events(block_hash=block_hash)
     except Exception as e:
         bt.logging.debug(f'EventWatcher: block {block_num} events unavailable: {e}')
-        return
+        return False
Comment thread allways/validator/event_watcher.py Outdated
Comment on lines +341 to +345
     for block_num in range(self.cursor + 1, end + 1):
-        self.process_block(block_num)
-    self.cursor = end
+        if not self.process_block(block_num):
+            break
+        last_processed = block_num
+    self.cursor = last_processed
Comment thread tests/test_event_watcher.py Outdated
Comment on lines +422 to +433
def test_transient_block_fetch_failure_stops_cursor_before_failed_block(self, tmp_path: Path):
    w = make_watcher(tmp_path)
    w.cursor = 10

    def get_block_hash(block_num: int):
        if block_num == 12:
            raise RuntimeError('rpc timeout')
        return f'hash-{block_num}'

    w.substrate.get_block_hash.side_effect = get_block_hash
    w.substrate.get_events.return_value = []

@victortran0904
Contributor Author

Updated this branch to address the actionable Copilot review items:

  • added an explicit warning when sync stops before the requested block range so operators can see the partial catch-up and retry point
  • added regression coverage for get_events raising and then succeeding on a later sync_to call
  • kept the existing sync_to return contract unchanged

I did not add pruned/missing-block special casing because there does not appear to be a clean existing error taxonomy for that path; string matching provider errors would be brittle.
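
For readers following along, the first two items could look roughly like the sketch below: a warning that names the retry point, exercised by a `get_events` call that raises once and succeeds on the next sync. It is an illustration under assumptions, not the code in this branch. It uses the standard `logging` module instead of `bt.logging`, a free function `sync_to` that returns the cursor purely for convenience, and a hypothetical `ListHandler` to capture messages. Any exception is treated as retryable, with no pruned-block special casing.

```python
import logging
from unittest.mock import MagicMock

logger = logging.getLogger('event_watcher_sketch')


class ListHandler(logging.Handler):
    """Hypothetical helper that collects formatted log messages for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def sync_to(substrate, cursor, end):
    """Process blocks after cursor up to end; return the last block that succeeded."""
    last_processed = cursor
    for block_num in range(cursor + 1, end + 1):
        try:
            block_hash = substrate.get_block_hash(block_num)
            if block_hash:
                substrate.get_events(block_hash=block_hash)
        except Exception as e:
            logger.warning(
                'sync stopped at block %d before reaching %d; retry resumes at block %d (%s)',
                last_processed, end, block_num, e,
            )
            break
        last_processed = block_num
    return last_processed


handler = ListHandler()
logger.addHandler(handler)

substrate = MagicMock()
substrate.get_block_hash.side_effect = lambda block_num: f'hash-{block_num}'

flaky_hashes = {'hash-12'}  # get_events fails once for this block, then recovers


def get_events(block_hash):
    if block_hash in flaky_hashes:
        flaky_hashes.discard(block_hash)
        raise RuntimeError('rpc timeout')
    return []


substrate.get_events.side_effect = get_events

first = sync_to(substrate, 10, 13)      # stops before block 12 and logs a warning
second = sync_to(substrate, first, 13)  # retries block 12 and reaches block 13
```

The single warning carries both the last good block and the block where the retry resumes, which is the information an operator needs to spot a partial catch-up.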

Verification:

  • uv run pytest tests/test_event_watcher.py -q
  • uv run ruff check allways/validator/event_watcher.py tests/test_event_watcher.py
  • uv run ruff format --check allways/validator/event_watcher.py tests/test_event_watcher.py
  • git diff --check

@victortran0904 victortran0904 force-pushed the codex/retry-event-watcher-failed-blocks branch from ea4391f to 3039b8d Compare May 20, 2026 21:43
@victortran0904
Contributor Author

Rebased this branch onto the latest test branch and resolved the event watcher test import conflict. The PR diff is still scoped to allways/validator/event_watcher.py and tests/test_event_watcher.py.

Verification after rebase:

  • rtk uv run pytest tests/test_event_watcher.py -q -> 32 passed
  • rtk uv run ruff check allways/validator/event_watcher.py tests/test_event_watcher.py -> passed
  • rtk uv run ruff format --check allways/validator/event_watcher.py tests/test_event_watcher.py -> passed
  • rtk git diff --check -> passed

@victortran0904 victortran0904 force-pushed the codex/retry-event-watcher-failed-blocks branch from 3039b8d to 48e663b Compare May 21, 2026 04:41
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

[S1] event_watcher silently drops events on transient RPC failure

2 participants