Bug report
Description:
When a subgraph encounters a non-deterministic error (e.g., DatabaseUnavailable),
the unfail mechanism makes only one attempt. If that attempt occurs before
the subgraph has processed past the error block, it returns UnfailOutcome::Noop,
yet the should_try_unfail_non_deterministic flag is still set to false, so the
unfail is never retried.
This causes subgraphs to remain permanently in Failed state even though they
continue indexing successfully.
Reproduction:
- Subgraph encounters DatabaseUnavailable at block N
- Database recovers, subgraph restarts from checkpoint at block N-3
- First unfail attempt happens at block N-3 (< N) and returns Noop
- Flag is set to false anyway, so no further unfail attempt is made
- Subgraph continues indexing to N+1000, but health remains "failed"
Evidence:
Log showing the issue:
INFO Subgraph error is still ahead of deployment head, nothing to unfail,
error_block_range: (Included(392332788), Unbounded),
block_number: 392332785
Location:
core/src/subgraph/runner.rs:996
Suggested Fix:
Set should_try_unfail_non_deterministic = false only when the outcome is
UnfailOutcome::Unfailed; keep it true on UnfailOutcome::Noop so the unfail is
retried on the next block.
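
To make the difference concrete, here is a minimal, self-contained sketch of the two behaviours. It is not the actual code in runner.rs: only UnfailOutcome::{Noop, Unfailed} and should_try_unfail_non_deterministic are taken from this report, while Runner, Deployment, Health, after_block, final_health and the retry_on_noop switch are hypothetical stand-ins, and the exact head-versus-error-block comparison is an assumption.

```rust
// Simplified model of the unfail flow. Hypothetical types throughout; only
// `UnfailOutcome::{Noop, Unfailed}` and the flag name come from the report.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UnfailOutcome {
    Noop,
    Unfailed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Health {
    Healthy,
    Failed,
}

pub struct Deployment {
    pub health: Health,
    /// Block at which the non-deterministic error was recorded (block N).
    pub error_block: u64,
}

impl Deployment {
    /// Assumed store behaviour: the error can only be cleared once the
    /// deployment head has reached the error block; otherwise nothing happens.
    pub fn unfail_non_deterministic_error(&mut self, head: u64) -> UnfailOutcome {
        if self.health == Health::Failed && head >= self.error_block {
            self.health = Health::Healthy;
            UnfailOutcome::Unfailed
        } else {
            UnfailOutcome::Noop
        }
    }
}

pub struct Runner {
    pub should_try_unfail_non_deterministic: bool,
    pub deployment: Deployment,
}

impl Runner {
    /// Called after each processed block. `retry_on_noop = false` models the
    /// reported behaviour; `true` models the suggested fix.
    pub fn after_block(&mut self, head: u64, retry_on_noop: bool) {
        if !self.should_try_unfail_non_deterministic {
            return;
        }
        match self.deployment.unfail_non_deterministic_error(head) {
            // The error was cleared, so there is nothing left to retry.
            UnfailOutcome::Unfailed => self.should_try_unfail_non_deterministic = false,
            UnfailOutcome::Noop => {
                if !retry_on_noop {
                    // Reported behaviour: give up after the first attempt.
                    self.should_try_unfail_non_deterministic = false;
                }
                // Suggested fix: leave the flag set and try again next block.
            }
        }
    }
}

/// Replays blocks `restart_block..=last_block` for a deployment that failed
/// at `error_block` and returns its health at the end.
pub fn final_health(
    error_block: u64,
    restart_block: u64,
    last_block: u64,
    retry_on_noop: bool,
) -> Health {
    let mut runner = Runner {
        should_try_unfail_non_deterministic: true,
        deployment: Deployment { health: Health::Failed, error_block },
    };
    for head in restart_block..=last_block {
        runner.after_block(head, retry_on_noop);
    }
    runner.deployment.health
}
```

With error_block = 100 and a restart at block 97, final_health(100, 97, 1100, false) leaves the deployment Failed, matching the reproduction steps, while final_health(100, 97, 1100, true) ends Healthy. One trade-off in this sketch: a deployment with no error to clear also gets Noop and would retry on every block, so a real fix may want to distinguish "error still ahead of head" from "nothing to unfail".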
IPFS hash
No response
Subgraph name or link to explorer
No response
Some information to help us out
- Tick this box if this bug is caused by a regression found in the latest release.
- Tick this box if this bug is specific to the hosted service.
- I have searched the issue tracker to make sure this issue is not a duplicate.
OS information
None