[Bug] SubgraphRunner unfail mechanism fails when block hasn't advanced past error block #6205

@bocaigo

Description

Bug report

Description:
When a subgraph encounters a non-deterministic error (e.g., DatabaseUnavailable),
the unfail mechanism attempts to unfail it only once. If that first attempt occurs
before the subgraph has processed past the error block, it returns
UnfailOutcome::Noop, but the should_try_unfail_non_deterministic flag is still set
to false, so the unfail is never retried.

As a result, the subgraph remains in the Failed state permanently, even though it
continues indexing successfully.

Reproduction:

  1. Subgraph encounters DatabaseUnavailable at block N
  2. Database recovers, subgraph restarts from checkpoint at block N-3
  3. First unfail attempt happens at block N-3 (< N), returns Noop
  4. Flag is set to false, never retried
  5. Subgraph continues indexing to N+1000, but health remains "failed"
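The reproduction steps can be replayed on a simplified model. This is a hypothetical sketch, not graph-node's actual runner: only UnfailOutcome and should_try_unfail_non_deterministic come from this issue, while Runner, Health, error_block, process_block and unfail_non_deterministic_error(head) are illustrative stand-ins. The model assumes the unfail only succeeds once the head has reached the error block, and the flag is cleared regardless of the outcome, as reported.

```rust
// Hypothetical model of the reported behaviour; not graph-node code.

#[derive(Debug, PartialEq)]
enum UnfailOutcome {
    Noop,
    Unfailed,
}

#[derive(Debug, PartialEq)]
enum Health {
    Failed,
    Healthy,
}

struct Runner {
    should_try_unfail_non_deterministic: bool,
    error_block: u64,
    health: Health,
}

impl Runner {
    // Hypothetical stand-in for the store's unfail call: it only clears the
    // error once the deployment head has reached the error block.
    fn unfail_non_deterministic_error(&mut self, head: u64) -> UnfailOutcome {
        if head < self.error_block {
            UnfailOutcome::Noop
        } else {
            self.health = Health::Healthy;
            UnfailOutcome::Unfailed
        }
    }

    // Behaviour as reported: the flag is cleared whatever the outcome is.
    fn process_block(&mut self, head: u64) {
        if self.should_try_unfail_non_deterministic {
            self.should_try_unfail_non_deterministic = false;
            let _outcome = self.unfail_non_deterministic_error(head);
        }
    }
}

fn main() {
    // N taken from the log in this issue; restart from checkpoint N-3.
    let n: u64 = 392_332_788;
    let mut runner = Runner {
        should_try_unfail_non_deterministic: true,
        error_block: n,
        health: Health::Failed,
    };
    for head in (n - 3)..=(n + 1000) {
        runner.process_block(head);
    }
    // Only the attempt at N-3 ran, and it was a Noop.
    println!("{:?}", runner.health); // Failed
}
```

Even after indexing 1000 blocks past N, the model stays Failed because the only attempt happened at N-3.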

Evidence:
Log showing the issue:
INFO Subgraph error is still ahead of deployment head, nothing to unfail,
error_block_range: (Included(392332788), Unbounded),
block_number: 392332785
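To make the log concrete: the error range starts at 392332788 and is unbounded above, while the deployment head is at 392332785, three blocks short, which matches the N-3 restart in the reproduction. The sketch below uses Rust's std::ops::Bound; the helper error_is_ahead_of_head is hypothetical, and i32 is chosen for illustration only.

```rust
use std::ops::{Bound, RangeBounds};

/// Hypothetical helper: true when the deployment head has not yet reached the
/// start of the error's block range (the case logged as "nothing to unfail").
fn error_is_ahead_of_head(range: &(Bound<i32>, Bound<i32>), head: i32) -> bool {
    match range.start_bound() {
        Bound::Included(start) => head < *start,
        Bound::Excluded(start) => head <= *start,
        Bound::Unbounded => false,
    }
}

fn main() {
    // Values copied from the log line above.
    let error_block_range: (Bound<i32>, Bound<i32>) =
        (Bound::Included(392_332_788), Bound::Unbounded);
    let block_number: i32 = 392_332_785;

    // The head is not inside the error range yet...
    println!("{}", error_block_range.contains(&block_number)); // false
    // ...because it has not reached the start of the range.
    println!("{}", error_is_ahead_of_head(&error_block_range, block_number)); // true
}
```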

Location:
core/src/subgraph/runner.rs:996

Suggested Fix:
Set should_try_unfail_non_deterministic = false only when the outcome is
UnfailOutcome::Unfailed; keep it true when the outcome is UnfailOutcome::Noop so
the unfail is retried on the next block.
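A minimal sketch of the suggested fix, on the same kind of simplified model (hypothetical; not the actual change to runner.rs). Runner, Health, error_block, process_block and unfail_non_deterministic_error(head) are illustrative stand-ins; only UnfailOutcome and should_try_unfail_non_deterministic come from this issue. The flag is cleared only on Unfailed, so a Noop at N-3 leads to a retry on each following block until the head reaches N.

```rust
// Hypothetical model of the suggested fix; not graph-node code.

#[derive(Debug, PartialEq)]
enum UnfailOutcome {
    Noop,
    Unfailed,
}

#[derive(Debug, PartialEq)]
enum Health {
    Failed,
    Healthy,
}

struct Runner {
    should_try_unfail_non_deterministic: bool,
    error_block: u64,
    health: Health,
}

impl Runner {
    // Hypothetical stand-in for the store's unfail call.
    fn unfail_non_deterministic_error(&mut self, head: u64) -> UnfailOutcome {
        if head < self.error_block {
            UnfailOutcome::Noop
        } else {
            self.health = Health::Healthy;
            UnfailOutcome::Unfailed
        }
    }

    fn process_block(&mut self, head: u64) {
        if !self.should_try_unfail_non_deterministic {
            return;
        }
        match self.unfail_non_deterministic_error(head) {
            // Success: stop trying.
            UnfailOutcome::Unfailed => self.should_try_unfail_non_deterministic = false,
            // Head still behind the error block: keep the flag, retry next block.
            UnfailOutcome::Noop => {}
        }
    }
}

fn main() {
    let n: u64 = 392_332_788;
    let mut runner = Runner {
        should_try_unfail_non_deterministic: true,
        error_block: n,
        health: Health::Failed,
    };
    let mut unfailed_at = None;
    for head in (n - 3)..=(n + 1000) {
        runner.process_block(head);
        if unfailed_at.is_none() && runner.health == Health::Healthy {
            unfailed_at = Some(head);
        }
    }
    println!("{:?} at {:?}", runner.health, unfailed_at); // Healthy at Some(392332788)
}
```

In this model the retry costs one extra unfail call per block only while the error block is still ahead of the head; once the unfail succeeds, the flag stays cleared.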

Relevant log output

IPFS hash

No response

Subgraph name or link to explorer

No response

Some information to help us out

  • Tick this box if this bug is caused by a regression found in the latest release.
  • Tick this box if this bug is specific to the hosted service.
  • I have searched the issue tracker to make sure this issue is not a duplicate.

OS information

None

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
