


@meetrick commented Dec 2, 2025

Description

Summary

This PR adds support for the safety_start_delay configuration from CometBFT. When the node is configured with disable_os_sync = true (optimized I/O mode), this feature makes the node wait for a configured duration before starting. This keeps validator startup safe whenever O_SYNC durability guarantees are intentionally relaxed.
This is a companion PR to CometBFT PR cometbft/cometbft#5515.

Problem Statement

Validators operating on low-end hardware often face I/O bottlenecks during block commitment, leading to missed blocks. A proposed solution in CometBFT allows disabling os.O_SYNC (disable_os_sync) to eliminate this bottleneck.
However, disabling O_SYNC introduces a risk of "Amnesia" (double signing) if the node restarts immediately after a power failure: recent writes to local state may not have been persisted to disk, so the node can lose track of what it has already signed.

Solution

We implement a Safety Start Delay mechanism in the start command.

  • It reads disable_os_sync and safety_start_delay directly from the config (via Viper) to maintain loose coupling with CometBFT versions.
  • If disable_os_sync is enabled, the node sleeps for safety_start_delay (default 6s) before initializing the app.
  • This delay makes it likely that the network has already produced a new block, mitigating the risk of double signing caused by local data loss.

Related Issues

Backward Compatibility

  • No Impact on Default Behavior: If disable_os_sync is false (default), the delay logic is skipped entirely.
  • Safe Decoupling: This PR does not import new CometBFT packages or call new APIs directly. It relies on configuration values present in config.toml, ensuring compilation and runtime compatibility with older CometBFT versions.

How to Test

  1. Set disable_os_sync = true and safety_start_delay = "6s" in config.toml.
  2. Start the node (gaiad start).
  3. Observe a "Safety Start Delay active..." log message and a 6-second pause before the node starts.
  4. Set disable_os_sync = false.
  5. Start the node. It should start immediately without delay.

Integrated CometBFT's `DisableOSSync` and `SafetyStartDelay` config
options into the node start command.
- If `disable_os_sync` is enabled, the node disables O_SYNC for
better I/O performance.
- When O_SYNC is disabled, the node waits for `safety_start_delay`
before starting to prevent double signing due to potential data loss (amnesia).

Signed-off-by: Hwangjae Lee <[email protected]>