@paulhauner
Member

non-finality: Run 002-july-2025

Run Parameters

  • Simulation: non-finality
  • Start Time: July 2025
  • Clients: Mainnet-like distribution (EL: Geth 63%, Nethermind 22%, Besu 10%, Erigon 5%; CL: Prysm 42%, Lighthouse 28%, Teku 21%, Nimbus 9%)
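
As a rough illustration of how the percentages above could be turned into whole node counts, here is a minimal sketch using the largest-remainder method. The network size passed to `allocate` is a hypothetical value (this PR does not state how many nodes the run uses), and `allocate` is an illustrative helper, not part of any tooling in this repository.

```python
# Client shares as listed in the run parameters (percent of nodes).
EL_SHARES = {"Geth": 63, "Nethermind": 22, "Besu": 10, "Erigon": 5}
CL_SHARES = {"Prysm": 42, "Lighthouse": 28, "Teku": 21, "Nimbus": 9}


def allocate(shares, total_nodes):
    """Split total_nodes across clients with the largest-remainder method.

    shares maps client name -> integer percentage; percentages must sum to 100.
    total_nodes is a hypothetical network size chosen by the operator.
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    # divmod keeps everything in integers: (whole nodes, remainder out of 100).
    parts = {name: divmod(total_nodes * pct, 100) for name, pct in shares.items()}
    counts = {name: whole for name, (whole, _) in parts.items()}
    leftover = total_nodes - sum(counts.values())
    # Hand the leftover nodes to the clients with the largest remainders.
    ranked = sorted(parts, key=lambda name: parts[name][1], reverse=True)
    for name in ranked[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # 20 nodes is an assumed example size, not a value from this run.
    print(allocate(EL_SHARES, 20))
    print(allocate(CL_SHARES, 20))
```

With small networks the rounding matters: minority clients such as Erigon or Nimbus can end up noticeably over- or under-represented relative to the target share.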

Status Checklist

  • Run parameters defined
  • Infrastructure prepared
  • Run executed
  • Incidents documented
  • Final summary completed

Test Configuration

  • Hardware: 6-core AMD Ryzen boxes, 64GB RAM per node
  • Procedure: 2 days normal operation → 1 week with 2/3 of validators offline → recovery monitoring
  • Monitoring: Manual via Grafana dashboards
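
To make the procedure concrete, the sketch below converts the phase durations into epochs and checks whether finality is possible in each phase. It assumes mainnet timing (12-second slots, 32 slots per epoch) and equal stake per validator, neither of which is stated in this PR. It also ignores the inactivity leak, which gradually reduces the balances of offline validators during a long non-finality period.

```python
from fractions import Fraction

# Assumed mainnet timing; the simulated network may use different values.
SECONDS_PER_SLOT = 12
SLOTS_PER_EPOCH = 32
SECONDS_PER_EPOCH = SECONDS_PER_SLOT * SLOTS_PER_EPOCH  # 384 seconds


def epochs_in(days):
    """Number of whole epochs that fit into the given number of days."""
    return days * 24 * 60 * 60 // SECONDS_PER_EPOCH


def can_finalize(online_fraction):
    """Finality needs attestations from at least 2/3 of the active stake.

    Assumes equal stake per validator, so the fraction of validators online
    equals the fraction of stake online.
    """
    return online_fraction >= Fraction(2, 3)


# (phase name, duration in days, fraction of validators online)
PHASES = [
    ("normal operation", 2, Fraction(1)),
    ("2/3 of validators offline", 7, Fraction(1, 3)),
]

if __name__ == "__main__":
    for name, days, online in PHASES:
        print(f"{name}: {epochs_in(days)} epochs, can finalize: {can_finalize(online)}")
```

The recovery phase is left out because the procedure does not give it a fixed duration.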

Client Versions

  • Geth v1.15.11, Nethermind v1.29.2, Besu v25.3.0, Erigon v3.0.0-alpha5
  • Prysm v6.0.3, Lighthouse v7.0.1, Teku v25.4.1, Nimbus v25.4.1

Incidents

Issues tagged with incident:non-finality:002-july-2025

Resources

Grafana dashboard links TBC

Results Summary

To be completed when run finishes

🤖 Generated with Claude Code

Create third run for non-finality simulation with mainnet-like client distribution, 6-core Ryzen hardware, and detailed test procedure including 1-week validator shutdown.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@paulhauner added the type:run (Run execution PRs) and run:non-finality:002-july-2025 (Non-finality July 2025 run 002) labels on Jul 28, 2025
@paulhauner
Member Author

FYI, here's the prompt that produced this PR:

make another run, for july-2025.
this time we're testing a network with a client distribution that resembles mainnet (claude please try to guess these
percentages based on some web searches).
hardware is going to be 6-core Ryzen boxes for each node, with 64gb of ram each.
client versions are the latest (claude, go get the latest releases for all the EL and CL clients).
procedure will be to run the network for 2 days, then turn off 2/3rds of validators for a week, then turn them back on and see if the network can recover. monitoring will be manually via grafana (links tbc)

@paulhauner
Member Author

New Incident

Issue #9: Lighthouse crashed after 100% CPU usage and high memory consumption

Lighthouse crashed after sustaining 100% CPU usage for 5 minutes and consuming 38GB of RAM during non-finality simulation.
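
Since monitoring for this run is manual, a small script over exported metrics could help confirm when a condition like this one started. The sketch below is only an illustration: the sample format (timestamp in seconds, CPU percent) and the helper name `sustained_cpu_windows` are assumptions, and it is unclear from the incident whether "100%" refers to one core or the whole 6-core machine, so the threshold is left as a parameter.

```python
def sustained_cpu_windows(samples, cpu_threshold=100.0, min_duration_s=300):
    """Find periods where CPU stayed at or above a threshold.

    samples: list of (timestamp_seconds, cpu_percent) tuples sorted by time
             (a hypothetical export format, e.g. from a dashboard query).
    Returns a list of (start_ts, end_ts) windows lasting at least
    min_duration_s seconds (default 300 s, matching the 5 minutes reported).
    """
    windows = []
    start = None  # timestamp of the first sample in the current hot streak
    last = None   # timestamp of the most recent hot sample
    for ts, cpu in samples:
        if cpu >= cpu_threshold:
            if start is None:
                start = ts
            last = ts
        else:
            # Streak ended: keep it only if it lasted long enough.
            if start is not None and last - start >= min_duration_s:
                windows.append((start, last))
            start = None
    # Handle a streak that runs to the end of the data.
    if start is not None and last - start >= min_duration_s:
        windows.append((start, last))
    return windows


if __name__ == "__main__":
    # Synthetic one-minute samples, not data from this run.
    demo = [(0, 50.0)] + [(t, 100.0) for t in range(60, 361, 60)] + [(420, 40.0)]
    print(sustained_cpu_windows(demo))
```

The same approach could be applied to memory samples with a gigabyte threshold in place of the CPU percentage.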
