It has dawned on us that Besu now represents upward of ~$5B in stake on Mainnet! Great job, everyone.

This is a big responsibility, not just on Mainnet but also on other chains that use Besu. We represent more and more value and reputation for a growing number of organizations. The Consensys team is thinking about how we can be more careful in the release process, specifically around regressions. As we approach the Cancun fork, is it OK to hold back less-tested PRs on main to avoid releasing any issues (especially in 24.1.2)? The problem is that users may not be able to safely downgrade after the fork without manual intervention or a new release. Is anything going into this next release that may be less tested, or a new default that should be hidden behind a feature flag?

Additionally, I'd like to crowdsource ideas about regressions. We have had a handful of interventions on releases recently. Rather than simply slowing releases down, do maintainers have any ideas? The Consensys team is scheduling a sprint to look at testing gaps and at improvements that can be made in the codebase. If anyone would like to help with this, let me know! Thanks all!


  • Integrate Hive testing into the CI/CD pipeline.
  • We should measure code coverage and make it a condition for merging a PR.
  • Benchmark metrics (sync time, block import time, etc.) and test PRs for regressions.
    • Some of these metrics (sync time, peering) may require long-running tests, so running them on individual PRs may not be practical; they could run on a nightly cadence instead.
  • Ensure test coverage for key features, e.g. Bonsai.
  • Tests that run on the mainnet config don't exercise upcoming hard fork features. To avoid finding bugs at the last minute, switch them to Holesky/Sepolia (or make the network configurable?).
  • Burn-in on existing (synced) nodes vs syncing from scratch.
    • Backward sync is triggered when you upgrade.
    • It would be awesome to make the (Consensys internal) burn-in process less manual.
  • We don't want to make the release process more painful.
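One way to make the test network configurable is a small selector that tests consult instead of hard-coding the mainnet config. The Java below is a hypothetical sketch. The `besu.test.network` system property, the `TestNetworkSelector` class and the choice of Holesky as the default are illustrative assumptions, not existing Besu APIs.

```java
import java.util.Locale;

/** Hypothetical helper: picks which network config tests run against (not a Besu API). */
final class TestNetworkSelector {

  enum TestNetwork { MAINNET, SEPOLIA, HOLESKY }

  /** Hypothetical system property name, e.g. -Dbesu.test.network=sepolia. */
  static final String PROPERTY = "besu.test.network";

  /** Assumed default: a testnet, so upcoming fork features are exercised before mainnet activation. */
  static final TestNetwork DEFAULT = TestNetwork.HOLESKY;

  private TestNetworkSelector() {}

  /** Maps a raw property value to a network; a blank or missing value means DEFAULT. */
  static TestNetwork resolve(String raw) {
    if (raw == null || raw.isBlank()) {
      return DEFAULT;
    }
    try {
      return TestNetwork.valueOf(raw.trim().toUpperCase(Locale.ROOT));
    } catch (IllegalArgumentException e) {
      // Fail loudly rather than silently falling back to mainnet
      throw new IllegalArgumentException("Unknown test network: " + raw, e);
    }
  }

  /** Reads the hypothetical system property. */
  static TestNetwork fromSystemProperty() {
    return resolve(System.getProperty(PROPERTY));
  }

  public static void main(String[] args) {
    System.out.println("Running tests against " + fromSystemProperty());
  }
}
```

A CI job could then pass a different value to exercise a different fork schedule without code changes. How the property reaches the test JVM depends on the build setup.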

What will be the outcomes of this epic?

Phase 1 (Analysis)

Aiming to complete this by the end of March.

List of recommendations / prioritized list of gaps to be addressed

  • An epic/workstream is created with actionable items based on the findings from the brainstorm.
  • It would be nice to have an idea of the relative amount of work involved, so we can compare cost to benefit.

Phase 2 (Solutions)

Start solving the problems identified. We may need different resources at this point.

What is the problem we’re trying to solve?

Why has this come up now?

More money is at stake

Besu profile is rising

Besu gaining market share

Greater reputational risk

What would be catastrophic? 

  • Consensus issue 
  • Sync issue
  • Peering issue (e.g. Besu nodes lose 100% of peers and drop off the network)
  • Chain fork
  • Node database corruption or significant downtime (quite impactful at scale)
  • Missing or empty block production
  • Anything involving customer funds or keys (including Web3Signer)

We need to verify that our bug prioritization matrix aligns with this.

Bad (but not catastrophic)

Anything here could escalate to catastrophic if the scale were big enough.

  • Missed attestations
  • Regression in block processing time
  • Regression in sync times or performance
  • Bad UX (nodes are a pain to manage)
  • Substantial drop in performance
  • Block export/import broken


  • No P1 bugs released
  • No P2 bugs released
    • One P2 bug per quarter is tolerable (but our goal should be zero).
    • We have known P2 bugs in the backlog: how do we make sure we don’t release them?
    • The same applies to bugs that haven’t been prioritized yet (their priority is unknown; they may be P2 or even P1).
  • Robust processes
    • We know that PRs have sufficient automated tests and/or require evidence of manual testing where this makes sense.
  • Don't want to make the release process more painful
    • But could we make the release process more robust/comprehensive if doing so doesn’t add hassle/pain?
  • Ensure we’re spending effort where it makes sense
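A pre-release gate could make "no P1/P2 bugs released" something CI enforces rather than a manual check. It would list the open bugs that affect the release and refuse to proceed while any of them are P1, P2 or still untriaged. Below is a minimal Java sketch. It assumes a hypothetical `Bug` view of issue-tracker data, and fetching that data from GitHub or ZenHub is out of scope.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical pre-release gate: block a release while P1/P2 (or untriaged) bugs are open. */
final class ReleaseGate {

  enum Priority { P1, P2, P3, P4, UNTRIAGED }

  /** Minimal hypothetical view of a bug ticket; a real gate would query the issue tracker. */
  record Bug(String id, Priority priority, boolean affectsRelease) {}

  private ReleaseGate() {}

  /** Returns the open bugs that should block the release. */
  static List<Bug> blockers(List<Bug> openBugs) {
    return openBugs.stream()
        .filter(Bug::affectsRelease)
        .filter(b -> b.priority() == Priority.P1
            || b.priority() == Priority.P2
            // Unknown priority: treat as potentially P1 until someone triages it
            || b.priority() == Priority.UNTRIAGED)
        .collect(Collectors.toList());
  }

  /** True only if nothing blocks the release. */
  static boolean canRelease(List<Bug> openBugs) {
    return blockers(openBugs).isEmpty();
  }
}
```

This sketch deliberately treats untriaged bugs as blockers. That forces triage before release, which addresses the unknown-priority case above. The cost is that a release may be held up by a bug that turns out to be minor.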

Opportunities in our (current) process

  • Bug prioritization 
    • Triage matrix - review
  • PRs
    • Feature toggles, so that new code paths are disabled by default?
      • Note that this requires some extra effort to leverage DI (Dagger).
  • Merge to main
  • Nightly
  • Pre-release
    • Enterprise-oriented testing, incorporating Matt Whitehead's suggestions
    • And ensuring enterprise PRs don't leak bugs into main
    • Maybe use PEEPS as a reference, but it probably makes sense to start with a new repo; it might be easier to start from the Besu acceptance test (AT) framework than from the PEEPS one. The PEEPS framework runs everything in Docker, so it could be slow, and its DSL is quite cumbersome to use because everything is very generic so that it works with Quorum, Besu and EthSigner.
      • Bring this on board later? As part of burn-in or as a separate ad hoc process?
    • A big part of the release process is regression testing, but it’s very manual.
      • Burn-in on existing (synced) nodes vs syncing from scratch
      • It would be awesome to make the (Consensys internal) burn-in process less manual.
  • Post-release
    • Retrospective to fold new learnings back into our processes and automation.
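The feature-toggle idea can be sketched as follows. The new code path is selected only when its flag is explicitly enabled, so a release that ships it leaves it off by default. In Besu this selection would presumably live in a Dagger `@Provides` method. This sketch uses a plain static method instead so it runs without Dagger. `Feature.NEW_BLOCK_IMPORT_PIPELINE`, `BlockImporter` and the other names are hypothetical.

```java
import java.util.Map;

/** Hypothetical feature-toggle sketch: new code paths stay off unless explicitly enabled. */
final class FeatureToggles {

  /** Hypothetical flag names; not real Besu CLI options. */
  enum Feature { NEW_BLOCK_IMPORT_PIPELINE }

  private final Map<Feature, Boolean> enabled;

  FeatureToggles(Map<Feature, Boolean> enabled) {
    this.enabled = Map.copyOf(enabled);
  }

  boolean isEnabled(Feature feature) {
    // Anything not explicitly enabled is treated as disabled
    return enabled.getOrDefault(feature, false);
  }
}

/** Hypothetical component with an existing and a new implementation. */
interface BlockImporter {
  String name();
}

final class LegacyBlockImporter implements BlockImporter {
  public String name() { return "legacy"; }
}

final class NewBlockImporter implements BlockImporter {
  public String name() { return "new"; }
}

/** Stand-in for a DI provider method (e.g. a Dagger @Provides method) choosing the implementation. */
final class BlockImporterProvider {
  private BlockImporterProvider() {}

  static BlockImporter provide(FeatureToggles toggles) {
    return toggles.isEnabled(FeatureToggles.Feature.NEW_BLOCK_IMPORT_PIPELINE)
        ? new NewBlockImporter()
        : new LegacyBlockImporter();
  }
}
```

Enabling the flag only on some nodes (for example during burn-in) could surface problems before a wider rollout. If the new path misbehaves after a fork, operators could switch the flag back off instead of needing a downgrade.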


Tickets in the epic

See the Testing Gaps Epic for the stack-ranked issues. (For those without ZenHub licences, the epic is here in GitHub, but it won't show all the detail.)

Covered elsewhere

  • Is there any coverage of different solidity versions?
    • No, there isn’t.
    • This would be a very narrow gap: the Besu EVM is thoroughly tested by reference tests. Anything falling into this gap would likely be a bug in Solidity and would be caught elsewhere.
  • Something like Pact contract tests for the Engine API across various CLs? Again, this might be covered by Hive.
    • We think this is covered by Hive tests.
  • Devwatch / operational oversight / dogfooding
    • See the separate DWIP effort for improving Devwatch itself
    • Is observability a quality issue?
    • QA / DevOps / Holesky/Sepolia validators

1 Comment

  1. It would be great to have a test suite to detect performance regressions in critical features. I think at the moment some features rely on extracting and comparing these metrics manually. For example, a new PR would need to go through this suite to ensure it hasn't degraded block import performance.
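Below is a rough sketch of the core check such a suite might perform. It compares a timing metric (e.g. block import time in ms) from the PR branch against a baseline from main, and flags a regression if the median is slower by more than a tolerance. The Java is hypothetical. Collecting the samples is out of scope, and using the median with a 5% tolerance is an assumption chosen for illustration.

```java
import java.util.Arrays;

/** Hypothetical regression check for a timing metric such as block import time (ms). */
final class PerfRegressionCheck {

  private PerfRegressionCheck() {}

  /** Median of the samples; using the median damps one-off outliers from noisy runs. */
  static double median(double[] samples) {
    if (samples.length == 0) {
      throw new IllegalArgumentException("no samples");
    }
    double[] sorted = samples.clone();
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  /**
   * True if the candidate's median is more than {@code tolerance} (e.g. 0.05 = 5%) slower
   * than the baseline's median. Lower is better for timing metrics.
   */
  static boolean isRegression(double[] baselineMs, double[] candidateMs, double tolerance) {
    double base = median(baselineMs);
    double cand = median(candidateMs);
    return cand > base * (1.0 + tolerance);
  }

  public static void main(String[] args) {
    double[] baseline = {101, 98, 100, 103, 99};    // e.g. runs on main: median 100
    double[] candidate = {112, 109, 111, 115, 110}; // e.g. runs on the PR branch: median 111
    System.out.println(isRegression(baseline, candidate, 0.05)); // prints true
  }
}
```

Comparing medians over several runs makes the check less sensitive to noisy CI machines than comparing single measurements. Metrics that need long runs, such as sync time, could use the same comparison on a nightly cadence.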