Stability

Some solutions already implemented 

  • Sync stalling 
    • Rules about what is considered our best peer post-merge are not implemented
    • Trying the same peer over and over again (solution: shuffle the peers)
  • Invalid block errors 
    • Consensus layer is on the fork with bad data
    • A storage exception causes Besu to report the block as invalid to the consensus layer, which sends us off onto a wrong fork (potential fix by Justin; GH issue)
    • Other, as-yet-unknown Besu internal errors could potentially also cause invalid blocks 
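The peer-shuffling idea mentioned above can be sketched simply: randomize the candidate list before each sync attempt so we stop retrying the same peer first every time. A minimal sketch; the `PeerShuffler` name and generic peer type are illustrative, not Besu's actual classes:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: shuffle the peer list so repeated sync attempts
// do not keep selecting the same peer first.
public class PeerShuffler {
    // Returns a new list in randomized order; the input list is untouched.
    public static <T> List<T> shuffled(List<T> peers) {
        List<T> copy = new ArrayList<>(peers);
        Collections.shuffle(copy);
        return copy;
    }
}
```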

...

  • Worldstate root mismatch
    • Bonsai and snapshots
    • Solution: confirmed working for many cases, but needs more testing and handling of corner cases

Issues around peering

  • Need restart to find new peers sometimes during sync
    • Potentially because peers are not re-evaluated during and after sync 
    • Solution: ??
  • Losing many peers 
    • Because threads were blocked, we lost many peers 
    • Vert.x, for example, uses a different approach to threading  
    • Solution: ??
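The blocked-threads problem above is the classic event-loop pitfall: if the threads servicing peers also perform blocking disk IO, peers time out and disconnect. A minimal sketch of the usual remedy, offloading blocking work to a dedicated worker pool so the calling (network) thread stays responsive; this uses a plain `ExecutorService` for illustration, whereas Vert.x has its own worker-pool mechanism for the same purpose:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch: run blocking work on a worker pool so the calling thread
// (e.g. a network/event-loop thread) is never blocked on disk IO.
public class BlockingOffload {
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // Schedules the blocking task and returns immediately with a future.
    public <T> CompletableFuture<T> offload(Supplier<T> blockingTask) {
        return CompletableFuture.supplyAsync(blockingTask, workers);
    }

    public void shutdown() {
        workers.shutdown();
    }
}
```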

...

  • Besu has problems with slow IO/disks → Besu generates a lot of IO 
    • We are not using the flat DB during block processing, so we have to gather a lot of data from disk 
    • Need caching in more areas - R/W caching 
    • Doing less work, persisting less to the disk, persisting trie logs but not the worldstate (Amez / Karim) 
    • The first hotspot in Besu is reading data from RocksDB via the RocksDB.get method, mainly because most of the WorldState nodes have to be fetched from the Merkle Patricia Trie
    • Need to identify more areas where IO contention is commonplace 

Trace Performance 

  • Poor performance when tracing blocks / transactions
    • Not sure why we are slow 
    • Many times Besu would crash when tracing a full block 
    • OOM errors
    • A short timeout can cause issues 
    • Is the DB tuned for tracing?
    • 3786 
    • Will need good performance for any rollup use cases
    • Solution?: Instead of replaying the traces for each user request, why not save the trace result in a separate database or a separate module, rather than just the block and the worldstate for each block


  • Solution?: Separate tracing into its own microservice
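The first proposal above, persisting trace results rather than replaying, can be sketched as a store keyed by block hash: compute the trace once, then serve subsequent requests from the store. The `traceBlock` function and in-memory map below are placeholders for a real tracer and a persistent database:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch: cache trace results by block hash so repeated trace requests
// do not replay the whole block each time.
public class TraceStore {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final Function<String, String> traceBlock; // the expensive replay

    public TraceStore(Function<String, String> traceBlock) {
        this.traceBlock = traceBlock;
    }

    // Replays the block at most once per hash; later calls hit the store.
    public String getTrace(String blockHash) {
        return store.computeIfAbsent(blockHash, traceBlock);
    }
}
```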

...