...

We're excited to share that since the Merge in September 2022, the Besu team has improved block processing performance threefold, reaching a 95th percentile of around 250 ms and a 99th percentile of around 500 ms on solo staking hardware (AMD Ryzen 5 5600G, 32 GB DDR4 3200 MHz, 2 TB WD Black SN850 NVMe).

A TL;DR is provided towards the bottom with our recommended staking configuration.

Besu block processing performance profile

To boost a client's performance, it's essential to collect metrics, evaluate the existing state, and then plan enhancements based on these insights. Profiling data is also needed to identify which parts of the code consume the most time during execution. Besu offers a wide range of metrics covering different facets of the execution layer, including block processing (newPayload calls), p2p, RPC, the transaction pool, syncing, and the database layer (RocksDB), among others. These metrics can be scraped by a monitoring system such as Prometheus and displayed using a visualization tool such as Grafana with the existing Besu Full dashboard.
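As a rough illustration of what a monitoring system does with such metrics, the sketch below parses a snippet in the Prometheus text exposition format and looks up one value. The metric names and values in the sample are hypothetical placeholders, not Besu's actual metric names, and the parser deliberately ignores timestamps and other details of the format.

```python
def parse_metrics(text):
    """Parse simple Prometheus text-format lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated token (no timestamps assumed).
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical sample; these are not real Besu metric names.
SAMPLE = """\
# HELP example_block_processing_seconds Hypothetical metric for illustration.
# TYPE example_block_processing_seconds summary
example_block_processing_seconds{quantile="0.5"} 0.25
example_block_processing_seconds{quantile="0.95"} 0.41
example_peer_count 25
"""

metrics = parse_metrics(SAMPLE)
print(metrics['example_block_processing_seconds{quantile="0.95"}'])
```

In practice Prometheus does the scraping and Grafana the display; the point here is only that each metric is a named series with a numeric value that can be tracked over time.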

...

Since the Merge in September 2022, we've been committed to improving Besu's block processing performance, following a significant number of user reports about missed attestations on their validators. We've succeeded in improving performance threefold, lowering the median time from 1.71 seconds to 0.49 seconds on the m6a.xlarge AWS VM, and the 95th percentile from 2.98 seconds to 0.81 seconds. Note that the m6a.xlarge AWS instance comes with 4 vCPUs and 16 GiB of RAM, and lacks NVMe storage. Most of the improvements have been made specifically to the Bonsai data layer implementation. If you're using Besu with Forest and experiencing performance problems, we suggest switching to Bonsai.
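The median and 95th percentile quoted here are order statistics over per-block processing times. A minimal sketch of how such figures can be computed from raw samples follows; it uses the nearest-rank definition of a percentile, which is an assumption (monitoring tools may interpolate differently), and the sample timings are made up for illustration.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical block processing times in seconds (not measured data).
times = [0.21, 0.25, 0.23, 0.31, 0.27, 0.95, 0.24, 0.26, 0.29, 0.41]

print("median:", statistics.median(times))
print("p95:", percentile(times, 95))
```

A single slow block barely moves the median but dominates the high percentiles, which is why both figures are reported.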


On more capable instances, such as the mid-spec Azure Standard_D8as_v5 VM (8 vCPUs, 32 GiB of RAM, remote SSD disk), Besu performs even better. As shown in the screenshot below, both instances exhibit a median time of approximately 250 milliseconds and a 95th percentile of around 410 milliseconds.

...

One of the biggest improvements to the SLOAD operation was the implementation of the healing mechanism for the Bonsai flat database, covering both accounts and storage. Bonsai is a feature of the Besu Ethereum client that maintains a flat database and a trie structure simultaneously. This combination allows for faster and more efficient SLOAD operations.
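To make the benefit of a complete flat database concrete, here is a toy model of a two-layer read path: a flat key/value map in front of a trie that needs one hop per key nibble. It is a sketch under simplifying assumptions, not Besu's implementation. In particular, the class and method names are hypothetical, and backfilling the flat map on read is only a stand-in for the real healing process.

```python
class TwoLayerStore:
    """Toy model: a flat key/value map in front of a slower, authoritative trie."""

    def __init__(self, trie):
        self.trie = trie        # nested dicts, one level per key character
        self.flat = {}          # flat database: full key -> value
        self.trie_walks = 0     # counts slow-path reads

    def _walk_trie(self, key):
        node = self.trie
        for nibble in key:
            if not isinstance(node, dict) or nibble not in node:
                return None
            node = node[nibble]
        # A dict here means the key stopped at an inner node, not a value.
        return None if isinstance(node, dict) else node

    def get(self, key):
        if key in self.flat:
            return self.flat[key]       # fast path: a single lookup
        self.trie_walks += 1
        value = self._walk_trie(key)    # slow path: one hop per nibble
        if value is not None:
            self.flat[key] = value      # backfill ("heal") the flat entry
        return value


store = TwoLayerStore({"a": {"1": "0x2a"}})
store.get("a1")   # first read walks the trie
store.get("a1")   # second read is served from the flat map
print(store.trie_walks)
```

When the flat database is incomplete, every miss pays for a trie traversal; once it is complete, reads stay on the fast path.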

...

Besu running without a complete flat database


Besu running with a complete flat database


Another improvement we introduced was turning off checksum verification during reads in RocksDB, because other validations are already in place as required by the Ethereum client specifications. The flag did not work initially, so the Besu team contributed a fix to the RocksDB project through this PR.

We also fine-tuned RocksDB by enabling Bloom filters. When a read request comes in, RocksDB uses the Bloom filter to check whether the key may be present in an SST (Sorted String Table) file without having to scan the entire file, which speeds up read operations. Additionally, we introduced a new flag for high-spec machines (with more than 16 GiB of RAM) that uses a larger block cache, thereby reducing I/O activity during SLOAD. The high-spec flag can be enabled with the Besu option --Xplugin-rocksdb-high-spec-enabled.
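The following toy Bloom filter shows why this helps reads. It is a simplified sketch, not RocksDB's implementation: the sizes, the double-hashing scheme, and the `read` helper are assumptions chosen for illustration. The key property is that a negative answer is always correct, so a file whose filter says "no" can be skipped entirely.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive several bit positions from one digest (double hashing).
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


def read(sst_files, key):
    """Consult each file's filter first; only search files that may hold the key."""
    for bloom, table in sst_files:
        if bloom.might_contain(key) and key in table:
            return table[key]
    return None


table = {b"account-1": b"balance-10", b"account-2": b"balance-20"}
bloom = BloomFilter()
for k in table:
    bloom.add(k)
print(read([(bloom, table)], b"account-2"))
```

False positives are possible, in which case the file is searched and nothing is found, but false negatives are not, so correctness is preserved.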

Besu has also benefited from the updates and releases of RocksDB, which have led to enhancements in performance.

SSTORE operation improvements

...

The second improvement involved caching empty slots in the Bonsai accumulator. This optimization benefited both SLOAD and SSTORE operations, but the improvement was more noticeable for SSTORE, because some of the original and current values can be empty. This PR had a large impact on SSTORE execution performance, as the CPU profiles below show.
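The idea of caching empty slots can be sketched as a read-through cache that remembers misses as well as hits. This is an illustrative model only: the `SlotCache` class and its fields are hypothetical and do not reflect the actual structure of the Bonsai accumulator.

```python
class SlotCache:
    """Toy read-through cache that also remembers slots known to be empty."""

    _EMPTY = object()  # sentinel meaning "looked up and found nothing"

    def __init__(self, backend):
        self.backend = backend    # dict standing in for the database
        self.cache = {}
        self.backend_reads = 0

    def get(self, slot):
        if slot in self.cache:
            cached = self.cache[slot]
            return None if cached is self._EMPTY else cached
        self.backend_reads += 1
        value = self.backend.get(slot)
        # Record the miss too, so the next read of an empty slot skips the backend.
        self.cache[slot] = self._EMPTY if value is None else value
        return value


cache = SlotCache({"slot-1": 7})
cache.get("slot-2")   # empty slot: goes to the backend once
cache.get("slot-2")   # served from the cache as "known empty"
print(cache.backend_reads)
```

Without the sentinel, every read of an empty slot would fall through to the database again, which is the cost this optimization avoids.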

Before the optimization


After the optimization


General EVM improvements 

...

Below is a summary table from runs with Java 21 on an M1 Mac Pro. Java 21 offers a 10-20% boost simply from using it. Java 17 runs are flatter after 22.10.0 and are lower than Java 21 runs in all cases. Five runs of each operation were executed, and the median and max values are shown. The results across all operations and a few select problematic values are averaged together to provide the number in this graph. There are two notable bumps on this graph, at 22.1.0 and 22.10.0; the work after 22.10.0 has focused on fixing some worst-case performance scenarios.
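The "five runs, report median and max" methodology can be sketched with a small timing harness. This is a generic illustration, not the benchmark tooling used for the results above; the workload being timed is an arbitrary placeholder.

```python
import statistics
import time


def benchmark(operation, runs=5):
    """Time `operation` `runs` times; return (median, max) in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), max(timings)


# Placeholder workload standing in for a single EVM operation benchmark.
median_s, max_s = benchmark(lambda: sum(i * i for i in range(100_000)))
print(f"median={median_s:.6f}s max={max_s:.6f}s")
```

Reporting the median alongside the max separates typical cost from worst-case outliers such as warm-up or garbage collection pauses.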



Native Types Transition

...

In our efforts to decrease memory usage in Besu, we initially compared various system memory allocators: Malloc (default), TcMalloc (Google), Mimalloc (Microsoft), and Jemalloc. Based on the metrics we gathered, we found that Jemalloc outperformed the others for Besu's workload, reducing Besu's memory usage by over 40%. This significant reduction is primarily attributed to superior memory defragmentation and enhanced multithreading. 

We also observed that setting MALLOC_ARENA_MAX to 1 or 2 significantly reduced resident memory usage. At present, the official Besu Docker image ships with jemalloc and with MALLOC_ARENA_MAX set to 2. If you're running Besu natively, it reports at startup whether it is using Jemalloc. If Jemalloc is not installed, you will see a message in the Besu logs stating, 'jemalloc library not found, memory usage may be reduced by installing it'.
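A quick way to check for that message is to scan the log output for it. The helper below is a minimal sketch: the message text comes from the paragraph above, while the function name and the sample log lines (including their layout) are hypothetical.

```python
JEMALLOC_MISSING = "jemalloc library not found"


def jemalloc_missing(log_lines):
    """Return True if any log line reports that jemalloc was not found."""
    return any(JEMALLOC_MISSING in line for line in log_lines)


# Hypothetical log excerpt; the line layout is made up for illustration.
sample_log = [
    "[example] starting up",
    "[example] jemalloc library not found, memory usage may be reduced by installing it",
]
print(jemalloc_missing(sample_log))
```

The same function can be fed the lines of a real log file, for example via `open(path)`, where `path` is wherever your deployment writes Besu's output.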


We also observed that the OpenJ9 JVM implementation exhibits a better memory footprint, primarily due to its garbage collection (GC) implementation (GenCon) which more frequently frees up memory.



Given this behavior with OpenJ9, we tuned the G1 garbage collector (G1GC) of the HotSpot JVM to mimic it without affecting performance. We introduced three flags, -XX:G1ConcRefinementThreads=2, -XX:G1HeapWastePercent=15, and -XX:MaxGCPauseMillis=100, which are now incorporated into the besu.sh script. The resulting memory footprint is similar to that of the OpenJ9 JVM.


With the default JVM (HotSpot), users simply need to check the logs to determine whether Jemalloc is installed. If it's not, we recommend installing it to reduce Besu's memory footprint.

...

  • Enable flat database healing with --Xsnapsync-synchronizer-flat-db-healing-enabled=true
  • Enable the high-spec flag if your machine has more than 16 GiB of RAM: --Xplugin-rocksdb-high-spec-enabled
  • Install Jemalloc to replace the default system allocator (malloc). Besu will automatically detect whether it is installed and preloaded as the system memory allocator.
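The flag recommendations above can be expressed as a small helper that picks options based on available RAM. This is a sketch only: the two flags come from the list above, the helper name is hypothetical, and all other options a real node needs (network, data path, engine API settings, and so on) are deliberately omitted.

```python
def recommended_flags(ram_gib):
    """Return the performance-related Besu flags suggested for a machine."""
    flags = ["--Xsnapsync-synchronizer-flat-db-healing-enabled=true"]
    # The high-spec flag is only recommended above 16 GiB of RAM.
    if ram_gib > 16:
        flags.append("--Xplugin-rocksdb-high-spec-enabled")
    return flags


print(" ".join(["besu", *recommended_flags(32)]))
```

Installing Jemalloc is a system-level step and is not something a command-line flag can do, so it is not part of the helper.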

If you need to troubleshoot performance, check out our documentation here.

Here are some other best practices:

...

  • 95th percentile around 250 ms
  • 99th percentile around 500 ms


Future work around performance

...