...

What this means for developers

  • Deferral of advanced CI/CD testing. As a cost reduction measure (CircleCI for Besu costs a little under $2K USD per month), long-running tests are executed in parallel on low-spec, free action runners. Details on how to run those tests locally are now included in the pull request template, and they can also be manually initiated on the CI/CD infrastructure.
  • Releases are now much more automated. Any maintainer may create a release from main or a release-* named branch (hereafter referred to as "releasable branches") using standard GitHub tools. That process is outlined in more detail below.
  • Speed. Breaking up the various workflows into smaller pieces using test splitting results in much greater parallelization, and hence shorter overall runtimes.
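
As a rough illustration of why test splitting shortens the overall runtime, the sketch below spreads tests across runners with a simple greedy, longest-first strategy. It is not the splitting action used by the pipeline; the function names, test names, and durations are all hypothetical.

```python
import heapq


def split_by_runtime(durations, runner_count):
    """Assign each test to the currently least-loaded runner, longest tests first.

    `durations` maps a test name to its runtime in seconds from a past run
    (hypothetical data). Returns one list of test names per runner.
    """
    # Min-heap of (total seconds assigned so far, runner index).
    heap = [(0.0, index) for index in range(runner_count)]
    heapq.heapify(heap)
    groups = [[] for _ in range(runner_count)]
    # Sort by descending runtime; break ties by name so the result is stable.
    for name, seconds in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, index = heapq.heappop(heap)
        groups[index].append(name)
        heapq.heappush(heap, (total + seconds, index))
    return groups


def wall_clock(groups, durations):
    """Overall runtime is set by the slowest runner, since runners work in parallel."""
    return max(sum(durations[name] for name in group) for group in groups)


# Hypothetical past runtimes, in seconds.
PAST_RUNS = {"A": 300, "B": 200, "C": 200, "D": 100, "E": 100}
GROUPS = split_by_runtime(PAST_RUNS, 3)
```

With these made-up numbers, running everything serially would take 900 seconds, while `wall_clock(GROUPS, PAST_RUNS)` is 300 seconds across three runners. Real suites rarely divide this evenly, so the gain in practice depends on how uneven the individual test runtimes are.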

...

It takes about 15 minutes, end to end, but if we can be smarter about splitting the unit tests we could easily cut that in half. 

At the same time, other workflows execute the more expensive test suites before allowing a merge to a releasable branch.

...

There are a couple of common patterns worth explaining in this pipeline. We regularly need to determine if a long-running test suite should run, how to split up tests, and how to consolidate results.

  • At this time, GitHub Actions does not have an event for a PR being approved for the first time, so we need to make sure we do not rerun the tests for each approval a PR receives. To do this, we interact directly with the GitHub REST API and check that the comment placed on the PR is an approval and that the tests have not already run successfully. See the shouldRun job defined in acceptance-tests.yml for an example.
  • Test splitting is handled by a GitHub Action that tries to group tests so that each group has as even a runtime as possible, based on past runs. In the case of the EVM reference tests, the sheer volume seemed to overwhelm it, so those tests are split up using a simple bash script instead.
  • Testing workflows that use a matrix strategy for parallelization also have a consolidating job that waits for all of the matrix jobs to complete before marking the workflow run as passed.
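
The decision logic behind the first and last patterns can be sketched as two small pure functions. This is only an illustration of the idea, not the contents of acceptance-tests.yml: the function names are hypothetical, and the sketch assumes the review state and the conclusions of earlier workflow runs have already been fetched from the GitHub REST API.

```python
def should_run(review_state, previous_conclusions):
    """Decide whether the long-running suite should start for this review.

    `review_state` is the state of the review that triggered the workflow
    (assumed to be "approved" for an approval, in any letter case).
    `previous_conclusions` holds the conclusions of earlier runs of the same
    suite for this PR; a run that is still in progress may be None.
    """
    is_approval = review_state.strip().lower() == "approved"
    already_passed = any(c == "success" for c in previous_conclusions)
    # Run only for an approval, and only if no earlier run has succeeded.
    return is_approval and not already_passed


def consolidate(matrix_results):
    """Pass the workflow only if every matrix job reported success.

    An empty list is treated as a failure so that a matrix that never
    started cannot be mistaken for a passing run.
    """
    return bool(matrix_results) and all(r == "success" for r in matrix_results)
```

For example, `should_run("approved", ["failure"])` is true because the earlier run did not pass, while a second approval after a successful run, `should_run("approved", ["success"])`, is false.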

...

An example release can be seen here.

Emergency Takedown Process

On the discovery of a known bad release, build artifacts can be removed from circulation.

  • Do not delete the release in GitHub; instead, update it to explain that it was found to be faulty.
  • Be sure to include what alternate version(s) to use instead.
  • When editing the release, delete the attached build artifacts.
  • Do not remove the file hashes from the release notes; instead, mark them up as strikethrough so they are still available but discouraged.
  • Delete the Docker images from the package management screens, for all image variants.
  • Communicate on social media if necessary.

This process was tested during initial implementation of this CI/CD pipeline, and an example can be found here.

Developer Notes

In addition to the ruleset defined above, there is another important repository setting that needs to be actively maintained: Actions Permissions. When a new GitHub Action is to be used, or an existing one updated, it must be referenced by the specific Git SHA for that release. This prevents a change in which actions are run if a tag is moved in the action's distribution.
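
One way to catch an unpinned reference before it reaches review is a small script that scans workflow files for `uses:` lines. The following is a minimal sketch rather than a tool the Besu repository ships; the action names and the SHA in the sample are invented, and local actions and `docker://` references are deliberately skipped.

```python
import re

# Matches "uses: owner/repo@ref", with or without a leading list dash.
USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*['\"]?([^'\"\s#]+)")
# A full Git commit SHA is 40 lowercase hexadecimal characters.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return every action reference that is not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        reference = match.group(1)
        # Local actions and container images are out of scope for this sketch.
        if reference.startswith("./") or reference.startswith("docker://"):
            continue
        _, _, ref = reference.partition("@")
        if not FULL_SHA.match(ref):
            unpinned.append(reference)
    return unpinned


# Hypothetical workflow fragment; none of these actions are real.
SAMPLE = """
steps:
  - uses: example-org/checkout@0123456789abcdef0123456789abcdef01234567 # v1.2.3
  - uses: example-org/setup-tool@v4
  - name: build
    uses: example-org/build@main
  - uses: ./local-action
"""

UNPINNED = unpinned_actions(SAMPLE)
```

Here `UNPINNED` contains the two references that use a tag or branch name instead of a SHA. Keeping the release tag as a trailing comment, as in the first sample line, keeps the pinned version readable.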

...