Moving from CircleCI to GitHub Actions (GHA)

TBD:  HL to provide background on the motive and strategy

Self hosted Runners

CircleCI currently runs on 6 hosts, each with non-trivial specs, so the default runners provided by GitHub Actions will not be viable. The current experiment uses 4-core, 16 GB hosts in AWS, half the power of what CircleCI uses. The runners were easy to set up. I did not automate the host setup, but took notes on what was required:

  • Started with a typical Besu canary configuration: 4 cores, 16 GB of RAM, a 25 GB NVMe root partition, and an added 500 GB SSD /data partition. SSD is likely not necessary in production. Six of these were set up.
  • Added one more host like the above, but with a 4-core Graviton CPU, to allow for ARM-based builds.
  • GitHub Actions may be implemented as Docker containers, so the user running the runner must be a member of the docker group.
  • Frequently ran into Docker-related disk space issues, so Docker's storage was moved onto the /data partition instead of using the root partition for everything.
  • Docker cleanup scripts were added to run prior to each job, though they are probably not necessary after moving Docker's home.
  • Installing the agent software was easy and well documented. It was installed as a unit controllable via systemd.
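
As a rough illustration of the pre-job cleanup mentioned above, the sketch below prunes Docker data only when free space on the data partition runs low. The 20% threshold and the function names are assumptions made for this example; the actual scripts ran before every job and are not reproduced here.

```python
import shutil
import subprocess


def needs_cleanup(total_bytes, free_bytes, min_free_ratio=0.2):
    """Return True when the free fraction of the disk is below the threshold."""
    return free_bytes / total_bytes < min_free_ratio


def cleanup_if_needed(path="/data", min_free_ratio=0.2):
    """Prune unused Docker data if the partition holding `path` is nearly full.

    The threshold is a hypothetical value chosen for this sketch.
    """
    usage = shutil.disk_usage(path)
    if needs_cleanup(usage.total, usage.free, min_free_ratio):
        # Removes stopped containers, unused networks, unused images,
        # and build cache without prompting for confirmation.
        subprocess.run(
            ["docker", "system", "prune", "--all", "--force"], check=True
        )
        return True
    return False
```

Calling cleanup_if_needed() as the first step of a job keeps the check cheap when space is plentiful, and only pays the cost of a prune when the partition is actually filling up.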

An example of an overall runtime can be seen here. As is, it takes a little more than an hour. There is a lot of room for reducing that runtime:

  • Move to 8 core machines, for parity with CircleCI runners.
  • Parallelize unit testing the same way as acceptance, integration and reference tests.
  • Reconsider how often long acceptance tests need to be run.

Suggest further investigating the use of on-demand compute via something like this: https://github.com/marketplace/actions/on-demand-self-hosted-aws-ec2-runner-for-github-actions

GHA Impressions

Porting the functionality over from CircleCI was tedious, but fairly straightforward. Current progress can be seen on this pull request. The design of small, composable components, combined with direct access to the self-hosted runners, made the development process accessible. Being able to tie into the GitHub event stream allows for improved and simplified release processes that can be adopted incrementally.

Secrets management was simple and straightforward. Secrets are defined in the repo settings, and are injected for use by the action definition. Logging output was properly obfuscated whenever a secret was used.  

Minimal changes were required to the Gradle-based build tasks, though delegating publishing and tagging to Gradle complicates things. Suggest removing those concerns from Gradle. It makes sense for Gradle to know how to produce artifacts like tarballs and Docker images, but release-related metadata should be managed externally.

Challenges

File Permissions: Things get convoluted when using any Docker-based action. Since such actions have no notion of the UID to run as, their output ends up owned by root. This action was used to work around the problem.
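
To see how far the problem reaches on a given runner, a small check like the following can list workspace entries that are not owned by the runner's user. This is only a diagnostic sketch with a hypothetical function name. It is not the workaround action referred to above, and it does not change ownership itself.

```python
import os


def foreign_owned(root, uid):
    """Return sorted paths under `root` whose owner is not `uid`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself rather than following symlinks.
            if os.lstat(path).st_uid != uid:
                found.append(path)
    return sorted(found)
```

Running foreign_owned(workspace, os.getuid()) after a Docker-based step shows which outputs a later step would be unable to modify or delete.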

Test Splitting: A large portion of the time was spent re-implementing test splitting so that many hosts can run the tests in parallel. CircleCI supports this as a feature, but the only GHA support for it that could be found is still in alpha. We currently just divide the number of tests evenly across runners. Another means of test splitting warrants investigation: one based on the test timings available from a previous run's JUnit results.
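
The two approaches can be sketched as follows. This is a minimal illustration, not the implementation used in the pull request: the function names are hypothetical, the timing-based split uses a simple greedy longest-first heuristic, and tests with no recorded timing are assumed to take the mean of the known durations.

```python
import heapq
import xml.etree.ElementTree as ET


def split_evenly(tests, runner_count, runner_index):
    """Count-based split: sort for determinism, then deal tests round-robin."""
    return sorted(tests)[runner_index::runner_count]


def timings_from_junit(xml_text):
    """Read per-suite durations in seconds from one JUnit XML report."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root, so this handles both a bare <testsuite>
    # and a <testsuites> wrapper element.
    return {
        suite.get("name"): float(suite.get("time", "0"))
        for suite in root.iter("testsuite")
    }


def split_by_timing(tests, timings, runner_count):
    """Greedy split: hand the longest remaining test to the least-loaded runner."""
    known = [timings[t] for t in tests if t in timings]
    # Assumption: a test with no history costs the mean known duration.
    default = sum(known) / len(known) if known else 1.0
    buckets = [[] for _ in range(runner_count)]
    heap = [(0.0, i) for i in range(runner_count)]  # (total seconds, runner)
    # Longest first; ties are broken by name so the result is reproducible.
    ordered = sorted(tests, key=lambda t: (-timings.get(t, default), t))
    for test in ordered:
        load, index = heapq.heappop(heap)
        buckets[index].append(test)
        heapq.heappush(heap, (load + timings.get(test, default), index))
    return buckets
```

With an even split, one slow suite can leave a single runner working long after the others finish. The timing-based split balances total duration instead of test count, at the cost of needing results from an earlier run.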

Cost: An in-depth cost analysis has not been done yet, but I suspect it will be a wash or possibly more expensive than CircleCI.

Planned CI/CD Design

Subdivide the CI process into reusable phases, and defer long-running and expensive phases until as late and as infrequently as possible, while still running as many quality checks as practical.

When a PR is opened, draft or otherwise, the following phases can all be run, mostly in parallel and mostly on cheap, GitHub-provided runners:

  • compile all code
    • then validate Javadocs, which depends on the bytecode output
  • check for repo compliance via repolinter
  • check source code formatting via spotless
  • run Gradle tooling validation

Devs are expected to have run as many tests as feasible before review, so the CI test runs that follow are a redundant safeguard.

Once the PR is approved, the following can be run in parallel.

  • run unitTests
  • run acceptanceTests
  • run referenceTests

Test results can then be posted back to the PR, and any failures would prevent merging.


