End to End: Testing Go Services

Connor Finnell
Published in Udacity Eng & Data
9 min read · Aug 9, 2022


Source: https://github.com/egonelbre/gophers

Here at Udacity we primarily use Go for our back-end microservices. Small binaries, performant parallelism, static typing, and consistent readability across projects are just a few of the most-cited reasons so many organizations are making the same call.

As tools evolve and best practices change, it can be overwhelming to configure test automation for new projects when existing examples span a variety of approaches (especially when those examples are stable, maintenance-mode services that are only updated to manage dependency versions). Engineers are tempted to cargo-cult existing files from these well-behaved projects with only the most tentative modifications, often missing opportunities to establish updated practices in new projects.

This post aims to provide a fully working up-to-date example for test conventions and automation, leveraging many of the tools we currently use at Udacity.

Who is this for?

Whether you’re a Go newbie looking for a place to start or a Go veteran looking for testing patterns and implementations, this post is for you!

(And while there are absolutely ways this walk-through can help configure non-Go projects, much of it is focused on language-specific implementations.)

What tools are used?

Outside of Go 1.18, this example utilizes:

  • Docker and docker compose for running Redis locally
  • golangci-lint for static analysis
  • mockery for generating mocks
  • gotestsum for test execution and JUnit-style reports
  • CircleCI for CI/CD automation
  • Codecov for code coverage reporting
  • k6 for performance testing

How should I use this?

While some code blocks appear in this post, the commit history of the example service provides a fully functioning example that iteratively implements the pieces of this automation puzzle. Links to commits, files, and lines of code are scattered throughout this post to provide quick ways to drill down on the actual changes. Checking out the repository and running the make commands is a great way to get a feel for the workflow.

Before we dig into the tests themselves, we need something worth testing…

Our Service: a Simple RESTful Redis Wrapper

The service implementation itself is a few steps beyond hello world:
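The full implementation lives in the example repository, but a rough sketch conveys the shape of it. Everything below is illustrative rather than copied from the repo: the /cache/{key} route, the RedisWrapper interface, and the in-memory stand-in are assumptions made to keep the sketch self-contained, with the real service backing that interface with an actual Redis client.

```go
package main

import (
	"context"
	"errors"
	"io"
	"log"
	"net/http"
	"strings"
	"sync"
)

// RedisWrapper is the narrow slice of Redis behavior the handlers need.
// (The name and method set here are illustrative, not copied from the repo.)
type RedisWrapper interface {
	Get(ctx context.Context, key string) (string, error)
	Set(ctx context.Context, key, value string) error
}

// server holds the handlers' dependencies so they are easy to swap in tests.
type server struct {
	cache RedisWrapper
}

// handleCache serves GET and PUT requests for /cache/{key}.
func (s *server) handleCache(w http.ResponseWriter, r *http.Request) {
	key := strings.TrimPrefix(r.URL.Path, "/cache/")
	switch r.Method {
	case http.MethodGet:
		value, err := s.cache.Get(r.Context(), key)
		if err != nil {
			http.Error(w, err.Error(), http.StatusNotFound)
			return
		}
		_, _ = io.WriteString(w, value)
	case http.MethodPut:
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		if err := s.cache.Set(r.Context(), key, string(body)); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusNoContent)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// memoryCache is an in-memory stand-in for the real Redis-backed implementation.
type memoryCache struct {
	mu   sync.RWMutex
	data map[string]string
}

func (m *memoryCache) Get(_ context.Context, key string) (string, error) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	if v, ok := m.data[key]; ok {
		return v, nil
	}
	return "", errors.New("key not found")
}

func (m *memoryCache) Set(_ context.Context, key, value string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
	return nil
}

func main() {
	srv := &server{cache: &memoryCache{data: map[string]string{}}}
	http.HandleFunc("/cache/", srv.handleCache)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Keeping the handlers behind a small interface is what makes the unit testing and mocking later in this post straightforward.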

Seeing it in action

We can start the service with a quick make run, which will make some docker compose noise and start logging to stdout:

In another shell we can make requests to localhost:8080 to start modifying our cache:

Checking back on the server, we can see corresponding logs:

Working as intended! Or at least it appears to be, in the absence of testing. Now that we have a solid understanding of how our service operates, let’s introduce our first tests…

Static Analysis (Linting)

Our next commit adds golangci-lint to our project, which wraps a long list of specific linters covering a broad range of concerns. We enable all the supported linters by default to catch as much as possible, only disabling specific linters that are deprecated or a bad fit for our particular project scope.

How we actually build and run golangci-lint is worth explicitly noting: by specifying the linter itself as an unused import in tools.go, we can use go mod to pin the version consistently between multiple developers and testing automation! Some clever use of our Makefile ensures that the binary is built on-demand to our .bin directory, caching the version-specific binary for any subsequent runs.
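As a minimal sketch of that pattern (the import path is the commonly published one and may not match the example repo exactly), tools.go is just a build-tagged file of blank imports:

```go
//go:build tools

// Package tools pins the versions of development and CI tooling via go mod.
// The blank import is never used at runtime; the build tag keeps this file
// out of normal builds, while the Makefile builds the binary into .bin on
// demand. Later commits add mockery and gotestsum here in the same way.
package tools

import (
	_ "github.com/golangci/golangci-lint/cmd/golangci-lint"
)
```

Because the tool now lives in go.mod, every developer and every CI job resolves exactly the same linter version.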

But is linting really testing?

Maybe? Who cares! Semantics aside, linting is a fantastic low-cost way to catch errors and ensure best practices well before any code is actually executed. Even for my side projects that are embarrassingly light on formal tests I am always running linters to ensure there’s another set of (virtual) eyes checking for mistakes and improvements. Writing your own static analysis tooling is always an option if the existing tools aren’t meeting your needs.

Mocking and Unit Tests

Before we dive into implementation details, let’s review the basics:

What puts the ‘unit’ in unit test?

A ‘unit’ of code is a subjective layer of abstraction that depends on a project’s structure and scope, but typically this style of testing is focused on isolated tests that validate behavior for the smallest units of code possible. A single stateless function can be run with a variety of inputs to assert expected outputs, or a client object could execute c.Init() --> c.Do() --> c.Close() to assert that it is constructed correctly and cleans itself up.

A good rule of thumb is to ask:

What changes outside this function would require the function’s unit tests to be updated?

Keeping that list as small as possible is key to simple, useful, and maintainable unit tests. Some things are destined to stay on that list…

Mocks

All those neatly separated units of code often need to interact with each other to function correctly, but we don’t necessarily want to fully invoke one unit inside another unit’s tests. We can generate faux/fake/mock objects that act as stand-ins for those other units, responding as we intend and asserting that they’re invoked. Each language has its own patterns and libraries for generating mocks, but in Go mocks are derived from a specific interface that defines the object’s methods.

Putting them together

Generating the mocks comes first. We can use the same tools.go pattern we used for our linter to add a version-pinned mockery to our project, resulting in a new mocks/RedisWrapper.go file for use in our tests.
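One common way to wire that up is a go:generate directive on the interface itself. The sketch below is illustrative: the repo may drive mockery from the Makefile instead, and the real interface has its own method set.

```go
package main

import "context"

// RedisWrapper abstracts the Redis operations the handlers depend on.
// Running `go generate ./...` with the .bin-built mockery binary writes a
// testify-compatible mock to mocks/RedisWrapper.go. The method set shown
// here is illustrative rather than copied from the example repository.
//
//go:generate .bin/mockery --name RedisWrapper --output mocks
type RedisWrapper interface {
	Get(ctx context.Context, key string) (string, error)
	Set(ctx context.Context, key, value string) error
}
```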

The next commit includes the actual unit tests for our endpoint handlers. We utilize table-driven tests to reduce the overhead of adding additional test cases, even running them in parallel (both each unit test and each test case per unit test) to minimize testing time. Other than this Go-specific gotcha, parallelization is an easy add to any table-driven test.
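Here is a hedged sketch of what one of these tests can look like, built on the handler and interface sketched earlier. The repo’s actual tests use the generated mocks and their expectation assertions; the hand-rolled fake below just keeps the example self-contained.

```go
package main

import (
	"context"
	"errors"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

// fakeCache is a trivial stand-in used to keep this sketch self-contained;
// in the real project the mockery-generated mock would play this role.
type fakeCache struct {
	value string
	err   error
}

func (f *fakeCache) Get(context.Context, string) (string, error) { return f.value, f.err }
func (f *fakeCache) Set(context.Context, string, string) error   { return f.err }

func TestHandleCacheGet(t *testing.T) {
	t.Parallel()

	testCases := []struct {
		name       string
		cache      *fakeCache
		wantStatus int
		wantBody   string
	}{
		{name: "hit", cache: &fakeCache{value: "42"}, wantStatus: http.StatusOK, wantBody: "42"},
		{name: "miss", cache: &fakeCache{err: errors.New("key not found")}, wantStatus: http.StatusNotFound},
	}

	for _, tc := range testCases {
		tc := tc // capture the range variable: the Go-specific gotcha noted above
		t.Run(tc.name, func(t *testing.T) {
			t.Parallel()

			srv := &server{cache: tc.cache}
			req := httptest.NewRequest(http.MethodGet, "/cache/answer", nil)
			rec := httptest.NewRecorder()

			srv.handleCache(rec, req)

			if rec.Code != tc.wantStatus {
				t.Fatalf("status = %d, want %d", rec.Code, tc.wantStatus)
			}
			if tc.wantBody != "" && strings.TrimSpace(rec.Body.String()) != tc.wantBody {
				t.Fatalf("body = %q, want %q", rec.Body.String(), tc.wantBody)
			}
		})
	}
}
```

The tc := tc line is the gotcha in question: before Go 1.22, omitting it would make every parallel subtest close over the same loop variable.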

The only other part we’re adding at this stage is the gotestsum tool via our tools.go version-pinning pattern. This isn’t required, but will help us generate JUnit-style reports for integrating with test result analytics later.

Automating with CircleCI

At Udacity we primarily use CircleCI as our CI/CD orchestrator. Our next commit includes the initial .circleci/config.yml for running our linter and unit tests, the results of which are reported back to GitHub in the form of commit statuses. We can configure our branch protection rules to require these statuses to pass before merging in any future PRs to mitigate the risk of accidentally deploying any breaking changes.

At this stage we are also adding integration with Codecov to see how much code coverage our tests are currently providing. The usefulness of code coverage as a metric is hotly debated, but generating and collecting the metric itself is fairly straightforward by adding the -cover flag to our go test invocation. Since our tests are parallelized we’re also adding the -covermode atomic option, which ensures parallel correctness at the cost of some additional test-time overhead.

Integration Tests

In contrast to unit tests, our integration tests explicitly look to validate the interactions between modular systems. For our service this means running an actual Redis instance that our test code connects to. We already create a Redis container whenever we make run, so we can start the specific redis service in our docker-compose.yml, initiate our tests, and extract the coverage.txt report that reflects both our integration and unit test coverage combined.

Our unit tests covered just the handler functions themselves, so for our integration test we’ll use the standard net/http/httptest library to run our test cases through our router as well as our handler wrappers.

We can utilize go test -short to only run unit tests by adding a short circuit at the beginning of our integration test functions:
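Something along these lines (a sketch; the real test body lives in the example repository):

```go
package main

import "testing"

// TestIntegration exercises the router and handlers against a real Redis
// instance. The body is elided here; this sketch just shows the -short guard.
func TestIntegration(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping integration test in -short mode")
	}

	// ... construct the real Redis-backed RedisWrapper, build the router,
	// and drive requests through net/http/httptest ...
}
```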

An Aside: White-box vs. Black-box

We have the option of defining tests in main_test instead of the main package, which is a good way to enforce a black-box testing paradigm by only allowing tests to interact with exported content. While this is a pressing concern for library code (where explicit control of exports is critical to how it is used), our example is written to be compiled and deployed, not imported. Larger services that utilize multiple sub-packages and client libraries benefit from the *_test package naming pattern, but our example is content with throwing everything in main.

Fuzzing

The release of Go 1.18 introduced fuzzing, a type of automated testing which continuously manipulates inputs to a program to find bugs. In practice running go test -fuzz will churn through generated inputs and save those that cause the test to fail, using those test cases when running without the -fuzz flag. Depending on your CI budget you could schedule regular jobs to find these test cases and open PRs/issues to resolve them, but for this project we’ll leave it as a manual process to run locally.

Converting our integration test to a fuzz test was simple:

  • func TestIntegration(t *testing.T) became func FuzzIntegration(f *testing.F)
  • Instead of iterating through a slice of test cases, add the initial seed corpus via f.Add(...)
  • Run each test case with f.Fuzz() instead of t.Run()
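Sketched out, the converted test looks roughly like this (the seed values and the empty fuzz body are placeholders; the real corpus and assertions live in the repository):

```go
package main

import "testing"

// FuzzIntegration fuzzes the cache endpoints with generated keys and values.
func FuzzIntegration(f *testing.F) {
	if testing.Short() {
		f.Skip("skipping fuzz-driven integration test in -short mode")
	}

	// Seed corpus: the former table-driven cases become f.Add calls.
	f.Add("answer", "42")
	f.Add("empty", "")

	f.Fuzz(func(t *testing.T, key, value string) {
		// Each generated (key, value) pair runs through the same flow the
		// integration test used: PUT then GET via net/http/httptest,
		// asserting that the value round-trips (assertions elided here).
	})
}
```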

Fuzzing the test for a while on my local machine generated failing test cases that validated the limitations of using path parameters as input fields.

Performance

How well a service performs can be just as important as whether it works at all, especially in a microservice environment where seemingly innocent latency increases can have outsized impacts on the overall experience. k6 is a great tool for throwing large quantities of requests at services, providing response validation alongside detailed latency statistics:

Adding performance tests to our project used a slightly different pattern from when we initialized our Redis container before executing integration tests. Because we are piping to stdin to pass our test spec into the tool, we’re running a one-off container in the same network as our docker-compose.yml services. Alternatively we could build our own FROM loadimpact/k6 test runner image that includes the repository files or run our test on a VM (versus a container) to allow us to use docker volume mounting, but the existing implementation is sufficient for executing a single load test file.

Pass/fail thresholds for any of these metrics can be codified into your test to ensure that the service is meeting the relevant SLOs.

What about scaling?

While it is definitely possible to use docker compose to load test against multiple instances of the service, it can be difficult to translate metrics from an artificial load balancing setup to your actual deployment configuration. Pointing your k6 tests at a deployed non-production environment can provide more relevant results that can be used to directly inform production configurations.

Leveraging CircleCI

Regression tests

CircleCI matrix jobs are an excellent way to test against multiple versions of dependencies. Here we are running our integration test with a variety of Redis, Go, and Alpine versions in parallel — be warned that this is a quick way to blow through a CI budget!

Uploading test artifacts

Our next commit enables storing test data in CircleCI, providing a consistent place to view and share test results as well as detect flaky tests over time. With our decision to utilize gotestsum this requires only a simple --junitfile modification to our test invocation. Here we’re also storing the raw stats.json output from k6, but there are several supported output formats that can be used to integrate with your favorite metrics platforms such as Prometheus and Datadog. When flaky performance results appear it can be a huge time saver to have the stats on hand so you don’t have to spend time recreating the precise test conditions.

And more?

This microservice now has a reasonable scope of tests for something running in a production environment:

  • linters to catch subtle mistakes and address niche concerns/vulnerabilities
  • unit tests to validate implementations
  • integration tests to check dependency compatibility (with fuzzing to catch unintuitive corner cases)
  • performance tests to catch unintended bottlenecks and ensure SLO compliance
  • regression tests to compare various dependency versions

Once this is deployed and serving live traffic we could start testing in production, or leveraging eBPF to optimize at the kernel level, or…

Too much testing?

With testing, it’s important to take a step back and ask:

Do our tests give us the confidence we need to use this?

If the answer is “yes”, then we’re done! Tests can provide significant value to the stability of a project, but spending all of your time perfecting tests can be a tempting productivity sink that provides diminishing returns. There will always be bugs, and blocking releases to obsessively scour for them can bring your team’s productivity, and your own, to a halt.
