Building APIs with Go — Part 3: Instrumentation and Error Handling

Fernando H. Bandeira
6 min read · May 29, 2021

We coded a simple Todo API in the previous part: Building APIs with Go — Part 2: Writing our first endpoints.

Now we’ll add logging, APM, panic recovery, graceful shutdowns, and error handling to our API.

The code for this part is available on GitHub: fernandobandeira/go-api at part-3.

First, let’s start with our logger and telemetry:

Create logger and telemetry agent (main.go)

We are using zerolog as our structured logging package. Another good option would be Uber's zap package. The idea is that logs follow a structure that can be serialized to JSON and processed by something like the Elastic Stack, where we can search them and generate alerts.

The OpenTelemetry package handles tracing and metrics. It's a recent standard that merges the OpenTracing and OpenCensus projects into a single project; it was still in beta while this article was being written, but it should be generally available soon.

You can plug in multiple exporters, such as Prometheus for metrics and Jaeger for APM. Here we're using just Jaeger for the tracing data. The sampling decision is driven by the headers sent with the request (this is useful when you have multiple services: parent-based sampling makes sure that every service called during a request records the trace). If no headers are present, it always records the trace by default.
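
Roughly, that initialization could look like the sketch below. It assumes the go.opentelemetry.io/otel/exporters/jaeger exporter and a parent-based sampler; option names have shifted between OpenTelemetry releases, so treat it as illustrative rather than exact.

```go
package main

import (
	"os"

	"github.com/rs/zerolog"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/jaeger"
	"go.opentelemetry.io/otel/propagation"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// newLogger builds a structured zerolog logger that writes JSON to stdout.
func newLogger() zerolog.Logger {
	return zerolog.New(os.Stdout).With().Timestamp().Logger()
}

// newTracerProvider plugs a Jaeger exporter into an OpenTelemetry tracer
// provider. The parent-based sampler follows the decision carried in the
// incoming request headers and always samples when there is no parent.
func newTracerProvider(jaegerURL string) (*sdktrace.TracerProvider, error) {
	exp, err := jaeger.New(jaeger.WithCollectorEndpoint(jaeger.WithEndpoint(jaegerURL)))
	if err != nil {
		return nil, err
	}

	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exp),
		sdktrace.WithSampler(sdktrace.ParentBased(sdktrace.AlwaysSample())),
	)

	// Register the provider and the W3C trace-context propagator globally.
	otel.SetTracerProvider(tp)
	otel.SetTextMapPropagator(propagation.TraceContext{})

	return tp, nil
}
```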

Now we’ll instrument our Chi router with some middlewares.

main.go
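
A rough sketch of the wiring: the import path for our middlewares package is illustrative, and each custom middleware is sketched in the sections that follow.

```go
package main

import (
	"net/http"

	"github.com/go-chi/chi/v5"
	"github.com/go-chi/chi/v5/middleware"
	"github.com/rs/zerolog"

	// Illustrative import path for our own middlewares package.
	"github.com/fernandobandeira/go-api/api/middlewares"
)

// newRouter attaches the middlewares to the Chi router. The order matters:
// outer middlewares run first, so the tracer is started before the logger
// reads the trace id, and the recoverer sits closest to the handlers.
func newRouter(logger zerolog.Logger) http.Handler {
	r := chi.NewRouter()

	r.Use(middleware.RealIP)           // chi's built-in X-Forwarded-For handling
	r.Use(middlewares.RequestID)       // request-id middleware (sketched below)
	r.Use(middlewares.Tracer)          // opentelemetry wrapper (sketched below)
	r.Use(middlewares.Logger(logger))  // zerolog request logger (sketched below)
	r.Use(middlewares.Recover(logger)) // panic recovery (sketched below)

	// ... route registrations from part 2 go here ...

	return r
}
```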

Real IP Middleware

This middleware uses the value provided in the X-Forwarded-For HTTP header (if it exists) as the client IP instead of the RemoteAddr of the connection that made the request.

Important: only use this if you can trust the header, since any client can set it and mask their IP. In other words, you shouldn't use this middleware unless you run your app behind something like a reverse proxy or a load balancer that sets it.

Request ID Middleware

api/middlewares/requestId.go

This middleware generates a UUID (if none exists in the request headers) and adds it to the request context and the response headers. This is useful for logging and correlating errors between frontend apps and backend apps.
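
A minimal sketch of this middleware, assuming the google/uuid package and an X-Request-Id header name (both are illustrative choices):

```go
package middlewares

import (
	"context"
	"net/http"

	"github.com/google/uuid"
)

type ctxKey string

// RequestIDKey is the context key under which the request id is stored.
const RequestIDKey ctxKey = "request-id"

const requestIDHeader = "X-Request-Id"

// RequestID reuses the id provided by the client or generates a new UUID,
// then stores it in the request context and echoes it in the response headers.
func RequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get(requestIDHeader)
		if id == "" {
			id = uuid.New().String()
		}

		w.Header().Set(requestIDHeader, id)
		ctx := context.WithValue(r.Context(), RequestIDKey, id)

		next.ServeHTTP(w, r.WithContext(ctx))
	})
}
```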

Tracer Middleware

api/middlewares/tracer.go

At the time of writing, there's no official middleware from the OpenTelemetry project supporting the Chi router, so I've created this simple wrapper using the standard net/http middleware available in the package.

This will start a new tracing span every time a request is made to the API, following the sampling rules that we specified in the initialization.
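
A sketch of such a wrapper, using the otelhttp handler from the OpenTelemetry contrib module; the operation name passed to it is just a placeholder:

```go
package middlewares

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// Tracer wraps the handler chain with the stock otelhttp handler so every
// request starts (or continues) a span, following the sampling rules
// configured during initialization. "http.request" is a placeholder
// operation name.
func Tracer(next http.Handler) http.Handler {
	return otelhttp.NewHandler(next, "http.request")
}
```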

Logger Middleware

api/middlewares/logger.go
Zerolog console output

By default, you can't get the status code of a response from the standard Go response writer, so here we're wrapping it with a special Chi implementation that records the status so we can include it in the log.

We're logging the request id, the current trace id, and the RemoteAddr (modified by the RealIP middleware) this way. That lets us dig into requests later: for example, if a specific IP is making a ton of requests, we could alert on it and block that user, or check whether an endpoint is showing an increase in error rates or latency.
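
A sketch of how this middleware could be put together, reusing the request-id key from the sketch above; the log field names are illustrative:

```go
package middlewares

import (
	"net/http"
	"time"

	"github.com/go-chi/chi/v5/middleware"
	"github.com/rs/zerolog"
	"go.opentelemetry.io/otel/trace"
)

// Logger returns a middleware that writes one structured log line per
// request, including the wrapped status code, latency, remote address,
// request id and trace id.
func Logger(logger zerolog.Logger) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// chi's WrapResponseWriter records the status code written
			// by downstream handlers.
			ww := middleware.NewWrapResponseWriter(w, r.ProtoMajor)
			start := time.Now()

			next.ServeHTTP(ww, r)

			// RequestIDKey comes from the request-id middleware above.
			requestID, _ := r.Context().Value(RequestIDKey).(string)
			traceID := trace.SpanFromContext(r.Context()).SpanContext().TraceID().String()

			logger.Info().
				Str("method", r.Method).
				Str("path", r.URL.Path).
				Str("remote_addr", r.RemoteAddr).
				Str("request_id", requestID).
				Str("trace_id", traceID).
				Int("status", ww.Status()).
				Dur("latency", time.Since(start)).
				Msg("request completed")
		})
	}
}
```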

Recover Middleware

api/middlewares/recover.go
Zerolog console output (test panic thrown inside our todo handler)

In this middleware, we're catching any panics that occur and logging them at the Panic level. We're also adding the trace id, the request id, and a formatted stack trace so we have more information to identify the root of the problem. After that, we respond to the client with an Internal Server Error status.

We don't want to include the error in the response since we don't know what information it may contain; it could include a SQL query or some DB credentials, for instance. We should return as little information as possible to the client, since we're logging it anyway.
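
A sketch of such a recovery middleware; note that zerolog's WithLevel writes the event at Panic level without re-panicking, and the field names are again illustrative:

```go
package middlewares

import (
	"net/http"
	"runtime/debug"

	"github.com/rs/zerolog"
	"go.opentelemetry.io/otel/trace"
)

// Recover returns a middleware that catches panics, logs them at Panic
// level with the request id, trace id and a stack trace, and answers the
// client with a bare 500 so no internal details leak out.
func Recover(logger zerolog.Logger) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			defer func() {
				if rec := recover(); rec != nil {
					requestID, _ := r.Context().Value(RequestIDKey).(string)
					traceID := trace.SpanFromContext(r.Context()).SpanContext().TraceID().String()

					// WithLevel logs at Panic level without re-panicking.
					logger.WithLevel(zerolog.PanicLevel).
						Interface("panic", rec).
						Str("request_id", requestID).
						Str("trace_id", traceID).
						Bytes("stack", debug.Stack()).
						Msg("panic recovered")

					http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
				}
			}()

			next.ServeHTTP(w, r)
		})
	}
}
```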

The opentelemetry package automatically handles panics, so we don’t need to worry about it in our APM tracing.

Now let’s take a look at the tracing.

utils/tracer.go

We're using these utility functions to make creating spans a little less repetitive. This way, we can copy and paste the blocks below at the top of each of our function definitions, and it should be good to go.

In the previous part, we passed the ctx as the first parameter to all of our functions. With that, adding tracing should be a smooth process.
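
One possible shape for such a helper, assuming a single tracer name and using the caller's function name as the span name:

```go
package utils

import (
	"context"
	"runtime"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/trace"
)

// tracerName is an assumed instrumentation name for the API.
const tracerName = "go-api"

// Span starts a child span named after the calling function, so tracing a
// function only needs a short prologue:
//
//	ctx, span := utils.Span(ctx)
//	defer span.End()
func Span(ctx context.Context) (context.Context, trace.Span) {
	name := "unknown"
	if pc, _, _, ok := runtime.Caller(1); ok {
		name = runtime.FuncForPC(pc).Name()
	}

	return otel.Tracer(tracerName).Start(ctx, name)
}
```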

Let’s check what we have so far:

Jaeger tracing

We're off to a good start already. We can see how much time each function took and how long the whole request took to respond to the client.

Finally, we can now take care of the error handling.

Error entity

We start by creating a custom error entity with a Code field. This field is tied to the HTTP status codes, but you can create your own internal codes if you have a bigger app.

The idea is that when an error happens for the first time, we wrap it with this entity and return it. Subsequent error checks simply return the error upwards until the handlers present it.

Personally, I prefer to handle the error as soon as it happens and return it afterward, but you can also handle everything inside your service instead if that makes more sense to you.
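
A minimal sketch of what such an entity could look like; the package name and the Wrap constructor are assumptions:

```go
// Package entities is an assumed location for the error entity.
package entities

import "fmt"

// Error wraps an underlying error together with a code. Here the code maps
// directly to HTTP status codes, but it could be an internal code instead.
type Error struct {
	Code int
	Err  error
}

func (e *Error) Error() string {
	return fmt.Sprintf("code=%d: %v", e.Code, e.Err)
}

// Unwrap lets errors.Is and errors.As reach the wrapped error.
func (e *Error) Unwrap() error {
	return e.Err
}

// Wrap attaches a code to an error the first time it is handled; callers
// further up the stack simply return it until a handler presents it.
func Wrap(code int, err error) *Error {
	return &Error{Code: code, Err: err}
}
```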

Our handlers have a special presenter for errors. This presenter formats the response appropriately for our client and also logs the occurrence. Let's check it out.

We never return any info about the error in our response. This reduces the likelihood of exposing sensitive information, such as user credentials or details about the app's inner workings. We respond with a simple message and a status code. Everything else is logged, and we can check it later in our logging stack.
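
A sketch of what the presenter could look like; the local appError type stands in for the entity above, and the log fields and response body are assumptions:

```go
package handlers

import (
	"encoding/json"
	"errors"
	"net/http"

	"github.com/rs/zerolog"
	"go.opentelemetry.io/otel/trace"
)

// appError stands in for the error entity from the previous section.
type appError struct {
	Code int
	Err  error
}

func (e *appError) Error() string { return e.Err.Error() }

// presentError logs the full error together with the trace id, then answers
// the client with only a generic message and the mapped status code.
func presentError(logger zerolog.Logger, w http.ResponseWriter, r *http.Request, err error) {
	status := http.StatusInternalServerError

	var appErr *appError
	if errors.As(err, &appErr) {
		status = appErr.Code
	}

	logger.Error().
		Err(err).
		Str("trace_id", trace.SpanFromContext(r.Context()).SpanContext().TraceID().String()).
		Int("status", status).
		Msg("request failed")

	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(map[string]string{"message": http.StatusText(status)})
}
```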

Zerolog error console output
Jaeger tracing error

With this, we have achieved a good level of observability for our API. We can add more logs to specific funcs, like a warn-level log every time an authentication attempt fails. This is something we'll think about as the app evolves.

Let’s end this with a graceful shutdown.

This listens for incoming exit signals and shuts down our server. The idea is that the server finishes responding to any pending requests and closes its connections before stopping execution.
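
A sketch of that shutdown flow using the standard library's signal handling; the address and timeout are placeholders:

```go
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/rs/zerolog"
)

// serve starts the HTTP server and shuts it down gracefully when an exit
// signal arrives, letting in-flight requests finish first.
func serve(logger zerolog.Logger, handler http.Handler) {
	srv := &http.Server{Addr: ":8080", Handler: handler}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			logger.Fatal().Err(err).Msg("server error")
		}
	}()

	// Block until SIGINT or SIGTERM is received.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, os.Interrupt, syscall.SIGTERM)
	<-quit

	// Give pending requests up to 30 seconds to complete.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	if err := srv.Shutdown(ctx); err != nil {
		logger.Error().Err(err).Msg("forced shutdown")
	}
}
```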

I've intentionally left some code changes out of this article since it was getting a bit long already, but you can check everything on GitHub: fernandobandeira/go-api at part-3.

In the next part, we'll look into environment configuration, plus unit tests and e2e tests for what we have done so far. Thanks for reading.
