Automatic Tracing of Slow HTTP Transactions in Go

When HTTP requests served by a Go application are slow or we just want to proactively optimize them for better customer experience, the first step is to understand which parts of the request/response cycle are actually slow. This means we need to find out whether the request is waiting for a database query, loading of a file, a system call, a mutex or may be some other goroutine stuck on channel send or receive.

One way to get this information is to trace each part of the request by manually adding some timing mechanisms. However, there are a few problems with it:

  • The tracing code added at multiple places will decrease readability and maintainability of the code.
  • Adding tracing code may sometimes require application logic changes.
  • In production environments, where hundreds of requests per second may be processed, logging all request’ timing data may not be a good idea.
  • It might not be possible to time all parts of HTTP request/response cycle.

The idea behind StackImpact’s automatic HTTP tracing is to avoid those problems by natively using Go’s tracing functionality and automatically reporting the profiles of slow transaction segments. Trace profiles are reported regularly and in case of anomalies in application execution. Let’s see how it works for the following server example.

This simple Golang application is an HTTP server, which has only one handler. The handler in turn calls some other HTTP endpoint, receives the data, and responds with a message to a client. To keep the example simple any supporting code such as proper error handling, timeouts, cleaning up was not included.

package main

import (
	"fmt"
	"net/http"

	"github.com/stackimpact/stackimpact-go"
)

func handler(w http.ResponseWriter, r *http.Request) {
	res, err := http.Get("https://jsonplaceholder.typicode.com/posts")
	if err == nil {
		defer res.Body.Close()
	}

	fmt.Fprintf(w, "Loaded some data from some API!")
}

func main() {
	// StackImpact initialization
	agent := stackimpact.NewAgent()
	agent.Start(stackimpact.Options{
		AgentKey: "agent key here",
		AppName: "Some Go Server",
	})

	// Start server
	http.HandleFunc("/", handler)
	http.ListenAndServe(":8080", nil)
}

The example also contains StackImpact agent initialization. This is needed to initiate the profiling. The agent key can be obtained by signing up for a free account.

After generating constant load against the application for some time, here is what we see in the StackImpact’s Dashboard.

http-trace

The profiles of slow HTTP transactions are regularly reported to the Dashboard and are available in the Bottlenecks section. A profile has information with line-of-code precision about where goroutines were waiting on different events, which is exactly what we needed. In this case the slowest part represents the http.Get call in app.go file, which is at line 12.

And if the performance is not bound by waiting for an event, but rather a CPU usage, the Hot spot section in the Dashboard contains process-wide CPU usage profiles of the application. These will point to the line of code which is causing high CPU usage.

See documentation for StackImpact agent setup instructions.

Follow us on twitter to learn more about Go performance profiling and monitoring.