Memory Leak Detection in Production Go Applications

Memory leaks are common in almost any language, including garbage-collected ones, and Go is no exception. If a reference to an object is not properly managed, it can stay assigned long after the object is needed, which prevents the garbage collector from freeing it. This usually happens at the application logic level, but the cause can also lie inside an imported package.
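As an illustration of a reference that stays assigned after use, here is a minimal sketch. The session and registry types are hypothetical and exist only for this example: the registry stores every session in a map, and unless the entry is deleted when the request finishes, the garbage collector cannot reclaim it.

```go
package main

import "fmt"

// session is a hypothetical per-request object, used only for illustration.
type session struct {
	id      int
	payload []byte
}

// registry keeps a reference to every session stored in it.
// Entries that are never deleted can never be garbage collected.
type registry struct {
	sessions map[int]*session
}

func newRegistry() *registry {
	return &registry{sessions: make(map[int]*session)}
}

// handle simulates serving one request. When release is false, the
// session stays in the map after the request is done, which is the leak.
func (r *registry) handle(id int, release bool) {
	r.sessions[id] = &session{id: id, payload: make([]byte, 1024)}
	// ... request work would happen here ...
	if release {
		delete(r.sessions, id) // dropping the reference makes the session collectable
	}
}

// retained reports how many sessions are still referenced by the registry.
func (r *registry) retained() int { return len(r.sessions) }

func main() {
	leaky, fixed := newRegistry(), newRegistry()
	for i := 0; i < 1000; i++ {
		leaky.handle(i, false)
		fixed.handle(i, true)
	}
	fmt.Println("leaky registry retains:", leaky.retained())
	fmt.Println("fixed registry retains:", fixed.retained())
}
```

In a long-running service the leaky variant grows with every request, which is why such leaks often surface only after hours or days of traffic.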

Unfortunately, memory leaks are very hard to detect and fix in development or staging environments: production behaviour is different and more complex, and many leaks take hours or even days to manifest themselves.

What is needed to find memory leaks in production

Go has a powerful profiling toolset, pprof, which includes a heap allocation profiler. The heap profiler reports the size of the allocated heap and the number of objects per stack trace, i.e. per source code location where the memory was allocated. This is critical information, but a single profile is not sufficient. To detect an actual leak, allocation profiles need to be recorded at regular intervals and compared over time.
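Recording profiles at intervals can be sketched with the standard runtime/pprof package. This is a minimal sketch, not a production setup: the function names, the file naming scheme, the temporary directory and the 100 ms interval are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"runtime"
	"runtime/pprof"
	"time"
)

// heapSnapshot returns the current heap profile in pprof's
// gzip-compressed protobuf format.
func heapSnapshot() ([]byte, error) {
	// Heap profile data is as of the last completed garbage collection,
	// so force one first. This has a cost; in production, weigh it
	// against the snapshot interval.
	runtime.GC()

	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// recordHeapProfiles writes count snapshots into dir, one per interval,
// and returns the paths of the files it created.
func recordHeapProfiles(dir string, interval time.Duration, count int) ([]string, error) {
	paths := make([]string, 0, count)
	for i := 0; i < count; i++ {
		if i > 0 {
			time.Sleep(interval)
		}
		data, err := heapSnapshot()
		if err != nil {
			return paths, err
		}
		path := filepath.Join(dir, fmt.Sprintf("heap-%03d.pb.gz", i))
		if err := os.WriteFile(path, data, 0o600); err != nil {
			return paths, err
		}
		paths = append(paths, path)
	}
	return paths, nil
}

func main() {
	dir, err := os.MkdirTemp("", "heap-profiles-")
	if err != nil {
		log.Fatal(err)
	}
	// A real service would use a much longer interval and run this in a goroutine.
	paths, err := recordHeapProfiles(dir, 100*time.Millisecond, 2)
	if err != nil {
		log.Fatal(err)
	}
	// Print a command that shows the second profile relative to the first.
	fmt.Printf("go tool pprof -base %s %s\n", paths[0], paths[1])
}
```

Comparing a later snapshot against an earlier one with the -base flag of go tool pprof shows which stack traces gained allocations in between. A stack trace that keeps growing across many snapshots is a leak candidate.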

There are issues when using pprof against production environments:

  • The profiler’s HTTP handler, which accepts profiling requests, needs to attach itself to the application’s HTTP server (or have one running), which means extra security measures are needed to protect the listening port.
  • Locating and accessing the application node’s host to run go tool pprof against it may be tricky in container environments such as Kubernetes.
  • If the application has crashed or is unable to respond to a pprof request, no profiling is possible.
  • A historical, per-stack-trace view of heap allocations requires running pprof manually at regular intervals, then analyzing and comparing the results interactively.
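For the first point, one common way to limit exposure is to register the pprof handlers on a dedicated mux and bind it to the loopback interface only. The sketch below uses the standard net/http/pprof package. The helper names and the address 127.0.0.1:6060 are assumptions for this example, and the in-process check through httptest is only there so that the program terminates.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPprofMux returns a mux that exposes only the pprof endpoints.
// Note: importing net/http/pprof also registers these handlers on
// http.DefaultServeMux, so the public server should not serve that mux.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves named profiles such as heap
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// servePprof blocks while serving the pprof endpoints on addr.
// It would typically run in its own goroutine next to the main server.
func servePprof(addr string) error {
	return http.ListenAndServe(addr, newPprofMux())
}

// heapEndpointStatus requests the heap profile from the mux in-process,
// without opening a network port, and returns the HTTP status code.
func heapEndpointStatus() int {
	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/heap", nil)
	rec := httptest.NewRecorder()
	newPprofMux().ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	// In a real service (address is an assumption, loopback only):
	//   go servePprof("127.0.0.1:6060")
	fmt.Println("GET /debug/pprof/heap status:", heapEndpointStatus())
}
```

Binding to loopback keeps the port off the public network, but it does not address the remaining points: someone still has to reach the host, and a crashed process cannot answer at all.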

Using StackImpact for automatic memory leak detection and profiling

StackImpact completely automates the collection of heap allocation profiles, addressing all of the issues above. The StackImpact agent, which is initialized in the application, records regular and anomaly-triggered allocation profiles and reports them to the dashboard. Here is how to add the agent to the application:

Get the agent key at stackimpact.com.

go get github.com/stackimpact/stackimpact-go

// In the application's startup code, after importing
// "github.com/stackimpact/stackimpact-go":
agent := stackimpact.NewAgent()
agent.Start(stackimpact.Options{
	AgentKey: "agent key here",
	AppName:  "MyGoApp",
})

See the documentation for detailed setup instructions.

After restarting/deploying the application, the profiles will be available in the dashboard in a historically comparable form.

[Image: memory-leak]

Similar profile history is automatically available for:

  • CPU usage
  • Channel waits
  • Network waits
  • System call waits
  • Locking

Metrics from Go runtime are also available in the dashboard.

Follow us on Twitter @stackimpact for updates on performance profiling and monitoring of production Golang apps.