Detecting Lock Contention in Go

Mutexes are often a source of contention, resulting in performance issues or deadlocks. This is no different in Go. In some cases, such as an obvious deadlock in which every goroutine is blocked, the runtime detects the problem and aborts the program with a fatal error ("all goroutines are asleep - deadlock!"). More often, though, the problems only manifest themselves at the application logic level.

Let’s look at this simple example.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	lock := &sync.Mutex{}

	// goroutine1
	go func() {
		lock.Lock()

		// here we make goroutine2 wait
		time.Sleep(500 * time.Millisecond)

		fmt.Printf("%v: goroutine1 releasing...\n", time.Now().UnixNano())
		lock.Unlock()
	}()

	// goroutine2
	go func() {
		// give goroutine1 time to take the lock first
		time.Sleep(100 * time.Millisecond)

		fmt.Printf("%v: goroutine2 acquiring...\n", time.Now().UnixNano())
		lock.Lock()
		fmt.Printf("%v: goroutine2 done\n", time.Now().UnixNano())
		lock.Unlock()
	}()

	time.Sleep(1 * time.Second)
}

The lock is obtained in the first goroutine, and the second goroutine has to wait for it.

Problems like this will most likely not be detected in the development phase, when there is no concurrent use of an application, and will result in a performance issue only in the production environment. As a side note, it is always a good idea to have automated performance regression testing in place, which will simulate concurrent live traffic.

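Go has built-in block profiling and tracing tools for such situations, exposed through pprof. An application exposes the profilers on an HTTP port by importing the net/http/pprof package and running an HTTP server. Note that the block profile is disabled by default and has to be switched on with runtime.SetBlockProfileRate, otherwise it will be empty. Afterwards, different profiles can be requested, for example by running go tool pprof http://localhost:6060/debug/pprof/block.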

While pprof’s block profiler or tracer can be extremely helpful in identifying contention issues, there are a few obstacles to using pprof against a production environment:

  • The profiler’s HTTP handler, which accepts profiling requests, needs to attach itself to the application’s HTTP server (or have one running), which means extra security measures should be taken to protect the listening port.
  • Locating and accessing the application node’s host in order to run go tool pprof against it may be tricky in container environments such as Kubernetes.
  • If the application has a deadlock or is unable to respond to pprof requests, no profiling or tracing is possible. Profiles recorded before the problem was detected would be very helpful in cases like this.

For production environments, StackImpact provides automatic lock contention profiling. It reports regular and anomaly-triggered profiles to the Dashboard. This is what the Dashboard shows for our sample program (with the StackImpact agent added).

[Figure: lock contention profile for the sample program in the StackImpact Dashboard]

Getting started with StackImpact

Sign up for a free account, get the agent with go get github.com/stackimpact/stackimpact-go, import the stackimpact package, and start the agent in your application:

agent := stackimpact.Start(stackimpact.Options{
	AgentKey: "agent key here",
	AppName: "MyGoApp",
})

See the StackImpact documentation for detailed setup instructions.
