Detecting and Locating Memory Leaks in Production Python Applications

If a reference to an object is not properly released, it can remain assigned even though the object is no longer needed, which prevents the object from ever being garbage collected. This usually happens at the application logic level, but it can also be an issue inside an imported package.
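As a minimal illustrative sketch (the cache and function names are hypothetical and not taken from any real application), a module-level cache that is never pruned is a classic example:

# Hypothetical example of an unintentional leak: the module-level cache
# keeps a reference to every processed request, so none of the stored
# objects can ever be garbage collected and memory grows without bound.
_request_cache = {}

def handle_request(request_id, payload):
    result = payload.upper()  # stand-in for real processing
    _request_cache[request_id] = (payload, result)  # reference kept forever
    return result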

If a memory leak appears in the production environment, it is very hard to reproduce and fix in development or staging, first because the production environment has different and more complex behavior, and second because many memory leaks take hours or even days to manifest themselves.

What is needed to detect and locate memory leaks in production

Growth of the application's memory footprint over time may indicate a memory leak. In the worst case, the kernel will kill the process if it consumes too much memory. Monitoring and alerting on Python memory usage and garbage collection metrics is therefore important for getting notified about memory growth as soon as possible.
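As a minimal sketch of the kind of metrics worth tracking using only the standard library (the reporting loop and print-based output are purely illustrative; a real application would export these values to its monitoring system):

import gc
import resource
import time

def report_memory_metrics():
    # Peak resident set size of the process (kilobytes on Linux, bytes on macOS).
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Number of objects currently tracked by the collector, per generation.
    gen0, gen1, gen2 = gc.get_count()
    print(f"rss={rss} gc_gen0={gen0} gc_gen1={gen1} gc_gen2={gen2}")

while True:
    report_memory_metrics()
    time.sleep(60)  # illustrative interval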

Once a memory leak is detected, the first thing we need to know in order to locate it is where in the code the memory is allocated and not collected by garbage collection.

Fortunately, Python 3 has a built-in allocation tracer, which the StackImpact agent relies on to report continuous memory allocation statistics. The agent schedules the allocation tracer appropriately, watches it, and keeps the overhead as low as possible. The result is a historical view of the uncollected memory allocation rate per stack trace.
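For reference, the built-in tracer in question is the tracemalloc module. Independently of the agent, a minimal standalone sketch of how it attributes still-allocated memory to stack traces looks like this (the bytearray workload is just a stand-in for code suspected of leaking):

import tracemalloc

tracemalloc.start(25)  # keep up to 25 frames per allocation stack trace

# Stand-in for the workload suspected of leaking.
leaky = [bytearray(1024) for _ in range(10000)]

snapshot = tracemalloc.take_snapshot()
# Group memory that is still allocated by stack trace and print the top entries.
for stat in snapshot.statistics('traceback')[:3]:
    print(stat)
    for line in stat.traceback.format():
        print(line)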

Using these allocation profiles, sorted by the stack frames with the highest uncollected allocation rate, it becomes much easier to go back to the source code and make sure that the objects created at those locations are properly managed.
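Continuing the hypothetical cache example from above, a typical fix is to bound the cache so that old entries are evicted and become collectable again:

from collections import OrderedDict

_MAX_ENTRIES = 1024
_request_cache = OrderedDict()

def handle_request(request_id, payload):
    result = payload.upper()  # stand-in for real processing
    _request_cache[request_id] = (payload, result)
    # Evict the oldest entry once the bound is reached, so old objects
    # become unreachable and can be garbage collected.
    if len(_request_cache) > _MAX_ENTRIES:
        _request_cache.popitem(last=False)
    return result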

Setting up StackImpact to monitor, identify and locate memory leaks

As described above, the StackImpact agent, which is initialized in the application, records and reports regular memory allocation profiles to the Dashboard. It also reports runtime metrics, including memory and garbage collection, which are critical for detecting the memory leak in the first place.

Here is how to add the agent (available on GitHub) to the application:

Get the agent key at stackimpact.com.

pip install stackimpact

Start the agent in the main thread:

import stackimpact
…
agent = stackimpact.start(
    agent_key = 'agent key here',
    app_name = 'MyPythonApp')

See the documentation for detailed setup instructions.

That’s it! After restarting or redeploying the application, the profiles will be available in the Dashboard in a historically comparable form.

Similar profile history is automatically available for:

  • CPU usage
  • Blocking calls
  • HTTP handlers

In addition, exceptions and runtime metrics will also be available in the Dashboard.