Continuous Performance Profiling

Computational complexity requirements

The difference between exponential, linear, logarithmic and constant execution time of a program is critical for many use cases. Even if an algorithm is purposely designed to satisfy a certain complexity class, there are multiple reasons why it might not behave that way in practice. An underlying library, the OS or even the hardware can be the root cause of a performance problem.

Performance profiling has been a part of software development since its beginning. It is essential for optimizing and fixing a program’s time and space complexity, as well as any bottlenecks caused by third-party dependencies. A performance issue without an execution profile is like an error without a stack trace: getting to the root cause requires a lot of manual work.

Call graphs

A profiler’s output is usually structured as some sort of call graph, depending on the type of profile. For a CPU profile it is a call graph, usually in the form of a tree, with stack frames of function calls as branches and numbers of samples as values. Looking at such a profile immediately reveals the hot spots, i.e. the function calls that were found (sampled) on the CPU most often. Similarly, a memory allocation profile shows how many bytes were allocated, and not released, by which function calls.
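
In Go, for example, such CPU and memory profiles can be captured with the standard runtime/pprof package; a minimal sketch (file names are arbitrary):

    package main

    import (
        "log"
        "os"
        "runtime/pprof"
        "time"
    )

    func main() {
        // CPU profile: samples on-CPU call stacks for the duration of the profile.
        cpu, err := os.Create("cpu.out")
        if err != nil {
            log.Fatal(err)
        }
        defer cpu.Close()
        if err := pprof.StartCPUProfile(cpu); err != nil {
            log.Fatal(err)
        }
        time.Sleep(10 * time.Second) // the actual workload would run here
        pprof.StopCPUProfile()

        // Heap profile: bytes allocated (and still in use) per call stack.
        heap, err := os.Create("heap.out")
        if err != nil {
            log.Fatal(err)
        }
        defer heap.Close()
        if err := pprof.WriteHeapProfile(heap); err != nil {
            log.Fatal(err)
        }
    }

Running go tool pprof cpu.out (or heap.out) then renders the resulting call tree with sample counts (or allocated bytes) per function.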

Other types of sampling profilers provide similar information about blocking calls, i.e. calls waiting for an event such as a mutex, or even about asynchronous calls.
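
In Go, for instance, blocking behavior can be recorded with the built-in block profile; a minimal sketch:

    package main

    import (
        "log"
        "os"
        "runtime"
        "runtime/pprof"
        "sync"
        "time"
    )

    func main() {
        // Record every blocking event (a production setup would sample less aggressively).
        runtime.SetBlockProfileRate(1)

        // Create some lock contention so that there is something to record.
        var mu sync.Mutex
        mu.Lock()
        go func() {
            time.Sleep(100 * time.Millisecond)
            mu.Unlock()
        }()
        mu.Lock() // this wait will show up in the block profile
        mu.Unlock()

        // Dump the block profile; it can be rendered with go tool pprof.
        f, err := os.Create("block.out")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        if err := pprof.Lookup("block").WriteTo(f, 0); err != nil {
            log.Fatal(err)
        }
    }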

A CPU call graph may look like this:

Profiling cloud applications

The era of horizontally scalable, data-intensive cloud applications deployed on FaaS, PaaS, IaaS or bare metal introduces an even greater need for profilers, since the performance of a single application instance running locally on a developer’s machine no longer correlates with a large-scale data center deployment. The different scale and usage of the production application, its data volume, traffic patterns and configuration expose inefficiencies and issues in the code that are not detectable in development or testing environments.

Traditional application performance management and monitoring products have tried to address cloud applications by monitoring and tracing certain business-specific workloads and by introducing on-demand and automatic remote profiling capabilities.
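
In the Go ecosystem, for example, such on-demand remote profiling is typically exposed through the standard net/http/pprof handlers, which a monitoring tool or an engineer can pull from over HTTP when needed; a minimal sketch:

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
    )

    func main() {
        // A remote client can now request, e.g., a 30-second CPU profile on demand:
        //   go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
    }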

Continuous vs. on-demand performance profiling

The problem is that on-demand or automatic profiling only allows after-the-fact analysis. It might be helpful in the case of a performance regression or a problem-driven optimization, but it does not provide the basis for continuous performance improvements. As long as the application keeps evolving, not addressing its performance continuously will result in gradual performance regression.

Continuous profiling is not triggered by any event or human. The idea is that it is “always” active, but only on a small subset of application processes at any given time, which keeps the total profiling overhead even lower.
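
The mechanism can be illustrated with a simplified, hypothetical sketch in Go, where a process occasionally records a short CPU profile and reports it; reportProfile is a placeholder for shipping the data to central storage, and real agents schedule and throttle this far more carefully:

    package main

    import (
        "bytes"
        "log"
        "math/rand"
        "runtime/pprof"
        "time"
    )

    // reportProfile is a placeholder for sending the profile to central storage.
    func reportProfile(data []byte) {
        log.Printf("recorded %d bytes of CPU profile", len(data))
    }

    func continuousProfiler() {
        for {
            // Stay idle most of the time: roughly 10 seconds of profiling
            // out of every ~10 minutes, at a randomized offset.
            time.Sleep(time.Duration(rand.Intn(600)) * time.Second)

            var buf bytes.Buffer
            if err := pprof.StartCPUProfile(&buf); err != nil {
                continue
            }
            time.Sleep(10 * time.Second)
            pprof.StopCPUProfile()
            reportProfile(buf.Bytes())
        }
    }

    func main() {
        go continuousProfiler()
        select {} // the real application would be doing its work here
    }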

The most obvious benefits of such an approach are:

  • Constant access to various current and historical performance profiles for troubleshooting and optimization.
  • Ability to historically compare profiles and locate regression causes with line-of-code precision.
  • Locating infrastructure-wide hot code or libraries whose optimization would benefit all applications.
  • Availability of pre-crash profile history for post-mortem analysis.
  • No risk of crashing the application by invoking an on-demand profiler against a suffering or failing application, which, ironically, is the main use case for on-demand profiling.

A prime example of a large-scale continuous profiling system is Google’s GWP (Google-Wide Profiling), which profiles almost every server and application at Google. Please refer to the GWP paper for the full details.

In turn, StackImpact enables continuous performance profiling for anyone: a developer, a small business or a large enterprise. It currently supports Go, Node.js and Python applications with the ability to profile CPU usage, memory allocations, blocking and asynchronous calls, and it also provides contextual information such as errors and multiple runtime metrics. Learn more.
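
For a Go application, enabling the agent amounts to starting it during application startup; a sketch roughly following the Go agent’s documented setup (the import path and option names are as I recall them and may vary between agent versions):

    package main

    import "github.com/stackimpact/stackimpact-go"

    func main() {
        // Start the continuous profiling agent; AgentKey and AppName are
        // placeholders for the account key and the application name.
        stackimpact.Start(stackimpact.Options{
            AgentKey: "agent key here",
            AppName:  "MyGoApp",
        })

        // ... application code ...
    }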

Historically comparable CPU profiles from an application over a selected period of time: