Reference

StackImpact product reference

Hot spot profiling

Profile recording and reporting by the agent

Each profiler report represents a series of profiles recorded continuously by the agent. Depending on the profiler and its associated overhead, the agent schedules profile recording for optimal results.

Historical profile grouping

Reports are shown for a selected application and time frame. The default view is Timeline, which presents profiles from multiple successive sources, e.g. machines or containers, in a single time sequence. If multiple sources report profiles simultaneously, e.g. when the application is scaled to multiple machines or containers, not all profiles will be visible. It is also possible to select a single source.

Additionally, a time frame can be selected to filter recorded profiles.
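
The grouping described above can be pictured as filtering and ordering a list of profiles. The following is a minimal sketch only: the function name, the dictionary layout and the parameter names are hypothetical and are not taken from the StackImpact Dashboard or agents.

```python
def timeline(profiles, source=None, start=None, end=None):
    """Return profiles ordered by time, optionally filtered.

    Hypothetical data layout: each profile is a dict with a "source" name
    and a numeric "timestamp" (seconds). start is inclusive, end exclusive.
    """
    selected = [
        p for p in profiles
        if (source is None or p["source"] == source)
        and (start is None or p["timestamp"] >= start)
        and (end is None or p["timestamp"] < end)
    ]
    return sorted(selected, key=lambda p: p["timestamp"])


profiles = [
    {"source": "host-b", "timestamp": 200},
    {"source": "host-a", "timestamp": 100},
    {"source": "host-a", "timestamp": 300},
]

# All sources in one time sequence, then a single source within a time frame.
print([p["timestamp"] for p in timeline(profiles)])
print([p["timestamp"] for p in timeline(profiles, source="host-a", start=150)])
```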

Profile history

The profile chart shows a key measurement, e.g. total, max or rate, for each recorded profile over time. Clicking a measurement point in the chart shows the profile for the selected time point as an expandable call tree. Every call (stack frame) in the call tree shows its own share of the total measurement as well as a trend based on the values of the same call in previous profiles.

Profile context

The profile context, which is displayed as a list of tags, reflects the application environment and the state at which the profile was recorded. The following entries are possible:

  • Host name - the name of the host, instance or container the application is running on. The value is obtained from the system.
  • Runtime type - a language or platform, e.g. Go or Python.
  • Runtime version - a version of the language or platform.
  • Application version - can be defined by the developer in the agent initialization statement.
  • Build ID - a prefix of the SHA-1 hash of the program.
  • Run ID - a unique ID for every (re)start of the application.
  • Agent version - version of the agent that recorded the profile.

CPU usage profile

CPU usage profiles are recorded by sampling profilers. In the Dashboard, each profile is represented by a call tree with nodes corresponding to function calls. Each node’s value represents the percentage of absolute time a call was executing while the profile was being recorded. The percentage is a best-effort estimate of absolute execution time, calculated from the number of cores available to the process and the profiler’s sampling rate. Additionally, the number of samples for each call is provided.
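
As a rough illustration of how a sample count can be turned into such a percentage, the sketch below assumes a fixed sampling rate, a known recording duration and a known core count. The function name and its parameters are hypothetical and are not part of any agent API; the agents' actual calculation may differ.

```python
def cpu_percent(samples, sampling_rate_hz, duration_sec, num_cores):
    """Estimate the share of available CPU time spent in a call.

    Assumption: each sample stands for 1 / sampling_rate_hz seconds of CPU
    time, and num_cores cores were available for the whole recording.
    """
    if sampling_rate_hz <= 0 or duration_sec <= 0 or num_cores <= 0:
        raise ValueError("rate, duration and core count must be positive")
    cpu_seconds = samples / sampling_rate_hz
    available_seconds = duration_sec * num_cores
    return 100.0 * cpu_seconds / available_seconds


# 250 samples at 100 Hz during a 10-second recording on 2 cores:
# 2.5 CPU seconds out of 20 available seconds.
print(cpu_percent(250, 100, 10, 2))  # 12.5
```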

There can be many root causes of high CPU usage. Some of them are:

  • Algorithm complexity, i.e. the code has a high time complexity. For example, the number of steps it performs grows exponentially with the size of the data being processed.
  • Excessive garbage collection caused by too many objects being allocated and released.
  • Infinite or tight loops.

Memory allocation profile

Memory profiles are recorded by reading current heap allocation statistics. Each node in the memory allocation profile call tree represents a line of code where memory was allocated and not yet released after garbage collection. The value of the node is the number of bytes allocated by a function call or by some of its nested calls. If a node has multiple child nodes, the node’s value is the total of its children’s values. The number of samples, which is shown next to the allocated size, corresponds to the number of allocated objects. Some agents, e.g. the Python agent, report the allocation rate.
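
The way node values roll up from children to parents can be sketched with a small tree. The class and the function names below are hypothetical and only illustrate the summation; they do not reflect the agents' internal data structures.

```python
class CallNode:
    """One call in a memory allocation call tree (hypothetical structure)."""

    def __init__(self, name, allocated_bytes=0):
        self.name = name
        # Only leaf nodes carry their own reading in this sketch.
        self.allocated_bytes = allocated_bytes
        self.children = []

    def add_child(self, child):
        self.children.append(child)
        return child

    def value(self):
        # A leaf reports its own bytes; a parent reports the sum of its children.
        if not self.children:
            return self.allocated_bytes
        return sum(child.value() for child in self.children)


root = CallNode("main")
handler = root.add_child(CallNode("handle_request"))
handler.add_child(CallNode("parse_body", allocated_bytes=4096))
handler.add_child(CallNode("build_response", allocated_bytes=1024))
root.add_child(CallNode("load_config", allocated_bytes=512))

print(handler.value())  # 5120
print(root.value())     # 5632
```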

A single profile is not a good indicator of a memory leak, because memory can be released shortly after the memory allocation statistics were read. A better indication of a memory leak is a continuous increase of allocated memory at a single call node relative to its previous readings. Different types of memory leaks may manifest themselves over different time frames.
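
A simple way to express "continuous increase relative to previous readings" is to check that every reading of a call node is larger than the one before it. This is a minimal sketch under that assumption; the function name and the minimum number of readings are hypothetical, and it is not the detection logic used by StackImpact.

```python
def is_continuously_growing(readings, min_readings=3):
    """Return True if each reading is larger than the previous one.

    readings: allocated bytes at one call node, oldest first.
    min_readings: hypothetical threshold below which no verdict is given.
    """
    if len(readings) < min_readings:
        return False
    return all(later > earlier for earlier, later in zip(readings, readings[1:]))


# Steady growth across profiles suggests a leak at this node.
print(is_continuously_growing([100, 150, 210, 300]))  # True
# A spike that is released again does not.
print(is_continuously_growing([100, 300, 120, 110]))  # False
```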

Memory leaks can have different root causes. Some of them are:

  • The pointer to which an object is assigned after allocation stays unreleased, e.g. it has the wrong scope.
  • A pointer is assigned to another pointer that is not released, similar to the previous point.
  • Unintended allocation of memory, e.g. in a loop.

Time profile

Blocking or async call profiles represent a call tree, where each node is a function call that waits for an event. The value is the aggregated waiting time of a call during one second. It can be greater than one second, because the same call can wait for events in parallel. Events can be network reads and writes, system calls, mutex waits, etc. The number of samples, which is shown next to the wait time, corresponds to the number of sampled executions of the function call.
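
The arithmetic behind a value greater than one second can be sketched as follows. The function name and the input layout are hypothetical; the numbers are made up for illustration.

```python
def wait_time_per_second(wait_durations_sec, window_sec):
    """Aggregate the wait time of one call, normalized to one second.

    wait_durations_sec: individual wait durations observed for the same
    call within the window, including waits that overlapped in time.
    """
    if window_sec <= 0:
        raise ValueError("window must be positive")
    return sum(wait_durations_sec) / window_sec


# Four parallel executions of the same call each wait 0.5 s within a
# 1-second window, so the aggregated wait time exceeds the window itself.
print(wait_time_per_second([0.5, 0.5, 0.5, 0.5], 1.0))  # 2.0
```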

Bottleneck profiling

Bottleneck profiles are recorded, reported and represented identically to hot spot profiles, except that the values of function calls represent execution duration percentiles or averages. Depending on the profile, the duration can represent a blocking or asynchronous operation.
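
To illustrate why a percentile and an average of the same durations can tell different stories, the sketch below computes both with the nearest-rank method. The function name and the sample durations are hypothetical, and the agents may use a different percentile definition.

```python
import math
from statistics import mean


def percentile(durations_ms, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not durations_ms:
        raise ValueError("no durations given")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(durations_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]


# Made-up call durations with one slow outlier.
durations_ms = [12, 15, 11, 240, 14, 13, 16, 12, 15, 13]

print(percentile(durations_ms, 50))  # 13
print(percentile(durations_ms, 95))  # 240
print(mean(durations_ms))            # average is pulled up by the outlier
```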

Latency profiles

Latency profiles represent execution times in terms of a single event, request or operation. Unlike hot spot profiles, where the measurements are aggregated over the profiling duration, latency profiles contain measurements aggregated within the scope of one event (e.g. an HTTP request) or a single function call.

Error monitoring

The agent provides an API for reporting errors. When used, the Errors section will contain error profile reports for different types of errors. Each report is a collection of error profiles for a sequence of sources, e.g. hosts or containers, or a single source over a period of time, which is adjustable.

A chart shows the number of total errors of a particular type over time. Clicking on a point will select an error profile corresponding to the selected time.

An error profile is a call tree, where each node is an error stack frame. The value of a node indicates the number of times an error has occurred in a 60-second time period.
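
Counting errors per stack frame within one 60-second period could look like the sketch below. The input layout (timestamp plus a list of frame names) and the function name are hypothetical and are not the agents' reporting format.

```python
from collections import Counter


def count_errors_by_frame(errors, period_start, period_sec=60):
    """Count, per stack frame, the errors recorded in one time period.

    Hypothetical input: a list of (timestamp_sec, frames) tuples, where
    frames is the list of stack frame names of one reported error.
    """
    counts = Counter()
    period_end = period_start + period_sec
    for timestamp, frames in errors:
        if period_start <= timestamp < period_end:
            # set() keeps a recursive frame from being counted twice per error.
            counts.update(set(frames))
    return counts


errors = [
    (5, ["main", "handle_request", "query_db"]),
    (20, ["main", "handle_request", "query_db"]),
    (42, ["main", "handle_request", "render"]),
    (75, ["main", "handle_request", "query_db"]),  # outside the first period
]
counts = count_errors_by_frame(errors, period_start=0)
print(counts["handle_request"], counts["query_db"], counts["render"])  # 3 2 1
```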

Health monitoring

The agents report various metrics related to application execution, runtime, operating system, etc. Measurements are taken every 60 seconds.

Go application metrics
  • CPU
    • CPU usage - percentage of total time the CPU was busy. This is a best effort to calculate absolute CPU usage based on the number of cores available to the process.
    • CPU time - similar to CPU usage, but not converted to percentage.
  • Memory
    • Allocated memory - total number of bytes allocated and not garbage collected.
    • Mallocs - number of malloc operations per measurement interval.
    • Frees - number of free operations per measurement interval.
    • Lookups - number of pointer lookup operations per measurement interval.
    • Heap objects - number of heap objects.
    • Heap non-idle - heap space in bytes currently used.
    • Heap idle - heap space in bytes currently unused.
    • VM Size - virtual memory size.
    • Current RSS - resident set size that is the portion of process memory held in RAM.
    • Max RSS - peak resident set size during application execution.
  • Garbage collection
    • Number of GCs - number of garbage collection cycles per measurement interval.
    • GC CPU fraction - fraction of CPU used by garbage collection.
    • GC total pause - amount of time garbage collection took during measurement interval.
  • Runtime
    • Number of goroutines - number of currently running goroutines.
    • Number of cgo calls - number of cgo calls made per measurement interval.

Node.js application metrics
  • CPU
    • CPU usage - a percentage of total time the CPU was busy.
  • Memory
    • Total heap size - total heap size, including idle size.
    • Used heap size - a set of metrics representing heap size as well as heap space sizes for code space, new space, old space, map space and large objects.
    • C++ objects - memory usage by C++ objects bound to JavaScript objects.
    • RSS - resident set size that is the portion of process memory held in RAM.
  • Garbage collection
    • GC cycles - number of garbage collection cycles.
    • GC time - time spent performing garbage collection.
  • Runtime
    • Event loop I/O stage - time spent in event loop I/O stage.
    • Event loop ticks - number of event loop ticks.

Python application metrics
  • CPU
    • CPU usage - percentage of total time the CPU was busy. This is a best effort to calculate absolute CPU usage based on the number of cores available to the process.
    • CPU time - similar to CPU usage, but not converted to percentage.
  • Memory
    • VM Size - virtual memory size.
    • Current RSS - resident set size that is the portion of process memory held in RAM.
    • Max RSS - peak resident set size during application execution.
  • Garbage collection
    • Collected objects - number of objects collected by the garbage collector (Python 3).
    • Uncollected objects - number of objects that are not yet collected.
    • Uncollectable objects - number of objects that cannot be collected (Python 3).
    • Collections - number of garbage collection cycles (Python 3).
  • Runtime
    • Active threads - number of active threads.
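
Several of the Python metrics listed above have close counterparts in the standard library. The sketch below reads a few of them with `resource`, `gc` and `threading`; it assumes a Unix-like system (the `resource` module is not available on Windows) and Python 3.4 or later for `gc.get_stats()`. The function name and dictionary keys are hypothetical, and this is not necessarily how the StackImpact agent obtains its values.

```python
import gc
import resource
import threading


def read_runtime_metrics():
    """Read a few process metrics from the standard library (Unix only)."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    gc_stats = gc.get_stats()  # one dict per GC generation
    return {
        # User plus system CPU time consumed by this process, in seconds.
        "cpu_time_sec": usage.ru_utime + usage.ru_stime,
        # Peak resident set size: kilobytes on Linux, bytes on macOS.
        "max_rss": usage.ru_maxrss,
        "gc_collections": sum(s["collections"] for s in gc_stats),
        "gc_collected": sum(s["collected"] for s in gc_stats),
        "gc_uncollectable": sum(s["uncollectable"] for s in gc_stats),
        "active_threads": threading.active_count(),
    }


metrics = read_runtime_metrics()
for name, value in metrics.items():
    print(name, value)
```

Values such as CPU time and GC counters are cumulative, so a per-interval figure would be the difference between two consecutive readings.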

Application footprint

The Footprint section gives a cross-application view of resource consumption, broken down by application. It shows how much CPU and memory each application, including all of its processes, consumes over time.

The CPU footprint is calculated by multiplying the number of application processes by the CPU usage of a single process, expressed as a percentage of the total infrastructure.

The memory footprint is calculated by multiplying the number of application processes by the process RSS.
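
The two footprint calculations are plain multiplications and can be sketched as follows. The function names and the example numbers are hypothetical.

```python
def cpu_footprint_percent(num_processes, process_cpu_percent):
    """Number of processes times a single process's share of total CPU."""
    return num_processes * process_cpu_percent


def memory_footprint_bytes(num_processes, process_rss_bytes):
    """Number of processes times the RSS of a single process."""
    return num_processes * process_rss_bytes


MIB = 1024 * 1024

# An application running 8 processes, each using 1.5% of the total
# infrastructure CPU and holding 200 MiB of resident memory.
print(cpu_footprint_percent(8, 1.5))                 # 12.0
print(memory_footprint_bytes(8, 200 * MIB) // MIB)   # 1600
```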

Anomaly detection

StackImpact continuously observes the hot spot and error profiles reported by the agents of each application in order to detect changes that are worth looking at.

In case of an anomaly, an alert notification can be sent to an endpoint. Alert endpoints can be added in the Configuration section. An endpoint can be an email address, a webhook URL or a Slack Incoming Webhook.

Agent overhead

The agent overhead is measured to be less than 1% for applications under high load. For applications that are horizontally scaled to multiple processes, StackImpact agents are active on only a small subset of the processes at any point in time, so the total overhead is much lower.
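
The effect of profiling only a subset of processes can be estimated with simple arithmetic. The sketch below is an illustration under assumed numbers: the function name, the 20-process deployment and the 2 active agents are hypothetical, and 1% is used as the upper bound stated above.

```python
def total_overhead_percent(per_process_overhead_percent, active_processes, total_processes):
    """Average overhead across all processes when only some are being profiled."""
    if total_processes <= 0 or not 0 <= active_processes <= total_processes:
        raise ValueError("invalid process counts")
    return per_process_overhead_percent * active_processes / total_processes


# Assumed example: 1% overhead on each actively profiled process,
# with agents active on 2 of 20 processes at a time.
print(total_overhead_percent(1.0, 2, 20))  # 0.1
```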