TensorFlow Profiling in Development and Production Environments

To optimize training and inference time or memory footprint of TensorFlow applications, a performance profiler is instrumental. However, profiling TensorFlow isn't trivial, and even less so in production environments, where security and low profiling overhead are critical.

The StackImpact Python profiler automates away this complexity. It supports automatic and manual profiling of TensorFlow-based applications, both during development and in operational environments such as production. This article demonstrates both use cases.

The StackImpact profiler can operate in both manual and automatic modes. In manual mode, the different samplers can be started and stopped directly via the agent API, as shown in this article. This gives the necessary control when profiling scripts and applications during development. In non-interactive environments, such as production, the Python profiler agent manages the profilers automatically.

With both manual and automatic profiling, the agent keeps the profiling overhead low. This is achieved with sampling techniques; for example, only a limited number of TensorFlow session runs are profiled. Under the hood, the profiler relies on TensorFlow's built-in tracing, which, when enabled, provides graph execution information.
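The sampling idea can be sketched in plain Python. This is a toy illustration of keeping overhead low by profiling only a fraction of calls, not the agent's actual implementation; the `sampled_profile` decorator and the 10% rate are hypothetical:

```python
import functools
import random

def sampled_profile(rate):
    """Enable (expensive) profiling for only a fraction of calls."""
    def decorator(func):
        stats = {'profiled': 0, 'total': 0}

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            stats['total'] += 1
            if random.random() < rate:
                # A real profiler would enable tracing just for this run,
                # so most runs execute with zero profiling overhead.
                stats['profiled'] += 1
            return func(*args, **kwargs)

        wrapper.stats = stats
        return wrapper
    return decorator

@sampled_profile(rate=0.1)
def session_run():
    # Stand-in for an expensive TensorFlow session run.
    return 42

for _ in range(1000):
    session_run()

print(session_run.stats)
```

With a 10% rate, roughly one in ten runs pays the tracing cost, which is the trade-off the agent makes between profile completeness and overhead.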

In addition to TensorFlow profiling, the Python profiler also reports CPU, memory allocation and blocking call profiles, as well as errors and various runtime metrics.

Manual Profiling for Development Environments

The following example demonstrates how to profile simple TensorFlow operations. Note that automatic profiling should be disabled when profiling manually.

See the Python profiler documentation or the GitHub page for full setup instructions.

import tensorflow as tf
import stackimpact

agent = stackimpact.start(
    agent_key = 'agent key here',
    app_name = 'MyTensorFlowScript',
    auto_profiling = False)


x = tf.random_normal([1000, 1000])
y = tf.random_normal([1000, 1000])
res = tf.matmul(x, y)

# Start the TensorFlow profiler before executing the graph.
agent.start_tf_profiler()

with tf.Session() as sess:
    sess.run(res)

# Stop the profiler and report the recorded profiles.
agent.stop_tf_profiler()


Calling agent.stop_tf_profiler() also reports the recorded profiles to the Dashboard, where they can be analyzed.

The code locations in the profiles point to the operation initialization code rather than the graph execution code. This makes it possible to locate the operations that consume the most time and resources.
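The mapping from execution timings back to definition sites can be sketched in plain Python. This is a toy illustration; the `Op` class and its use of the `traceback` module are hypothetical stand-ins for what TensorFlow records when an operation is added to the graph:

```python
import traceback

class Op:
    """Toy stand-in for a graph operation that remembers where it was defined."""
    def __init__(self, name):
        self.name = name
        # Capture the caller's stack frame at definition time; a real graph
        # framework records similar metadata for each operation it creates.
        self.definition_frame = traceback.extract_stack()[-2]

def build_graph():
    return Op('matmul')  # the profile would point at this line, not at run()

op = build_graph()

# When execution timings arrive (e.g. from TensorFlow's tracing), they are
# attributed to the op's definition site rather than to the run loop:
frame = op.definition_frame
print('%s defined in %s at %s:%d' % (op.name, frame.name,
                                     frame.filename, frame.lineno))
```

This is why the profile for the example above points at the line containing tf.matmul rather than at sess.run.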

The following two screenshots show how the CPU and memory profiles are presented in the Dashboard for the code example shown above, which appears as the manual.py Python script in the profiles.



Automatic Profiling for Operational Environments (e.g. Production)

By default, the agent profiles applications automatically; this is enabled simply by initializing the agent. Automatic profiling is suitable for long-running applications and scripts. If an application runs multiple instances, only one or two instances will be profiled at any point in time.

import stackimpact

agent = stackimpact.start(
    agent_key = 'agent key here',
    app_name = 'MyTensorFlowScript')

Optionally, to make profiling more precise, focused profiling can be used. The agent.profile() method suggests to the agent which execution spans should be profiled.

with agent.profile():
    # the execution span to focus on, e.g. one training step
    sess.run(res)

When running an application with the StackImpact profiler agent, the reported profiles are available in the Dashboard in a historically comparable form, as seen in the following screenshot.