To optimize training and inference time or the memory footprint of a TensorFlow application, a performance profiler is instrumental. However, profiling TensorFlow isn’t trivial, even less so in a production environment, where security and low profiling overhead are critical.
Performance profiling is essential for optimizing an application’s resource consumption and response time and for diagnosing failures. A performance issue without an execution profile is like an error without a stack trace: getting to the root cause requires a lot of manual work.
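To illustrate what an execution profile adds over the bare symptom, Python’s built-in cProfile module can attribute time to individual functions. The function names below are hypothetical, purely for illustration:

```python
import cProfile
import io
import pstats

def parse(records):
    # Hypothetical hot spot: quadratic string concatenation.
    out = ""
    for r in records:
        out += str(r)
    return out

def handler():
    # Stand-in for a request handler that is "mysteriously slow".
    return parse(range(10_000))

profiler = cProfile.Profile()
profiler.enable()
handler()
profiler.disable()

# The profile immediately names the culprit function instead of
# leaving us with only an overall elapsed time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The printed table ranks functions by cumulative time, which is exactly the “stack trace for slowness” the analogy refers to.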
Whether a program’s execution time grows exponentially, linearly, logarithmically, or stays constant is critical for many use cases. Even if an algorithm is purposely designed to satisfy a certain complexity class, there are multiple reasons why it might not in practice: an underlying library, the OS, or even the hardware can be the root cause of a performance problem.
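A toy way to see two complexity classes diverge in practice is to time membership tests on a list (a linear scan) against a set (a roughly constant-time hash lookup). The sizes and repetition counts here are arbitrary:

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
missing = -1  # worst case for the list: forces a full scan

# O(n) lookup: scans all n elements on every call.
list_time = timeit.timeit(lambda: missing in data_list, number=100)
# O(1) average-case lookup: a single hash probe per call.
set_time = timeit.timeit(lambda: missing in data_set, number=100)

print(f"list membership: {list_time:.4f}s, set membership: {set_time:.6f}s")
```

The gap widens as `n` grows, which is the practical meaning of belonging to a different complexity class.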
There are multiple reasons why a program may consume more CPU resources than expected. When an algorithm has high computational complexity, the amount of data it operates on drives the CPU usage. For I/O-intensive programs, data processing may be the bottleneck. Garbage collection activity is another usual suspect.
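For the garbage-collection suspect specifically, CPython’s gc module exposes per-generation collection counters, so one can check whether a workload is triggering the collector unusually often. This is a quick sketch, not a rigorous benchmark:

```python
import gc

gc.collect()  # start from a clean state
before = gc.get_stats()[2]["collections"]  # generation-2 collections so far

# Allocate many small cyclic structures: reference cycles can only be
# reclaimed by the cyclic garbage collector, not by reference counting.
for _ in range(5):
    cycles = []
    for _ in range(100_000):
        node = {}
        node["self"] = node  # reference cycle
        cycles.append(node)
    del cycles
    gc.collect()  # each full collection increments the gen-2 counter

after = gc.get_stats()[2]["collections"]
print(f"generation-2 collections during workload: {after - before}")
```

Comparing such counters before and after a suspect code path is a cheap first test before reaching for a full profiler.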
A reference to an object, if not managed properly, may stay assigned even when the object is no longer used. This usually happens at the application-logic level, but it can also be an issue inside an imported package.
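A minimal sketch of such a lingering reference: a module-level cache quietly keeps objects alive after the caller is done with them, and `weakref` lets us observe whether an object was actually reclaimed. All names here are illustrative:

```python
import gc
import weakref

cache = []  # module-level container that silently retains references

class Payload:
    def __init__(self, data):
        self.data = data

def process(data):
    p = Payload(data)
    cache.append(p)  # bug: the reference outlives its usefulness
    return len(p.data)

process([0] * 1000)
ref = weakref.ref(cache[0])  # track the object without keeping it alive
gc.collect()
print("still alive:", ref() is not None)  # True: the cache pins the object

cache.clear()  # drop the lingering reference
gc.collect()
print("still alive:", ref() is not None)  # False: the object is reclaimed
```

The same pattern hidden inside a third-party package is harder to spot, which is why memory profilers that report retained objects by allocation site are so valuable.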