Trace generation
Learn how the trace generation works and how to read flame graphs.
During the instrumented benchmark execution, we collect information about the execution of the called functions. This information is then used to generate a flame graph, bringing more granularity into the performance of your code.
Pre-requisites
- Node.js: Node 16 or higher, and the following minimum versions of the integrations
- Python: Python 3.12 or higher and
pytest-codspeed>=2.0.0
- Rust: Trace generation is enabled by default with any version of the integration library.
FFI Support
If you’re using Foreign Function Interface, typically calling C/C++/Rust code from Python or Node.js, make sure to generate debug symbols for the foreign functions so that you can see them in the flame graph.
Reading Flame Graphs
Flame graphs are a visualization tool for profiling software. They provide a graphical representation of your program’s execution, making it easier to understand the runtime complexities involved.
Let’s start with an example:
Here, each rectangle represents a function. The width of the rectangle is proportional to the amount of time spent in that function. The wider the rectangle, the more time was spent in that function. The vertical axis represents the call depth (a.k.a. stack depth), which represents the call hierarchy.
For this example, here is the call hierarchy:
- The root caller is the
app
function. app
calls theinit
,handleRequest
andterminate
functions.handleRequest
calls bothauthenticateUser
andprocessData
.processData
callsfoo
which in turn callsbar
.
Aggregated function calls
Functions calls are aggregated, so if a function is called multiple times, the time spent in all calls is aggregated into a single block.
Thus, the following code:
Will generate the following flame graph:
Where the foo
function is called twice and the bar
function is called 4
times, but the time spent is aggregated into a single block for each function.
Self-costs
In the previous example, we could see the global cost of each function call quite clearly. However, it can be tricky to find out how much time was spent within the function itself.
In the above illustration, we can see two types of self-costs:
- (implicit) self-costs: the time spent in the function itself is the whole width of the rectangle since it doesn’t call any other functions.
- self-costs: the self-cost here is visible as the space not occupied by the children of the block.
Self-costs in interpreted languages
In Python or Node.js, the self-cost is the time spent in the function itself, but also the time spent by the interpreter. This means that a function will always have a self-cost, even if the function does nothing.
If we change a bit the example, adding a lot of computation directly in the main function:
Then the flame graph would look like this, with the self-cost of main
being
much bigger than before:
Viewing Flame Graphs
On the pull request page, you can access the flame graphs of a benchmark by expanding it.
Example of flame graphs on a pull request page
Three types of flame graphs are available:
- Base: flame graph of the benchmark base run
- Head: flame graph of the benchmark run from the latest commit of the pull request
- Diff: difference between the head and the base flame graphs