Find CPU and Memory Bottlenecks with Performance Counters
Understanding that your code is slow is one thing. Understanding why it's slow
is what lets you fix it. Walltime profiling now automatically collects hardware
performance counters during execution, so you can inspect CPU cycles,
instruction counts, memory operations, and cache behavior for each run.
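To make the counters above concrete, here is a minimal sketch of how raw counts are commonly turned into two derived metrics: instructions per cycle (IPC) and cache miss ratio. The `CounterSample` type, its field names, and the numbers are all hypothetical and chosen for illustration; they are not the profiler's actual output format or real measurements.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Hypothetical raw counter values for one benchmark run (illustrative only)."""
    cycles: int
    instructions: int
    cache_references: int
    cache_misses: int


def instructions_per_cycle(sample: CounterSample) -> float:
    """Instructions retired per CPU cycle; low values often point to stalls."""
    if sample.cycles == 0:
        return 0.0
    return sample.instructions / sample.cycles


def cache_miss_ratio(sample: CounterSample) -> float:
    """Fraction of cache references that missed."""
    if sample.cache_references == 0:
        return 0.0
    return sample.cache_misses / sample.cache_references


# Made-up values, not measurements from any real run.
sample = CounterSample(
    cycles=2_000_000,
    instructions=3_000_000,
    cache_references=400_000,
    cache_misses=20_000,
)

print(f"IPC: {instructions_per_cycle(sample):.2f}")
print(f"Cache miss ratio: {cache_miss_ratio(sample):.1%}")
```

A low IPC together with a high miss ratio usually suggests the CPU is waiting on memory rather than doing useful work, which is the kind of situation the counters are meant to expose.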
Figure: Performance counters showing cache behavior and memory traffic.
The visual memory access pattern gauge shows at a glance where your code spends
its time:
High L1 cache hit rate? Your data access patterns are efficient.
Lots of cache misses? Consider restructuring data layouts or reducing memory
footprint.
Memory accesses spread across all cache levels and main memory? Your working
set may be too large for cache. Consider processing data in smaller chunks.
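The last suggestion, processing data in smaller chunks, can be sketched as follows. This is an illustration of the shape of the technique only: the function names and the `chunk_size` default are assumptions, and in CPython interpreter overhead will likely hide most of the cache effect. The same restructuring tends to matter far more in compiled code or array libraries.

```python
from array import array


def sum_of_squares_two_pass(data: array) -> float:
    """Build a full-size intermediate, then walk it again.

    The intermediate is as large as the input, so by the time the second
    pass starts, the beginning of it may already have left the cache.
    """
    squared = array("d", (x * x for x in data))
    return sum(squared)


def sum_of_squares_chunked(data: array, chunk_size: int = 4096) -> float:
    """Run both passes on one small chunk at a time.

    Each intermediate holds at most chunk_size values, so it can stay
    cache-resident between the squaring pass and the summing pass.
    chunk_size is a hypothetical default; tune it for the target machine.
    """
    total = 0.0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        squared = array("d", (x * x for x in chunk))
        total += sum(squared)
    return total


# Integer-valued inputs keep both results exactly equal in floating point.
data = array("d", range(10_000))
print(sum_of_squares_two_pass(data) == sum_of_squares_chunked(data))
```

Both functions compute the same result; the chunked version only changes the order in which memory is touched, which is what the memory access gauge would reflect.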
Combined with the flame graph, you can now trace performance issues from
high-level function calls down to specific memory access patterns causing
slowdowns.