Explore and compare different Python benchmarking approaches—from command-line tools to integrated test frameworks—to find the best fit for your workflow.
Performance isn’t just about making your code work—it’s about making it work
well. When you’re building Python applications, understanding exactly how fast
your code runs becomes the difference between software that feels responsive and
software that makes users reach for the coffee while they wait.

Benchmarking Python code isn’t rocket science, but it does require the right
tools and techniques. Let’s explore four powerful approaches that will transform
you from someone who hopes their code is fast to someone who knows exactly how
fast it is!
If you have a Python performance question, try asking it to
p99.chat, the assistant for code performance optimization.
It can run, measure, and optimize any given code!
time is only available on UNIX-based systems, so if you’re working with
Windows, you can skip this first step.
Sometimes the simplest tools are the most revealing. The Unix time command
gives you a bird’s-eye view of your script’s performance, measuring everything
from CPU usage to memory consumption.

Let’s start with a practical example. Create a script that demonstrates
different algorithmic approaches:
bench.py
import sys
import random


def bubble_sort(arr):
    """Inefficient but educational sorting algorithm"""
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr


def quick_sort(arr):
    """More efficient divide-and-conquer approach"""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)


if __name__ == "__main__":
    # Generate test data
    size = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    data = [random.randint(1, 1000) for _ in range(size)]

    # Pick the algorithm from command line arguments
    algorithm = sys.argv[2] if len(sys.argv) > 2 else "bubble"
    if algorithm == "bubble":
        result = bubble_sort(data.copy())
    else:
        result = quick_sort(data.copy())

    print(f"Sorted {len(result)} elements using {algorithm} sort")
Now let’s see the power of the time command in action:
$ time python3 bench.py 5000 bubble
Sorted 5000 elements using bubble sort
python3 bench.py 5000 bubble  0.89s user 0.01s system 99% cpu 0.910 total

$ time python3 bench.py 5000 quick
Sorted 5000 elements using quick sort
python3 bench.py 5000 quick  0.03s user 0.01s system 75% cpu 0.049 total
Look at that dramatic difference! Bubble sort consumed 0.89 seconds of CPU time
while quicksort finished in just 0.03 seconds—nearly 30x faster. The 99% CPU
utilization for bubble sort shows it’s working hard but inefficiently, while
quicksort’s lower CPU percentage reflects its brief execution time.

Different systems format the output slightly differently. On Linux systems, for
example, you might see:
$ time python bench.py 5000 bubble
Sorted 5000 elements using bubble sort

real    0m0.825s
user    0m0.806s
sys     0m0.022s

$ time python bench.py 5000 quick
Sorted 5000 elements using quick sort

real    0m0.091s
user    0m0.054s
sys     0m0.040s
The time command reveals three crucial metrics:
- Real time (real or total): wall-clock time from start to finish
- User time (user): CPU time spent in user mode (your Python code executing: loops, calculations, memory operations)
- System time (sys or system): CPU time spent in kernel mode (system calls, file I/O, memory allocation from the OS)
This approach is perfect when you want to understand your script’s overall
resource consumption, including startup overhead and system interactions.
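Note that the default output above covers timing only. If you also want the memory side of the story, GNU time (the standalone /usr/bin/time binary found on most Linux distributions, not the shell built-in) has a verbose mode that adds memory statistics. A quick sketch, with the machine-specific values omitted:

$ /usr/bin/time -v python3 bench.py 5000 quick
Sorted 5000 elements using quick sort
        Command being timed: "python3 bench.py 5000 quick"
        User time (seconds): ...
        System time (seconds): ...
        Maximum resident set size (kbytes): ...
        ...

On macOS, the bundled BSD time gives a similar summary with /usr/bin/time -l.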
While time gives you the basics, hyperfine transforms benchmarking into a
science. It runs multiple iterations, provides statistical analysis, and even
generates beautiful comparison charts.

After having installed hyperfine, you can get started pretty quickly:
$ hyperfine "python bench.py 5000 quick"
Benchmark 1: python bench.py 5000 quick
  Time (mean ± σ):      19.9 ms ±   2.4 ms    [User: 14.0 ms, System: 4.0 ms]
  Range (min … max):    17.8 ms …  36.5 ms    74 runs
Instead of running only once, hyperfine automatically ran your code 74 times and
calculated meaningful statistics. That ± 2.4 ms standard deviation tells you how
consistent your performance is, crucial information that a single time run
can’t provide.

You can also compare commands with hyperfine:
$ hyperfine "python bench.py 5000 quick" "python bench.py 5000 bubble"Benchmark 1: python bench.py 5000 quick Time (mean ± σ): 28.1 ms ± 2.0 ms [User: 21.6 ms, System: 4.5 ms] Range (min … max): 26.5 ms … 40.4 ms 68 runsBenchmark 2: python bench.py 5000 bubble Time (mean ± σ): 917.0 ms ± 24.6 ms [User: 895.4 ms, System: 8.3 ms] Range (min … max): 899.3 ms … 969.3 ms 10 runsSummary python bench.py 5000 quick ran 32.59 ± 2.52 times faster than python bench.py 5000 bubble
Now we’re talking! Hyperfine not only confirms our 30x performance difference
but quantifies the uncertainty in that measurement. The “± 2.52” tells us the
speedup could range from about 30x to 35x.
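Hyperfine has a few more tricks worth knowing once you go beyond one-off comparisons. As a sketch (the results.md file name is just an example), its --warmup, --parameter-scan, and --export-markdown flags let you warm up caches before measuring, benchmark several input sizes in one invocation, and save the results as a shareable table:

$ hyperfine --warmup 3 \
    --parameter-scan size 1000 5000 -D 2000 \
    --export-markdown results.md \
    "python bench.py {size} quick"

Here {size} is substituted with 1000, 3000, and 5000 (the -D 2000 step size), so a single run shows how the script scales with input size.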
When you need to focus on specific functions rather than entire scripts,
Python’s built-in timeit module becomes your microscope. It’s designed to
minimize timing overhead and provide accurate measurements of small code
snippets.

Here is an example measuring the functions we previously created:
timeit_bench.py
import random
import timeit

from bench import bubble_sort, quick_sort

# Generate test data
data = [random.randint(1, 1000) for _ in range(5000)]

bubble_time = timeit.timeit(
    lambda: bubble_sort(data.copy()),
    number=10,
)
quick_time = timeit.timeit(
    lambda: quick_sort(data.copy()),
    number=10,
)

# Average time per call for each algorithm
print(f"bubble_sort: {bubble_time / 10:.4f}s per call")
print(f"quick_sort: {quick_time / 10:.4f}s per call")
The conclusion remains the same—quicksort dramatically outperforms bubble
sort—but notice something interesting about these numbers. Both measurements are
significantly smaller than our earlier script-level benchmarks. We’ve eliminated
the noise of Python interpreter startup, module imports, and command-line
argument parsing. Now we’re measuring pure algorithmic performance, which gives
us a clearer picture of what’s happening inside our functions.
Notice how we use lambda functions to wrap our calls—this approach is
cleaner than string-based timing and provides better IDE support. The
data.copy() call ensures each iteration works with fresh data, preventing
any side effects from skewing our results.
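If you also want a sense of run-to-run variation, the kind of spread hyperfine reported earlier, the standard library offers timeit.repeat, which performs several independent measurements. A minimal sketch reusing the same functions:

import random
import timeit

from bench import quick_sort

data = [random.randint(1, 1000) for _ in range(5000)]

# Five independent measurements, each timing 10 calls
runs = timeit.repeat(lambda: quick_sort(data.copy()), repeat=5, number=10)
print(f"best: {min(runs):.4f}s, worst: {max(runs):.4f}s (per 10 calls)")

The timeit documentation suggests focusing on the minimum of the repeats, since the higher values usually reflect interference from other processes rather than your code.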
The beauty of timeit lies in its surgical precision. While our previous tools
measured entire script execution, timeit isolates the exact performance
characteristics of individual functions. This granular approach becomes
invaluable when you’re optimizing specific bottlenecks rather than entire
applications.
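The last tool turns these measurements into benchmark tests that live alongside your regular test suite: pytest-codspeed. Here is a minimal sketch of such a test file for our two sorting functions (the test_sorting.py name and the fixed seed are assumptions, not part of the original script); the plugin’s @pytest.mark.benchmark marker tells it which tests to measure:

test_sorting.py

import random

import pytest

from bench import bubble_sort, quick_sort

random.seed(42)  # keep the input identical across benchmark runs
DATA = [random.randint(1, 1000) for _ in range(5000)]


@pytest.mark.benchmark
def test_bubble_sort():
    bubble_sort(DATA.copy())


@pytest.mark.benchmark
def test_quick_sort():
    quick_sort(DATA.copy())

After installing the plugin with pip install pytest-codspeed, running pytest --codspeed executes the marked tests as benchmarks and reports timing statistics for each.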
Now we’re seeing the true power of statistical benchmarking! Look at those
numbers—bubble sort clocked in at 448 milliseconds while quicksort blazed
through in just 194 microseconds. That’s a staggering 2,300x performance
difference.

Notice how pytest-codspeed automatically determined the optimal number of
iterations: 6 runs for the slow bubble sort versus 1,005 runs for the
lightning-fast quicksort. This intelligent adaptation ensures statistical
significance regardless of your algorithm’s performance characteristics.

Learn more about the plugin in
the pytest-codspeed reference.
What makes this approach transformative isn’t just the numbers—it’s how easily
it integrates into your existing workflow. You’ve just created the foundation
for a performance monitoring system that can run locally during development and
automatically in CI/CD pipelines.

This is the first step toward performance-conscious development. While you can
now validate performance locally, the real power emerges when you integrate
these benchmarks into your continuous integration pipeline. Every pull request
becomes a performance checkpoint, every deployment includes performance
validation, and performance regressions are caught before they reach production.

The CodSpeed ecosystem makes this transition seamless—from local development to
continuous testing in just a few configuration steps. Check out this guide:
Each tool serves a specific purpose in your performance toolkit:
- Use the time command when you need a quick sanity check of overall script performance or want to understand system resource usage. It’s perfect for comparing different implementations at the application level.
- Choose hyperfine when you need statistical rigor for command-line tools or want to track performance across different input parameters. Its warmup runs and statistical analysis make it ideal for detecting small performance changes.
- Reach for timeit when you’re optimizing specific functions or comparing different algorithmic approaches. Its focus on eliminating timing overhead makes it perfect for micro-benchmarks.
- Implement pytest-codspeed when performance becomes a first-class concern in your development process. It transforms performance testing from an afterthought into an integral part of your test suite.