Master essential benchmarking techniques, from basic timing to production-ready performance testing.
Performance isn’t just about making your code work; it’s about making it work well. When you’re building Python applications, understanding exactly how fast your code runs becomes the difference between software that feels responsive and software that makes users reach for the coffee while they wait.

Benchmarking Python code isn’t rocket science, but it does require the right tools and techniques. Let’s explore four powerful approaches that will transform you from someone who hopes their code is fast to someone who knows exactly how fast it is!
If you have a Python performance question, try asking it to p99.chat, the assistant for code performance optimization. It can run, measure, and optimize any given code!
The time command is only available on UNIX-based systems, so if you’re working on Windows, you can skip this first step.
Sometimes the simplest tools are the most revealing. The Unix time command gives you a bird’s-eye view of your script’s performance, measuring everything from CPU usage to memory consumption.

Let’s start with a practical example. Create a script that demonstrates different algorithmic approaches:
bench.py
import sys
import random

def bubble_sort(arr):
    """Inefficient but educational sorting algorithm"""
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

def quick_sort(arr):
    """More efficient divide-and-conquer approach"""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

if __name__ == "__main__":
    # Generate test data
    size = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    data = [random.randint(1, 1000) for _ in range(size)]

    # Pick the algorithm from command line arguments
    algorithm = sys.argv[2] if len(sys.argv) > 2 else "bubble"
    if algorithm == "bubble":
        result = bubble_sort(data.copy())
    else:
        result = quick_sort(data.copy())

    print(f"Sorted {len(result)} elements using {algorithm} sort")
Now let’s see the power of the time command in action:
$ time python3 bench.py 5000 bubble
Sorted 5000 elements using bubble sort
python3 bench.py 5000 bubble  0.89s user 0.01s system 99% cpu 0.910 total

$ time python3 bench.py 5000 quick
Sorted 5000 elements using quick sort
python3 bench.py 5000 quick  0.03s user 0.01s system 75% cpu 0.049 total
Look at that dramatic difference! Bubble sort consumed 0.89 seconds of CPU time while quicksort finished in just 0.03 seconds, nearly 30x faster. The 99% CPU utilization for bubble sort shows it’s working hard but inefficiently, while quicksort’s lower CPU percentage reflects its brief execution time.

Different systems format the output slightly differently. On Linux, for example, you might see:
$ time python bench.py 5000 bubble
Sorted 5000 elements using bubble sort

real    0m0.825s
user    0m0.806s
sys     0m0.022s

$ time python bench.py 5000 quick
Sorted 5000 elements using quick sort

real    0m0.091s
user    0m0.054s
sys     0m0.040s
The time command reveals three crucial metrics:
Real time (real or total): Wall-clock time from start to finish
User time (user): CPU time spent in user mode (your Python code executing—loops, calculations, memory operations)
System time (sys or system): CPU time spent in kernel mode (system calls, file I/O, memory allocation from the OS)
This approach is perfect when you want to understand your script’s overall resource consumption, including startup overhead and system interactions.
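If you ever want to see where those numbers come from inside the process itself, the standard-library resource module (Unix-only) exposes the same user/system split. Here is a minimal sketch; the cpu_times.py name and the 5,000-element input are arbitrary choices:

cpu_times.py
import random
import resource
import time

from bench import bubble_sort

# Wall-clock time of the sort itself
start = time.perf_counter()
bubble_sort([random.randint(1, 1000) for _ in range(5000)])
elapsed = time.perf_counter() - start

# getrusage reports CPU time for the whole process so far,
# split into user mode and kernel (system) mode
usage = resource.getrusage(resource.RUSAGE_SELF)
print(f"wall-clock time of the sort: {elapsed:.3f}s")
print(f"user CPU time (process):     {usage.ru_utime:.3f}s")
print(f"system CPU time (process):   {usage.ru_stime:.3f}s")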
While time gives you the basics, hyperfine transforms benchmarking into a science. It runs multiple iterations, provides statistical analysis, and even generates beautiful comparison charts.

After installing hyperfine, you can get started pretty quickly:
$ hyperfine "python bench.py 5000 quick"
Benchmark 1: python bench.py 5000 quick
  Time (mean ± σ):      19.9 ms ±   2.4 ms    [User: 14.0 ms, System: 4.0 ms]
  Range (min … max):    17.8 ms …  36.5 ms    74 runs
Instead of running only once, hyperfine automatically ran your code 74 times and calculated meaningful statistics. That ± 2.4 ms standard deviation tells you how consistent your performance is, which is crucial information that a single time run can’t provide.

You can also compare commands with hyperfine:
$ hyperfine "python bench.py 5000 quick" "python bench.py 5000 bubble"Benchmark 1: python bench.py 5000 quick Time (mean ± σ): 28.1 ms ± 2.0 ms [User: 21.6 ms, System: 4.5 ms] Range (min … max): 26.5 ms … 40.4 ms 68 runsBenchmark 2: python bench.py 5000 bubble Time (mean ± σ): 917.0 ms ± 24.6 ms [User: 895.4 ms, System: 8.3 ms] Range (min … max): 899.3 ms … 969.3 ms 10 runsSummary python bench.py 5000 quick ran 32.59 ± 2.52 times faster than python bench.py 5000 bubble
Now we’re talking! Hyperfine not only confirms our 30x performance difference but quantifies the uncertainty in that measurement. The “± 2.52” tells us the speedup could range from about 30x to 35x.
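hyperfine can also export its raw measurements, for instance as JSON, which makes it easy to post-process results from Python. Here is a small sketch assuming hyperfine is on your PATH; the compare_hyperfine.py name and the warmup count are arbitrary choices:

compare_hyperfine.py
import json
import subprocess
import tempfile

# Run hyperfine with a few warmup runs and export the raw measurements as JSON
with tempfile.TemporaryDirectory() as tmp:
    report = f"{tmp}/report.json"
    subprocess.run(
        [
            "hyperfine",
            "--warmup", "3",
            "--export-json", report,
            "python bench.py 5000 quick",
            "python bench.py 5000 bubble",
        ],
        check=True,
    )
    with open(report) as f:
        results = json.load(f)["results"]

# Each entry holds the mean and standard deviation in seconds
for entry in results:
    mean_ms = entry["mean"] * 1000
    stddev_ms = entry["stddev"] * 1000
    print(f"{entry['command']}: {mean_ms:.1f} ms ± {stddev_ms:.1f} ms")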
When you need to focus on specific functions rather than entire scripts, Python’s built-in timeit module becomes your microscope. It’s designed to minimize timing overhead and provide accurate measurements of small code snippets.

Here is an example measuring the functions we previously created:
timeit_bench.py
import random
import timeit

from bench import bubble_sort, quick_sort

# Generate test data
data = [random.randint(1, 1000) for _ in range(5000)]

bubble_time = timeit.timeit(
    lambda: bubble_sort(data.copy()),
    number=10,
)
quick_time = timeit.timeit(
    lambda: quick_sort(data.copy()),
    number=10,
)

print(f"bubble_sort: {bubble_time:.3f}s for 10 runs")
print(f"quick_sort:  {quick_time:.3f}s for 10 runs")
The conclusion remains the same—quicksort dramatically outperforms bubble sort—but notice something interesting about these numbers. Both measurements are significantly smaller than our earlier script-level benchmarks. We’ve eliminated the noise of Python interpreter startup, module imports, and command-line argument parsing. Now we’re measuring pure algorithmic performance, which gives us a clearer picture of what’s happening inside our functions.
Notice how we use lambda functions to wrap our calls—this approach is cleaner than string-based timing and provides better IDE support. The data.copy() call ensures each iteration works with fresh data, preventing any side effects from skewing our results.
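For a little more rigor without leaving the standard library, timeit.repeat runs several independent measurements; taking the minimum of the repeats is the conventional way to filter out interference from other processes. A quick sketch, with arbitrary run counts:

import random
import timeit

from bench import quick_sort

data = [random.randint(1, 1000) for _ in range(5000)]

# Five independent measurements of ten calls each
runs = timeit.repeat(
    lambda: quick_sort(data.copy()),
    number=10,
    repeat=5,
)

# The minimum is the least-disturbed run; divide by `number` for a per-call time
print(f"best per-call time: {min(runs) / 10 * 1000:.2f} ms")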
The beauty of timeit lies in its surgical precision. While our previous tools measured entire script execution, timeit isolates the exact performance characteristics of individual functions. This granular approach becomes invaluable when you’re optimizing specific bottlenecks rather than entire applications.
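The last step is turning these measurements into tests that live alongside the rest of your suite. The pytest-codspeed plugin adds a benchmark fixture to pytest: wrap the call you want measured, run pytest with the --codspeed flag, and the plugin picks the number of iterations needed for statistically meaningful results. A minimal sketch of such a test file could look like this (the test_sorting.py name and the input size are arbitrary choices, so your absolute numbers will differ):

test_sorting.py
import random

from bench import bubble_sort, quick_sort

# A fixed seed keeps the input identical from one benchmark run to the next
random.seed(42)
DATA = [random.randint(1, 1000) for _ in range(1000)]


def test_bubble_sort(benchmark):
    # Only the wrapped call is measured, not the test setup
    benchmark(lambda: bubble_sort(DATA.copy()))


def test_quick_sort(benchmark):
    benchmark(lambda: quick_sort(DATA.copy()))

Install the plugin with pip install pytest-codspeed, then run pytest test_sorting.py --codspeed to execute the benchmarks.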
Now we’re seeing the true power of statistical benchmarking! Look at those numbers: bubble sort clocked in at 448 milliseconds while quicksort blazed through in just 194 microseconds. That’s a staggering 2,300x performance difference.

Notice how pytest-codspeed automatically determined the optimal number of iterations: 6 runs for the slow bubble sort versus 1,005 runs for the lightning-fast quicksort. This intelligent adaptation ensures statistical significance regardless of your algorithm’s performance characteristics.

Learn more about the plugin in the pytest-codspeed reference.
What makes this approach transformative isn’t just the numbers; it’s how easily it integrates into your existing workflow. You’ve just created the foundation for a performance monitoring system that can run locally during development and automatically in CI/CD pipelines.

This is the first step toward performance-conscious development. While you can now validate performance locally, the real power emerges when you integrate these benchmarks into your continuous integration pipeline. Every pull request becomes a performance checkpoint, every deployment includes performance validation, and performance regressions are caught before they reach production.

The CodSpeed ecosystem makes this transition seamless: from local development to continuous testing in just a few configuration steps. Check out the CodSpeed documentation for a step-by-step guide.
Each tool serves a specific purpose in your performance toolkit:
Use the time command when you need a quick sanity check of overall script performance or want to understand system resource usage. It’s perfect for comparing different implementations at the application level.
Choose hyperfine when you need statistical rigor for command-line tools or want to track performance across different input parameters. Its warmup runs and statistical analysis make it ideal for detecting small performance changes.
Reach for timeit when you’re optimizing specific functions or comparing different algorithmic approaches. Its focus on eliminating timing overhead makes it perfect for micro-benchmarks.
Implement pytest-codspeed when performance becomes a first-class concern in your development process. It transforms performance testing from an afterthought into an integral part of your test suite.