# Choosing the Correct Python Benchmarking Strategy

> Explore and compare different Python benchmarking approaches—from command-line tools to integrated test frameworks—to find the best fit for your workflow.

Performance isn't just about making your code work—it's about making it work
well. When you're building Python applications, understanding exactly how fast
your code runs becomes the difference between software that feels responsive and
software that makes users reach for the coffee while they wait.

Benchmarking Python code isn't rocket science, but it does require the right
tools and techniques. Let's explore four powerful approaches that will transform
you from someone who hopes their code is fast to someone who knows exactly how
fast it is!

<Tip>
  If you have a Python performance question, try asking it to
  [p99.chat](https://p99.chat), the assistant for code performance optimization.
  It can run, measure, and optimize any given code!
</Tip>

## Starting with the `time` command

<Info>
  `time` is only available on UNIX-based systems, so if you're working with
  Windows, you can skip this first step.
</Info>

Sometimes the simplest tools are the most revealing. The Unix `time` command
gives you a bird's-eye view of your script's performance, reporting how long it
ran and how much CPU time it spent in user and kernel mode.

Let's start with a practical example. Create a script that demonstrates
different algorithmic approaches:

```python bench.py theme={null}
import sys
import random

def bubble_sort(arr):
    """Inefficient but educational sorting algorithm"""
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr

def quick_sort(arr):
    """More efficient divide-and-conquer approach"""
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)

if __name__ == "__main__":
    # Generate test data
    size = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    data = [random.randint(1, 1000) for _ in range(size)]
    # Pick the algorithm from command line arguments
    algorithm = sys.argv[2] if len(sys.argv) > 2 else "bubble"
    if algorithm == "bubble":
        result = bubble_sort(data.copy())
    else:
        result = quick_sort(data.copy())
    print(f"Sorted {len(result)} elements using {algorithm} sort")
```

Now let's see the power of the `time` command in action:

```shellsession title=terminal icon="square-terminal" theme={null}
$ time python3 bench.py 5000 bubble
Sorted 5000 elements using bubble sort
python3 bench.py 5000 bubble  0.89s user 0.01s system 99% cpu 0.910 total

$ time python3 bench.py 5000 quick
Sorted 5000 elements using quick sort
python3 bench.py 5000 quick  0.03s user 0.01s system 75% cpu 0.049 total
```

Look at that dramatic difference! Bubble sort consumed 0.89 seconds of CPU time
while quicksort finished in just 0.03 seconds—nearly 30x faster. The 99% CPU
utilization for bubble sort shows it's working hard but inefficiently, while
quicksort's lower CPU percentage reflects its brief execution time.

Different shells and systems format the output slightly differently. In bash,
the default shell on many Linux systems, you might see:

```shellsession title=terminal icon="square-terminal" theme={null}
$ time python bench.py 5000 bubble
Sorted 5000 elements using bubble sort

real	0m0.825s
user	0m0.806s
sys	0m0.022s

$ time python bench.py 5000 quick
Sorted 5000 elements using quick sort

real	0m0.091s
user	0m0.054s
sys	0m0.040s
```

The `time` command reveals three crucial metrics:

* **Real time** (`real` or `total`): Wall-clock time from start to finish
* **User time** (`user`): CPU time spent in user mode (your Python code
  executing—loops, calculations, memory operations)
* **System time** (`sys` or `system`): CPU time spent in kernel mode (system
  calls, file I/O, memory allocation from the OS)

This approach is perfect when you want to understand your script's overall
resource consumption, including startup overhead and system interactions.
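The same three numbers can also be collected from inside a Python process. The sketch below is an illustration only, not how `time` itself is implemented: it uses the standard library's `time.perf_counter()` for wall-clock time and `os.times()` for user and system CPU time. `measure` and `busy_loop` are hypothetical helper names.

```python
import os
import time


def busy_loop(n):
    """CPU-bound work that runs almost entirely in user mode."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def measure(func, *args):
    """Return (real, user, sys) seconds spent while running func(*args)."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    func(*args)
    wall_after = time.perf_counter()
    cpu_after = os.times()
    return (
        wall_after - wall_before,
        cpu_after.user - cpu_before.user,
        cpu_after.system - cpu_before.system,
    )


real, user, system = measure(busy_loop, 500_000)
print(f"real {real:.3f}s  user {user:.3f}s  sys {system:.3f}s")
```

Unlike the shell command, this only covers the wrapped call, so interpreter startup and imports are left out of the numbers.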

## Precision Benchmarking with `hyperfine`

While `time` gives you the basics, hyperfine transforms benchmarking into a
science. It runs multiple iterations, provides statistical analysis, and can
export the results for further comparison.

Once you have
[installed hyperfine](https://github.com/sharkdp/hyperfine?tab=readme-ov-file#installation),
you can get started quickly:

```shellsession title=terminal icon="square-terminal" theme={null}
$ hyperfine "python bench.py 5000 quick"
Benchmark 1: python bench.py 5000 quick
  Time (mean ± σ):      19.9 ms ±   2.4 ms    [User: 14.0 ms, System: 4.0 ms]
  Range (min … max):    17.8 ms …  36.5 ms    74 runs
```

Instead of running only once, hyperfine automatically ran your code 74 times and
calculated meaningful statistics. That ± 2.4 ms standard deviation tells you how
consistent your performance is, crucial information that a single `time` run
can't provide.
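To make those statistics concrete, here is a rough sketch of the kind of summary hyperfine prints, computed with Python's `statistics` module. The `summarize` helper and the sample timings are hypothetical and only illustrate the arithmetic; the sketch uses the sample standard deviation and makes no claim about which estimator hyperfine uses internally.

```python
import statistics


def summarize(timings_ms):
    """Summarize run times (in ms) as mean, sample standard deviation, min and max."""
    return {
        "mean": statistics.mean(timings_ms),
        "stdev": statistics.stdev(timings_ms),
        "min": min(timings_ms),
        "max": max(timings_ms),
        "runs": len(timings_ms),
    }


# Hypothetical timings in milliseconds, not real measurements
samples = [18.0, 19.0, 20.0, 21.0, 22.0]
stats = summarize(samples)
print(f"Time (mean ± σ): {stats['mean']:.1f} ms ± {stats['stdev']:.1f} ms")
print(f"Range (min … max): {stats['min']:.1f} ms … {stats['max']:.1f} ms    {stats['runs']} runs")
```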

You can also compare commands with hyperfine:

```shellsession title=terminal icon="square-terminal" theme={null}
$ hyperfine "python bench.py 5000 quick" "python bench.py 5000 bubble"
Benchmark 1: python bench.py 5000 quick
  Time (mean ± σ):      28.1 ms ±   2.0 ms    [User: 21.6 ms, System: 4.5 ms]
  Range (min … max):    26.5 ms …  40.4 ms    68 runs

Benchmark 2: python bench.py 5000 bubble
  Time (mean ± σ):     917.0 ms ±  24.6 ms    [User: 895.4 ms, System: 8.3 ms]
  Range (min … max):   899.3 ms … 969.3 ms    10 runs

Summary
  python bench.py 5000 quick ran
   32.59 ± 2.52 times faster than python bench.py 5000 bubble
```

Now we're talking! Hyperfine not only confirms our 30x performance difference
but quantifies the uncertainty in that measurement. The "± 2.52" tells us the
speedup could range from about 30x to 35x.
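One common way to attach an uncertainty to a ratio of two means is first-order error propagation, assuming the two measurements are independent. The sketch below applies it to the rounded means and standard deviations printed above. `speedup_with_uncertainty` is a hypothetical helper, and the formula is a standard approximation rather than a statement about hyperfine's internals.

```python
import math


def speedup_with_uncertainty(slow_mean, slow_std, fast_mean, fast_std):
    """Return (ratio, uncertainty) for slow_mean / fast_mean.

    Uses first-order error propagation for a quotient of independent values.
    """
    ratio = slow_mean / fast_mean
    relative_error = math.sqrt(
        (slow_std / slow_mean) ** 2 + (fast_std / fast_mean) ** 2
    )
    return ratio, ratio * relative_error


# Rounded values taken from the hyperfine output above (milliseconds)
ratio, error = speedup_with_uncertainty(917.0, 24.6, 28.1, 2.0)
print(f"{ratio:.2f} ± {error:.2f} times faster")
```

Because the inputs are rounded, the result lands close to, but not exactly on, the 32.59 ± 2.52 that hyperfine reports.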

## Function-Level Precision with `timeit`

When you need to focus on specific functions rather than entire scripts,
Python's built-in `timeit` module becomes your microscope. It's designed to
minimize timing overhead and provide accurate measurements of small code
snippets.

Here is an example measuring the functions we previously created:

```python time.py theme={null}
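import random
import timeit

from bench import bubble_sort, quick_sort

# Generate test data
data = [random.randint(1, 1000) for _ in range(5000)]

RUNS = 10

# timeit.timeit returns the total time for all RUNS calls
bubble_time = timeit.timeit(
    lambda: bubble_sort(data.copy()),
    number=RUNS
)

quick_time = timeit.timeit(
    lambda: quick_sort(data.copy()),
    number=RUNS
)

# Report the average time per call
print(f"Bubble sort: {bubble_time / RUNS:.4f} seconds")
print(f"Quick sort: {quick_time / RUNS:.4f} seconds")
print(f"Speedup: {bubble_time / quick_time:.2f}x")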

```shellsession title=terminal icon="square-terminal" theme={null}
$ python time.py
Bubble sort: 0.6165 seconds
Quick sort: 0.0100 seconds
Speedup: 31.78x
```

The conclusion remains the same—quicksort dramatically outperforms bubble
sort—but notice something interesting about these numbers. Both measurements are
significantly smaller than our earlier script-level benchmarks. We've eliminated
the noise of Python interpreter startup, module imports, and command-line
argument parsing. Now we're measuring pure algorithmic performance, which gives
us a clearer picture of what's happening inside our functions.

<Tip>
  Notice how we use `lambda` functions to wrap our calls—this approach is
  cleaner than string-based timing and provides better IDE support. The
  `data.copy()` call ensures each iteration works with fresh data, preventing
  any side effects from skewing our results.
</Tip>
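The effect of skipping the copy is easy to see with an in-place sort. A minimal sketch, repeating the `bubble_sort` logic from `bench.py` so the snippet stands alone:

```python
def bubble_sort(arr):
    """In-place bubble sort, same logic as in bench.py."""
    n = len(arr)
    for i in range(n):
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr


data = [5, 3, 4, 1, 2]

# With a copy, the original list is left untouched for the next iteration
bubble_sort(data.copy())
print(data)  # [5, 3, 4, 1, 2]

# Without a copy, the input itself is sorted, so every later iteration
# would be timing an already-sorted list
bubble_sort(data)
print(data)  # [1, 2, 3, 4, 5]
```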

The beauty of `timeit` lies in its surgical precision. While our previous tools
measured entire script execution, `timeit` isolates the exact performance
characteristics of individual functions. This granular approach becomes
invaluable when you're optimizing specific bottlenecks rather than entire
applications.
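A single `timeit.timeit` call returns one total. When you also want a sense of run-to-run variation, `timeit.repeat` collects several totals in one go. The sketch below is self-contained and uses Python's built-in `sorted` as a stand-in workload; the list size and the `repeat` and `number` values are arbitrary choices.

```python
import random
import timeit

data = [random.randint(1, 1000) for _ in range(5000)]

runs_per_sample = 20

# Each entry is the total time for `runs_per_sample` calls
samples = timeit.repeat(
    lambda: sorted(data),
    repeat=5,
    number=runs_per_sample,
)

# Convert totals into per-call times
per_call = [total / runs_per_sample for total in samples]
print(f"best:  {min(per_call) * 1e6:.1f} µs per call")
print(f"worst: {max(per_call) * 1e6:.1f} µs per call")
```

The `timeit` documentation suggests focusing on the minimum, since higher values usually reflect interference from other processes rather than the code itself.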

## Create Benchmarks from Existing Test Suites

First, let's create proper tests for our sorting functions. The first step is to
install the `pytest` testing library:

```shellsession title=terminal icon="square-terminal" theme={null}
$ uv add --dev pytest
```

<Tip>
  We recommend using
  [uv](https://docs.astral.sh/uv/getting-started/installation/) to create a
  project. Running `uv init` turns your directory into a Python project.
</Tip>

Then we can create some tests:

```python test_sort.py theme={null}
import random
from bench import bubble_sort, quick_sort

data = [random.randint(1, 1000) for _ in range(5000)]

def test_bubble_sort_performance():
    """Check bubble sort with 5000 elements (on a copy, since it sorts in place)"""
    result = bubble_sort(data.copy())
    assert result == sorted(data)


def test_quick_sort_performance():
    """Check quick sort with 5000 elements"""
    result = quick_sort(data.copy())
    assert result == sorted(data)
```

Now let's run those tests:

```shellsession title=terminal icon="square-terminal" theme={null}
$ uv run pytest test_sort.py
============================== test session starts ===============================
platform darwin -- Python 3.13.0, pytest-8.4.0, pluggy-1.6.0
rootdir: /private/tmp/bench
configfile: pyproject.toml
collected 2 items

test_sort.py ..                                                            [100%]

=============================== 2 passed in 0.78s ================================
```

Great! Your tests validate that both sorting algorithms produce correct
results.

### Turning the test cases into benchmarks

Now comes the real magic. Install `pytest-codspeed` and transform these
correctness tests into performance benchmarks with minimal changes:

```shellsession title=terminal icon="square-terminal" theme={null}
$ uv add --dev pytest-codspeed
```

Update your tests, adding the `benchmark` fixture as a parameter and using it to
wrap the execution of the sort algorithm:

```python test_sort.py {6,8,12,14} theme={null}
import random
from bench import bubble_sort, quick_sort

data = [random.randint(1, 1000) for _ in range(5000)]

def test_bubble_sort_performance(benchmark):
    """Benchmark bubble sort with 5000 elements (on a copy, since it sorts in place)"""
    result = benchmark(lambda: bubble_sort(data.copy()))
    assert result == sorted(data)


def test_quick_sort_performance(benchmark):
    """Benchmark quick sort with 5000 elements"""
    result = benchmark(lambda: quick_sort(data.copy()))
    assert result == sorted(data)
```

Finally, let's burn the CPU for a bit:

```shellsession title=terminal icon="square-terminal" theme={null}
$ uv run pytest --codspeed test_sort.py
============================== test session starts ===============================
platform darwin -- Python 3.13.0, pytest-8.4.0, pluggy-1.6.0
codspeed: 3.2.0 (enabled, mode: walltime, timer_resolution: 41.7ns)
rootdir: /private/tmp/bench
configfile: pyproject.toml
plugins: codspeed-3.2.0
collected 2 items

test_sort.py ..                                                            [100%]

                                Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃                    Benchmark ┃   Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━╋━━━━━━━━━━╋━━━━━━━┫
┃ test_bubble_sort_performance ┃ 448,626,916ns ┃        3.1% ┃    2.74s ┃     6 ┃
┃  test_quick_sort_performance ┃     194,546ns ┃        9.7% ┃    3.04s ┃ 1,005 ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━┻━━━━━━━━━━┻━━━━━━━┛

================================= 2 benchmarked ==================================
=============================== 2 passed in 8.94s ================================
```

Now we're seeing the true power of statistical benchmarking! Look at those
numbers—bubble sort clocked in at 448 milliseconds while quicksort blazed
through in just 194 microseconds. That's a staggering 2,300x performance
difference.

Notice how `pytest-codspeed` automatically adapted the number of iterations: 6
runs for the slow bubble sort versus 1,005 runs for the lightning-fast
quicksort. Running fast code many more times gives each benchmark enough samples
for stable statistics, regardless of how quick the code under test is.
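The standard library has a comparable mechanism: `timeit.Timer.autorange()` keeps increasing the loop count until the total run time reaches at least 0.2 seconds. The sketch below only illustrates that general idea. It is not a description of how `pytest-codspeed` chooses its iteration counts, and `fast` and `slow` are hypothetical workloads.

```python
import timeit


def fast():
    """A cheap workload that needs many loops to fill the time budget."""
    return sum(range(100))


def slow():
    """A heavier workload that fills the time budget in far fewer loops."""
    return sum(range(200_000))


for func in (fast, slow):
    # autorange() returns (number_of_loops, total_seconds)
    loops, total = timeit.Timer(func).autorange()
    print(f"{func.__name__}: {loops} loops, {total / loops * 1e6:.1f} µs per loop")
```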

Learn more about the plugin in
[the `pytest-codspeed` reference](/reference/pytest-codspeed).

### The Foundation for Continuous Performance Testing

What makes this approach transformative isn't just the numbers—it's how easily
it integrates into your existing workflow. You've just created the foundation
for a performance monitoring system that can run locally during development and
automatically in CI/CD pipelines.

This is the first step toward performance-conscious development. While you can
now validate performance locally, the real power emerges when you integrate
these benchmarks into your continuous integration pipeline. Every pull request
becomes a performance checkpoint, every deployment includes performance
validation, and performance regressions are caught before they reach production.

The CodSpeed ecosystem makes this transition seamless, taking you from local
development to continuous testing in just a few configuration steps. The guides
under Suggested Reading below are a good next step.

## Choosing Your Benchmarking Strategy

Each tool serves a specific purpose in your performance toolkit:

* **Use the** `time` **command when** you need a quick sanity check of overall
  script performance or want to understand system resource usage. It's perfect
  for comparing different implementations at the application level.
* **Choose** `hyperfine` **when** you need statistical rigor for command-line
  tools or want to track performance across different input parameters. Its
  warmup runs and statistical analysis make it ideal for detecting small
  performance changes.
* **Reach for** `timeit` **when** you're optimizing specific functions or
  comparing different algorithmic approaches. Its focus on eliminating timing
  overhead makes it perfect for micro-benchmarks.
* **Implement** `pytest-codspeed` **when** performance becomes a first-class
  concern in your development process. It transforms performance testing from an
  afterthought into an integral part of your test suite.

## Suggested Reading

<Card title="pytest-codspeed documentation" icon="python" horizontal href="/reference/pytest-codspeed">
  A more advanced resource on writing benchmarks with pytest-codspeed.
</Card>

<Card title="Running Python Benchmarks in your CI" icon="vial" horizontal href="/">
  A more advanced resource on continuous performance testing in Python.
</Card>

## Resources

* [`pytest-codspeed — a pytest benchmarking plugin`](/reference/pytest-codspeed)
* [`hyperfine — a CLI benchmarking tool`](https://github.com/sharkdp/hyperfine)
* [`timeit — Measure execution time of small code snippets`](https://docs.python.org/3.13/library/timeit.html)
* [`time (Unix) — Wikipedia`](https://en.wikipedia.org/wiki/Time_\(Unix\))