Overview
pytest-codspeed is a pytest plugin for measuring and tracking the performance of Python code. It provides benchmarking capabilities with support for both wall-time and CPU instrumentation measurements.
Installation
uv add --dev pytest-codspeed
Example Usage
$ pytest tests/ --codspeed
============================= test session starts ====================
platform darwin -- Python 3.13.0, pytest-7.4.4, pluggy-1.5.0
codspeed: 3.0.0 (enabled, mode: walltime, timer_resolution: 41.7ns)
rootdir: /home/user/codspeed-test, configfile: pytest.ini
plugins: codspeed-3.0.0
collected 1 item
tests/test_sum_squares.py . [ 100%]
Benchmark Results
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark        ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃  Iters ┃
┣━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━╋━━━━━━━━━━━━━╋━━━━━━━━━━╋━━━━━━━━┫
┃ test_sum_squares ┃     1,873ns ┃        4.8% ┃    3.00s ┃ 66,930 ┃
┗━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━┻━━━━━━━━━━━━━┻━━━━━━━━━━┻━━━━━━━━┛
=============================== 1 benchmarked ========================
=============================== 1 passed in 4.12s ====================
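For reference, a session like the one above could be produced by a test file along these lines (an illustrative sketch matching the test name in the output, not necessarily the exact file used):
# tests/test_sum_squares.py
import pytest

@pytest.mark.benchmark
def test_sum_squares():
    input = [1, 2, 3, 4, 5]
    output = sum(i**2 for i in input)
    assert output == 55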
Command Line Options
--codspeed
Enable the CodSpeed benchmarking plugin for the test session. (This is automatically enabled when running under the CodSpeed runner or from the GitHub Action.)
--codspeed-mode
auto | instrumentation | walltime
default: "auto"
The measurement instrument to use for measuring performance.
--codspeed-warmup-time
The time to warm up the benchmark for (in seconds); only for walltime mode.
--codspeed-max-time
The maximum time to run a benchmark for (in seconds); only for walltime mode.
--codspeed-max-rounds
The maximum number of rounds to run a benchmark for; only for walltime mode.
All of these walltime-specific command line options can be overridden by more specific settings set by the benchmark marker. For example, if you set warmup_time in the benchmark marker, it will take precedence over the --codspeed-warmup-time command line option.
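As a sketch of this precedence rule (assuming warmup_time is accepted by the marker, as the note above describes), the marker-level value wins for the decorated test only:
import pytest

# warmup_time set here overrides --codspeed-warmup-time for this test
@pytest.mark.benchmark(warmup_time=0.5)
def test_with_shorter_warmup(benchmark):
    benchmark(sum, range(1_000))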
Creating Benchmarks
There are multiple ways to mark tests as benchmarks at different levels:
The pytest.mark.benchmark marker
Marking a test with the pytest.mark.benchmark marker turns it into a benchmark: the entire test function will be measured. For more fine-grained control, see the benchmark fixture section below.
import pytest
@pytest.mark.benchmark
def test_sum_squares():
    input = [1, 2, 3, 4, 5]
    output = sum(i**2 for i in input)
    assert output == 55
You can also mark all the tests contained in a test file as benchmarks by using the pytestmark variable at the module level.
import pytest
pytestmark = pytest.mark.benchmark
def test_sum_squares():
    input = [1, 2, 3, 4, 5]
    output = sum(i**2 for i in input)
    assert output == 55
def test_sum_cubes():
    input = [1, 2, 3, 4, 5]
    output = sum(i**3 for i in input)
    assert output == 225
The benchmark fixture
When more fine-grained control is needed, the benchmark fixture can be used. This fixture is exposed by the pytest-codspeed plugin and lets you select exactly the code to be measured.
A fixture is a function that can be used to set up and tear down the state of a test.
More information about fixtures can be found in the pytest documentation.
Direct invocation
The fixture can be used directly in the test function:
def test_sum_squares(benchmark):
    data = [1, 2, 3, 4, 5]
    benchmark(sum, data)  # Only the `sum` function is measured
The fixture behaves as an identity function: calling benchmark(target, *args, **kwargs) has the same effect as calling target(*args, **kwargs). The return value is also passed through, making it possible to write assertions on the result.
For example:
def test_sum_squares(benchmark):
    input = [1, 2, 3, 4, 5]
    output = benchmark(sum, [i**2 for i in input])
    assert output == 55
It’s also possible to use it with lambda functions:
def test_sum_squares(benchmark):
    input = [1, 2, 3, 4, 5]
    output = benchmark(lambda: sum(i**2 for i in input))
    assert output == 55
As a decorator
If you want to measure a block of code containing multiple function calls, you can use the fixture as a decorator:
def test_sum_squares_cubes(benchmark):
    input = [1, 2, 3, 4, 5]

    @benchmark
    def measured_function():
        squares = sum(i**2 for i in input)
        cubes = sum(i**3 for i in input)
        return squares + cubes
When using the fixture, the marker is no longer necessary, unless you want to customize the execution.
The benchmark fixture can only be used once per test function. For example, the following code will raise an error:
def test_invalid(benchmark):
    benchmark(func1)  # OK
    benchmark(func2)  # ERROR: RuntimeError
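If you do need to measure several calls in one test, one option is to wrap them in a single decorated function, as shown in the decorator section above (sketch, with func1 and func2 standing in for your own code):
def test_valid(benchmark):
    @benchmark
    def run_both():
        func1()
        func2()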
Benchmark options
The @pytest.mark.benchmark marker accepts several options to customize the benchmark execution:
group
The group name to use for the benchmark. This can be useful to organize related benchmarks together. (Will be supported soon in the UI.)
min_time
The minimum time of a round (in seconds). Only available in walltime mode.
max_time
The maximum time to run the benchmark for (in seconds). Only available in walltime mode.
max_rounds
The maximum number of rounds to run the benchmark for. Takes precedence over max_time. Only available in walltime mode.
Example usage:
@pytest.mark.benchmark(
group="sorting",
min_time=0.1,
max_time=1.0,
max_rounds=100
)
def test_sorting_algorithm(benchmark):
data = [1, 2, 3, 4, 5]
benchmark(quicksort, data)
The min_time, max_time and max_rounds options are only available in walltime mode. When using instrumentation mode (Valgrind), these options are ignored.
Pedantic mode (advanced)
For fine-grained control over the benchmark execution protocol, you can use the benchmark.pedantic method.
For example:
def test_pedantic_mode(benchmark):
    def setup():
        # Setup code that shouldn't be measured
        data = list(range(1000))
        return (data,), {}  # Returns (args, kwargs) for the target

    def target(data):
        # Code to benchmark
        return sorted(data)

    def teardown(data):
        # Cleanup code that shouldn't be measured
        data.clear()

    result = benchmark.pedantic(
        target,
        setup=setup,
        teardown=teardown,
        rounds=5,  # Number of rounds to run
        warmup_rounds=1,  # Number of warmup rounds
    )
The benchmark.pedantic method accepts the following parameters:
target
Callable
The function to benchmark. This is the main code that will be measured.
args
tuple[Any, ...]
default: "()"
Positional arguments to pass to the target function.
kwargs
dict[str, Any]
default: "{}"
Keyword arguments to pass to the target function.
setup
Callable | None
default: "None"
Optional setup function that runs before each round. If it returns a tuple of (args, kwargs), these will be passed to the target function.
teardown
Callable | None
default: "None"
Optional teardown function that runs after each round. Receives the same arguments as the target function.
warmup_rounds
int
Number of warmup rounds to run before the actual benchmark. These rounds are not included in the measurements.
rounds
int
Number of rounds to run the benchmark for. This parameter is ignored when using the CPU instrumentation mode.
iterations
int
Number of iterations to run within each round. The total number of executions will be rounds × iterations. This parameter is ignored when using the CPU instrumentation mode.
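To illustrate the remaining parameters, here is a minimal sketch combining args, kwargs, and iterations (the values are arbitrary; iterations only has an effect in walltime mode, as noted above):
def test_pedantic_sorted(benchmark):
    result = benchmark.pedantic(
        sorted,
        args=([5, 3, 1, 4, 2],),
        kwargs={"reverse": True},
        rounds=10,
        iterations=5,  # 10 rounds × 5 iterations = 50 executions in walltime mode
    )
    assert result == [5, 4, 3, 2, 1]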
Recipes
Parametrized benchmarks
pytest-codspeed fully supports pytest's parametrization out of the box:
import pytest
@pytest.mark.parametrize("size", [10, 100, 1000])
def test_parametrized_benchmark(benchmark, size):
    data = list(range(size))
    benchmark(sum, data)
Compatibility
pytest-codspeed is designed to be fully backward compatible with pytest-benchmark. You can use both plugins in the same project, though only one will be active at a time.
Running the benchmarks continuously
To run the benchmarks continuously in your CI, you can use pytest-codspeed along with the CodSpeed runner.
We have first-class support for the following CI providers: