Overview
pytest-codspeed is a pytest plugin for measuring and tracking the performance of Python code. It provides benchmarking capabilities with support for both wall-time and CPU instrumentation measurements.
Installation
uv add --dev pytest-codspeed
Example Usage
$ pytest tests/ --codspeed
============================= test session starts ====================
platform darwin -- Python 3.13.0, pytest-7.4.4, pluggy-1.5.0
codspeed: 3.0.0 (enabled, mode: walltime, timer_resolution: 41.7ns)
rootdir: /home/user/codspeed-test, configfile: pytest.ini
plugins: codspeed-3.0.0
collected 1 item
tests/test_sum_squares.py . [ 100%]
Benchmark Results
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark        ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃  Iters ┃
┣━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━╋━━━━━━━━━━━━━╋━━━━━━━━━━╋━━━━━━━━┫
┃ test_sum_squares ┃     1,873ns ┃        4.8% ┃    3.00s ┃ 66,930 ┃
┗━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━┻━━━━━━━━━━━━━┻━━━━━━━━━━┻━━━━━━━━┛
=============================== 1 benchmarked ========================
=============================== 1 passed in 4.12s ====================
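For reference, a session like the one above could be produced by a test file along these lines (an illustrative sketch matching the test name in the output, not necessarily the exact file used):
# tests/test_sum_squares.py
import pytest

@pytest.mark.benchmark
def test_sum_squares():
    input = [1, 2, 3, 4, 5]
    output = sum(i**2 for i in input)
    assert output == 55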
Command Line Options
--codspeed
Enable the CodSpeed benchmarking plugin for the test session. (This is automatically enabled when running under the CodSpeed runner or from the GitHub Action.)
--codspeed-mode
auto | instrumentation | walltime
default: "auto"
The measurement instrument to use for measuring performance.
--codspeed-warmup-time
The time to warm up the benchmark for (in seconds); only for walltime mode.
--codspeed-max-time
The maximum time to run a benchmark for (in seconds); only for walltime mode.
--codspeed-max-rounds
The maximum number of rounds to run a benchmark for; only for walltime mode.
All of these walltime-specific command line options can be overridden by more specific settings set by the benchmark marker. For example, if you set warmup_time in the benchmark marker, it will take precedence over the --codspeed-warmup-time command line option.
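As a sketch of this precedence rule (assuming warmup_time is accepted by the marker, as the note above describes), the marker-level value wins for the decorated test only:
import pytest

# warmup_time set here overrides --codspeed-warmup-time for this test
@pytest.mark.benchmark(warmup_time=0.5)
def test_with_shorter_warmup(benchmark):
    benchmark(sum, range(1_000))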
Creating Benchmarks
There are multiple ways to mark tests as benchmarks at different levels:
The pytest.mark.benchmark marker
Marking a test with the pytest.mark.benchmark marker turns it into a benchmark: the entire test function will be measured. For more fine-grained control, see the benchmark fixture section below.
import pytest
@pytest.mark.benchmark
def test_sum_squares():
    input = [1, 2, 3, 4, 5]
    output = sum(i**2 for i in input)
    assert output == 55
You can also mark all the tests contained in a test file as benchmarks by using the pytestmark variable at the module level.
import pytest
pytestmark = pytest.mark.benchmark
def test_sum_squares():
    input = [1, 2, 3, 4, 5]
    output = sum(i**2 for i in input)
    assert output == 55
def test_sum_cubes():
    input = [1, 2, 3, 4, 5]
    output = sum(i**3 for i in input)
    assert output == 225
The benchmark fixture
When more fine-grained control is needed, the benchmark fixture can be used. This fixture is exposed by the pytest-codspeed plugin and lets you select exactly the code to be measured.
A fixture is a function that can be used to set up and tear down the state of a test.
More information about fixtures can be found in the pytest documentation.
Direct invocation
The fixture can be used directly in the test function:
def test_sum_squares(benchmark):
    data = [1, 2, 3, 4, 5]
    benchmark(sum, data)  # Only the `sum` function is measured
The fixture behaves as an identity function: calling benchmark(target, *args, **kwargs) has the same effect as calling target(*args, **kwargs). The return value is also passed through, making it possible to write assertions on the result.
For example:
def test_sum_squares(benchmark):
    input = [1, 2, 3, 4, 5]
    output = benchmark(sum, [i**2 for i in input])
    assert output == 55
It’s also possible to use it with lambda functions:
def test_sum_squares(benchmark):
    input = [1, 2, 3, 4, 5]
    output = benchmark(lambda: sum(i**2 for i in input))
    assert output == 55
As a decorator
If you want to measure a block of code containing multiple function calls, you can use the fixture as a decorator:
def test_sum_squares_cubes(benchmark):
    input = [1, 2, 3, 4, 5]

    @benchmark
    def measured_function():
        squares = sum(i**2 for i in input)
        cubes = sum(i**3 for i in input)
        return squares + cubes
When using the fixture, the marker is no longer necessary, unless you want to customize the execution.
The benchmark fixture can only be used once per test function. For example, the following code will raise an error:
def test_invalid(benchmark):
    benchmark(func1)  # OK
    benchmark(func2)  # ERROR: RuntimeError
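If you do need to measure several calls in one test, one option is to wrap them in a single decorated function, as shown in the decorator section above (sketch, with func1 and func2 standing in for your own code):
def test_valid(benchmark):
    @benchmark
    def run_both():
        func1()
        func2()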
Benchmark options
The @pytest.mark.benchmark marker accepts several options to customize the benchmark execution:
group
The group name to use for the benchmark. This can be useful to organize related benchmarks together. (Will be supported soon in the UI.)
min_time
The minimum time of a round (in seconds). Only available in walltime mode.
max_time
The maximum time to run the benchmark for (in seconds). Only available in walltime mode.
max_rounds
The maximum number of rounds to run the benchmark for. Takes precedence over max_time. Only available in walltime mode.
Example usage:
@pytest.mark.benchmark(
group="sorting",
min_time=0.1,
max_time=1.0,
max_rounds=100
)
def test_sorting_algorithm(benchmark):
data = [1, 2, 3, 4, 5]
benchmark(quicksort, data)
The min_time, max_time and max_rounds options are only available in walltime mode. When using instrumentation mode (Valgrind), these options are ignored.
Pedantic mode (advanced)
For fine-grained control over the benchmark execution protocol, you can use the benchmark.pedantic method.
For example:
def test_pedantic_mode(benchmark):
    def setup():
        # Setup code that shouldn't be measured
        data = list(range(1000))
        return (data,), {}  # Returns (args, kwargs) for the target

    def target(data):
        # Code to benchmark
        return sorted(data)

    def teardown(data):
        # Cleanup code that shouldn't be measured
        data.clear()

    result = benchmark.pedantic(
        target,
        setup=setup,
        teardown=teardown,
        rounds=5,  # Number of rounds to run
        warmup_rounds=1,  # Number of warmup rounds
    )
The benchmark.pedantic method accepts the following parameters:
target
Callable
The function to benchmark. This is the main code that will be measured.
args
tuple[Any, ...]
default: "()"
Positional arguments to pass to the target function.
kwargs
dict[str, Any]
default: "{}"
Keyword arguments to pass to the target function.
setup
Callable | None
default: "None"
Optional setup function that runs before each round. If it returns a tuple of (args, kwargs), these will be passed to the target function.
teardown
Callable | None
default: "None"
Optional teardown function that runs after each round. Receives the same arguments as the target function.
warmup_rounds
int
Number of warmup rounds to run before the actual benchmark. These rounds are not included in the measurements.
rounds
int
Number of rounds to run the benchmark for. This parameter is ignored when using the CPU instrumentation mode.
iterations
int
Number of iterations to run within each round. The total number of executions will be rounds × iterations. This parameter is ignored when using the CPU instrumentation mode.
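To illustrate the remaining parameters, here is a minimal sketch combining args, kwargs, and iterations (the values are arbitrary; iterations only has an effect in walltime mode, as noted above):
def test_pedantic_sorted(benchmark):
    result = benchmark.pedantic(
        sorted,
        args=([5, 3, 1, 4, 2],),
        kwargs={"reverse": True},
        rounds=10,
        iterations=5,  # 10 rounds × 5 iterations = 50 executions in walltime mode
    )
    assert result == [5, 4, 3, 2, 1]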
Recipes
Parametrized benchmarks
pytest-codspeed fully supports pytest's parametrization out of the box:
import pytest
@pytest.mark.parametrize("size", [10, 100, 1000])
def test_parametrized_benchmark(benchmark, size):
    data = list(range(size))
    benchmark(sum, data)
Compatibility
pytest-codspeed is designed to be fully backward compatible with pytest-benchmark. You can use both plugins in the same project, though only one will be active at a time.
Running the benchmarks continuously
To run the benchmarks continuously in your CI, you can use pytest-codspeed along with the CodSpeed runner.
We have first-class support for the following CI providers: