# How to Benchmark Python with pytest

> Learn how to measure the performance of your Python code by writing and running benchmarks locally and continuously in CI to catch regressions.

## Why pytest-codspeed?

This guide uses [`pytest-codspeed`](https://github.com/CodSpeedHQ/pytest-codspeed) because it integrates with [`pytest`](https://docs.pytest.org/), the most popular Python testing framework. Your benchmarks live alongside your tests and use the same familiar syntax, so there is no separate infrastructure to maintain. The rest of the `pytest` ecosystem (parametrization, fixtures, plugins) works with your benchmarks too, and you can even turn existing tests into benchmarks by adding a single decorator.

If you're wondering whether to use command-line tools like `time` or `hyperfine` versus integrated frameworks like `pytest-codspeed`, check out our [Choosing the Right Python Benchmarking Strategy guide](/guides/choosing-the-correct-python-benchmarking-strategy) for a detailed comparison.

## Your First Benchmark

Let's start by creating a simple benchmark for a recursive Fibonacci function.

### Installation

First, add `pytest-codspeed` to your project's dependencies using [`uv`](https://docs.astral.sh/uv/):

```bash icon="square-terminal" theme={null}
uv add --dev pytest-codspeed
```

**Don't have `uv`?** You can use `pip install pytest-codspeed` instead. `uv` is a modern, fast Python package manager that we recommend for new projects, but any package manager works fine.

### Writing the Benchmark

Create a new file `tests/test_benchmarks.py`:

```python tests/test_benchmarks.py icon="python" theme={null}
import pytest


# Define the function we want to benchmark
def fibonacci(n: int) -> int:
    if n <= 1:
        return n
    else:
        return fibonacci(n - 2) + fibonacci(n - 1)


# Register a simple benchmark using the pytest marker
@pytest.mark.benchmark
def test_fib_bench():
    result = fibonacci(30)
    assert result == 832040
```

A few things to note:

* `@pytest.mark.benchmark` is a standard [`pytest` marker](https://docs.pytest.org/en/stable/how-to/mark.html) that marks this test as a benchmark.
* The entire test function is measured, including both the computation and the assertion.
* It's a regular `pytest` test, so you can run it with `pytest` as usual.
* The test validates correctness (via assertions) and tracks performance at the same time.

### Running the Benchmark

Now run your benchmark:

```bash icon="square-terminal" theme={null}
uv run pytest tests/ --codspeed
```

**What does `--codspeed` do?** This flag activates CodSpeed's benchmarking engine to collect performance measurements. Without it, `pytest` runs your tests normally and gathers no performance data.

If you're not using `uv`, run `pytest tests/ --codspeed` instead.
You should see output like this:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 1 item

tests/test_benchmarks.py . [100%]

                        Benchmark Results
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃      Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_fib_bench │      73.1ms │        2.1% │    2.96s │    40 │
└────────────────┴─────────────┴─────────────┴──────────┴───────┘
================================== 1 benchmarked ==================================
================================ 1 passed in 4.09s ================================
```

The output shows that `test_fib_bench` takes about 73 milliseconds to compute `fibonacci(30)`. It ran 40 times in 2.96 seconds to get a reliable measurement.

**Understanding the results:**

* **Time (best)**: The fastest single iteration. This is your function's performance (lower is better).
* **Rel. StdDev**: Relative standard deviation. It measures consistency between runs (lower means more reliable results).
* **Run time**: Total time spent running the benchmark.
* **Iters**: How many times your code ran. This is adjusted automatically based on speed (fast code runs more times for accuracy).

## Benchmarking with Arguments

So far, we've only tested our function with a single input value (30). But what if we want to see how performance changes with different input sizes? This is where `pytest`'s [`@pytest.mark.parametrize`](https://docs.pytest.org/en/stable/how-to/parametrize.html) comes in, and it works with benchmarks just as it does with tests.

Let's update our benchmark to test multiple input sizes:

```python tests/test_benchmarks.py icon="python" theme={null}
@pytest.mark.benchmark
@pytest.mark.parametrize("n", [5, 10, 15, 20, 30])
def test_fib_parametrized(n):
    result = fibonacci(n)
    assert result > 0
```

When you run this benchmark, `pytest` creates a separate test instance for each parameter value, allowing you to compare performance across different inputs:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 5 items

tests/test_benchmarks.py ..... [100%]

                               Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓
┃                 Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃     Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩
│  test_fib_parametrized[5] │         0ns │        1.7% │    2.92s │ 1,026,802 │
│ test_fib_parametrized[10] │         1ns │        1.7% │    2.89s │   395,754 │
│ test_fib_parametrized[15] │        76ns │        0.8% │    2.94s │    52,256 │
│ test_fib_parametrized[20] │      8.49µs │        3.6% │    3.00s │     4,970 │
│ test_fib_parametrized[30] │      72.9ms │        0.7% │    2.94s │        40 │
└───────────────────────────┴─────────────┴─────────────┴──────────┴───────────┘
================================== 5 benchmarked ==================================
=============================== 5 passed in 19.88s ================================
```

Notice how parametrization creates five separate benchmarks, one for each input value. The results reveal the exponential time complexity of our recursive Fibonacci implementation: `fibonacci(5)` is fast enough to be reported as 0ns and runs over 1 million iterations, while `fibonacci(30)` takes 72.9ms and runs only 40 times. This difference (from nanoseconds to milliseconds) shows how quickly recursive Fibonacci becomes expensive as the input grows.
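To see where the exponential growth comes from, it can help to count how many function calls the naive recursion makes. The sketch below is an illustration, not part of the benchmark suite: `fibonacci_call_count` is a hypothetical helper introduced here, based on the observation that `fibonacci(n)` costs one call for itself plus every call made by `fibonacci(n - 1)` and `fibonacci(n - 2)`.

```python
def fibonacci_call_count(n: int) -> int:
    """Count the calls made by the naive recursive fibonacci(n).

    Hypothetical helper for illustration only. It uses the recurrence
    calls(n) = 1 + calls(n - 1) + calls(n - 2), with calls(0) = calls(1) = 1,
    and evaluates it iteratively so that the counting itself stays cheap.
    """
    if n <= 1:
        return 1
    prev, curr = 1, 1  # calls(0), calls(1)
    for _ in range(n - 1):
        prev, curr = curr, 1 + curr + prev
    return curr


# Same input sizes as the parametrized benchmark
for n in [5, 10, 15, 20, 30]:
    print(f"fibonacci({n}) makes {fibonacci_call_count(n):,} calls")
```

The call count grows from 15 calls at `n=5` to more than 2.6 million at `n=30`, multiplying by roughly 1.6 each time `n` increases by one.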
### Multiple Parameters

You can also benchmark across multiple dimensions:

```python tests/test_benchmarks.py icon="python" theme={null}
def fibonacci_iterative(n: int) -> int:
    # Same sequence as fibonacci(): 0, 1, 1, 2, 3, 5, ...
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b


@pytest.mark.benchmark
@pytest.mark.parametrize("algorithm, n", [
    ("recursive", 10),
    ("recursive", 20),
    ("iterative", 100),
    ("iterative", 200),
])
def test_fib_algorithms(algorithm, n):
    if algorithm == "recursive":
        result = fibonacci(n)
    else:
        result = fibonacci_iterative(n)
    assert result > 0
```

Then run it:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 4 items

tests/test_benchmarks.py .... [100%]

                                 Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃                                    ┃     Time ┃      Rel. ┃          ┃          ┃
┃                          Benchmark ┃   (best) ┃    StdDev ┃ Run time ┃    Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│  test_fib_algorithms[recursive-10] │      1ns │      1.0% │    2.93s │  614,789 │
│  test_fib_algorithms[recursive-20] │   8.49µs │     26.9% │    3.01s │    4,970 │
│ test_fib_algorithms[iterative-100] │      0ns │     42.1% │    3.04s │ 1,474,1… │
│ test_fib_algorithms[iterative-200] │      0ns │      1.3% │    2.29s │  587,099 │
└────────────────────────────────────┴──────────┴───────────┴──────────┴──────────┘
================================== 4 benchmarked ==================================
=============================== 4 passed in 15.40s ================================
```

This benchmark creates four separate test cases, one for each listed pair of algorithm and input size. The output shows the performance difference between the two implementations: the iterative version handles much larger inputs (100, 200) in virtually no time, while the recursive version takes 8.49µs for `n=20`. Notice how `fibonacci_iterative(200)` completes over 500,000 iterations in a comparable time budget, while `fibonacci(20)` manages only about 5,000. Parametrization makes algorithmic trade-offs visible at a glance, helping you choose the most efficient implementation for your use case.

### Naming Parametrized Cases

By default, `pytest` generates benchmark names from the parameter values. That works well for primitives like numbers or short strings, e.g., `test_fib_parametrized[5]`.

With richer parameters such as dictionaries, lists, or callable objects, the auto-generated names degrade into opaque labels like `test_my_bench[param0-param1]`. They are hard to read, and if the underlying values change between runs, CodSpeed treats each case as a new benchmark and loses the historical comparison.
Use the `ids` argument to attach a stable, descriptive label to each case:

```python tests/test_benchmarks.py icon="python" theme={null}
@pytest.mark.benchmark
@pytest.mark.parametrize(
    "n",
    [5, 10, 15, 20, 30],
    ids=["tiny", "small", "medium", "large", "huge"],
)
def test_fib_named(n):
    result = fibonacci(n)
    assert result > 0
```

The benchmark output now reads `test_fib_named[tiny]` through `test_fib_named[huge]`, which is easier to scan and stays stable even if you tweak the parameter values later.

For finer-grained control, wrap individual cases in `pytest.param` to attach an id one at a time:

```python tests/test_benchmarks.py icon="python" theme={null}
@pytest.mark.benchmark
@pytest.mark.parametrize("payload", [
    pytest.param({"users": 100}, id="small-payload"),
    pytest.param({"users": 10_000}, id="large-payload"),
])
def test_serialize(payload):
    # serialize() stands in for the function under test (not defined here)
    serialize(payload)
```

This form is especially useful when parameters are dictionaries, `dataclass` instances, or other non-trivial objects that `pytest` cannot turn into readable ids on its own.

`pytest.param` also works with multiple parameters. Pass one positional value per name in the `parametrize` declaration, then attach a single `id` that describes the whole case:

```python tests/test_benchmarks.py icon="python" theme={null}
@pytest.mark.benchmark
@pytest.mark.parametrize(
    "algorithm, n",
    [
        pytest.param("recursive", 10, id="recursive-small"),
        pytest.param("recursive", 20, id="recursive-large"),
        pytest.param("iterative", 100, id="iterative-small"),
        pytest.param("iterative", 200, id="iterative-large"),
    ],
)
def test_fib_algorithms(algorithm, n):
    if algorithm == "recursive":
        result = fibonacci(n)
    else:
        result = fibonacci_iterative(n)
    assert result > 0
```

The output now reads `test_fib_algorithms[recursive-small]` instead of the default `test_fib_algorithms[recursive-10]`, so the benchmark name stays stable even if you later change `10` to `15` to keep run times in a useful range.
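One way to keep ids and values together is to store the cases in a dictionary keyed by scenario name and derive both lists from it. This is only a sketch: `SCENARIOS`, `IDS`, and `ARGVALUES` are names made up for this example, and the decorator is shown in a comment so that the snippet runs without `pytest` installed.

```python
# Hypothetical scenario table: the key is the stable benchmark id,
# the value is the payload passed to the benchmarked function.
SCENARIOS = {
    "small-payload": {"users": 100},
    "large-payload": {"users": 10_000},
}

# Dictionaries preserve insertion order, so ids and values stay aligned.
IDS = list(SCENARIOS.keys())
ARGVALUES = list(SCENARIOS.values())

# In a benchmark file, the two lists would be passed to parametrize:
#
#   @pytest.mark.benchmark
#   @pytest.mark.parametrize("payload", ARGVALUES, ids=IDS)
#   def test_serialize(payload):
#       serialize(payload)

for case_id, payload in zip(IDS, ARGVALUES):
    print(f"test_serialize[{case_id}] receives {payload}")
```

Changing `10_000` to another value later leaves the id `large-payload` untouched, so the benchmark keeps its name.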
Pick ids that describe the scenario, not the raw value. `["cold-cache", "warm-cache"]` tells you more about what is being measured than `[0, 1]`, and it stays meaningful when the underlying values change.

## Benchmarking Only What Matters

Sometimes you have expensive setup that shouldn't be included in your benchmark measurements, such as generating large datasets, creating complex data structures, or preparing test data. This is where the `benchmark` [`fixture`](https://docs.pytest.org/en/stable/how-to/fixtures.html) comes in.

The `benchmark` fixture gives you precise control over what gets measured. Let's benchmark a data analysis function that identifies outliers in numerical data. The expensive part is generating the test dataset, but we only want to measure the outlier detection algorithm:

```python tests/test_outlier_detection.py icon="python" theme={null}
import pytest
import random


def generate_dataset(size: int) -> list[float]:
    """Generate a large dataset with some outliers (expensive operation)."""
    random.seed(42)  # Fixed seed for reproducibility
    data = []
    for _ in range(size):
        # 95% normal values from a normal distribution
        if random.random() < 0.95:
            data.append(random.gauss(100.0, 15.0))
        else:
            # 5% outliers
            data.append(random.uniform(200.0, 300.0))
    return data


def detect_outliers(data: list[float], threshold: float = 2.0) -> list[int]:
    """Detect outliers using z-score method (what we want to benchmark)."""
    # Calculate mean
    mean = sum(data) / len(data)

    # Calculate standard deviation
    variance = sum((x - mean) ** 2 for x in data) / len(data)
    std_dev = variance ** 0.5

    # Find outliers
    outliers = []
    for i, value in enumerate(data):
        z_score = abs((value - mean) / std_dev) if std_dev > 0 else 0
        if z_score > threshold:
            outliers.append(i)

    return outliers


# Benchmark for dataset generation
@pytest.mark.benchmark
@pytest.mark.parametrize("size", [10_000, 100_000, 1_000_000])
def test_generate_dataset(size):
    generate_dataset(size)


# Benchmark for outlier detection only
@pytest.mark.parametrize("size", [10_000, 100_000, 1_000_000])
def test_outlier_detection(benchmark, size):
    # NOT MEASURED: Expensive setup - generate large dataset
    dataset = generate_dataset(size)

    # MEASURED: Only the outlier detection algorithm
    result = benchmark(detect_outliers, dataset)

    # NOT MEASURED: Assertions
    assert len(result) > 0  # We should find some outliers
    assert all(isinstance(idx, int) for idx in result)
```

The setup code (generating the dataset) runs once, and **only** the `detect_outliers()` call inside `benchmark()` is measured. This gives you accurate performance data without the noise of test setup.

Run this benchmark by filtering the `pytest` command to this file:

```bash icon="square-terminal" theme={null}
uv run pytest tests/test_outlier_detection.py --codspeed
```

You should see output like this:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 6 items

tests/test_outlier_detection.py ...... [100%]

                                Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃                       Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│    test_generate_dataset[10000] │    124.29µs │        4.4% │    2.92s │ 1,278 │
│   test_generate_dataset[100000] │      11.0ms │        6.2% │    2.96s │   130 │
│  test_generate_dataset[1000000] │     225.5ms │       29.6% │    3.04s │    13 │
│   test_outlier_detection[10000] │     46.01µs │       18.4% │    2.89s │ 2,059 │
│  test_outlier_detection[100000] │       3.3ms │        6.1% │    3.04s │   220 │
│ test_outlier_detection[1000000] │     132.4ms │       12.6% │    3.04s │    22 │
└─────────────────────────────────┴─────────────┴─────────────┴──────────┴───────┘
================================== 6 benchmarked ==================================
=============================== 6 passed in 24.84s ================================
```

The results reveal a crucial insight about what we're actually measuring. Compare the two benchmark groups:

**Dataset generation** (`test_generate_dataset`):

* 10k elements: 124.29µs
* 100k elements: 11.0ms (88x slower)
* 1M elements: 225.5ms (1,814x slower than 10k)

**Outlier detection** (`test_outlier_detection`):

* 10k elements: 46.01µs
* 100k elements: 3.3ms (72x slower)
* 1M elements: 132.4ms (2,878x slower than 10k)

This comparison shows that for the 1M element dataset, **dataset generation takes 225.5ms while outlier detection takes 132.4ms**: the setup is actually slower than the algorithm we want to measure.

Without the `benchmark` fixture to exclude the setup, our measurements would include both operations, making it impossible to understand the true performance of the outlier detection algorithm. The `benchmark` fixture ensures we **measure only what matters: the algorithm itself**, not the test infrastructure around it.
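As a rough back-of-the-envelope check, the sketch below takes the best times from the results table and estimates what share of a combined measurement the setup would account for if it were not excluded. It assumes the two phases simply add up, which ignores caching and other interactions; `TIMINGS_US` and `setup_share` are names introduced for this illustration.

```python
# Best times from the results table, converted to microseconds.
# size -> (dataset generation, outlier detection)
TIMINGS_US = {
    10_000: (124.29, 46.01),
    100_000: (11_000.0, 3_300.0),
    1_000_000: (225_500.0, 132_400.0),
}


def setup_share(setup_us: float, measured_us: float) -> float:
    """Percentage of a combined run that would be spent in setup."""
    return 100.0 * setup_us / (setup_us + measured_us)


for size, (generate_us, detect_us) in TIMINGS_US.items():
    share = setup_share(generate_us, detect_us)
    print(f"{size:>9,} elements: setup would be {share:.1f}% of the total")
```

Under these assumptions the setup would make up roughly 63% to 77% of each combined measurement, which is why isolating `detect_outliers()` matters.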
## Additional Techniques

### Marking an Entire Module

If you have a dedicated benchmarks file, you can mark all tests as benchmarks at once using `pytest`'s module-level marking:

```python tests/benchmarks/test_math_operations.py icon="python" theme={null}
import pytest

# Mark all tests in this module as benchmarks
pytestmark = pytest.mark.benchmark


def test_sum_squares():
    # MEASURED: Everything in this test
    result = sum(i**2 for i in range(1000))
    assert result > 0


def test_sum_cubes():
    # MEASURED: Everything in this test
    result = sum(i**3 for i in range(1000))
    assert result > 0
```

Now all tests in this file are automatically benchmarked without individual decorators. This is useful for benchmark-specific test files.

### Fine-Grained Control with Pedantic

For maximum control over your benchmarks, use [`benchmark.pedantic()`](https://pytest-benchmark.readthedocs.io/en/latest/pedantic.html). This allows you to specify custom setup and teardown functions, control the number of rounds and iterations, configure warmup behavior, and more:

```python tests/test_advanced.py icon="python" theme={null}
import json
import pytest


def parse_json_data(json_string: str) -> list[dict]:
    """Parse a JSON string into a list of dictionaries."""
    return json.loads(json_string)


@pytest.mark.parametrize("size", [10_000, 30_000])
def test_json_parsing(benchmark, size):
    # NOT MEASURED: Setup to create test data
    items = [{"id": i, "name": f"item_{i}", "value": i * 10} for i in range(size)]
    json_string = json.dumps(items)

    # MEASURED: Only the parse_json_data() function
    result = benchmark.pedantic(
        parse_json_data,       # Function to benchmark
        args=(json_string,),   # Arguments to the function
        rounds=100,            # Number of benchmark rounds
        iterations=10,         # Iterations per round
        warmup_rounds=2        # Warmup rounds before measuring
    )

    # NOT MEASURED: The assertion
    assert len(result) == size
```

Here is the output when you run this benchmark:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 2 items

tests/test_advanced.py .. [100%]

                             Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃                Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_json_parsing[10000] │    294.85µs │        0.9% │    2.99s │ 1,000 │
│ test_json_parsing[30000] │    973.01µs │        0.7% │    9.88s │ 1,000 │
└──────────────────────────┴─────────────┴─────────────┴──────────┴───────┘
================================== 2 benchmarked ==================================
=============================== 2 passed in 13.20s ================================
```

As expected, each benchmark ran 100 rounds of 10 iterations, totalling 1,000 iterations.

Using `benchmark.pedantic()` is especially useful for bigger benchmarks where you need precise control over rounds, iterations, and warmup behavior.

### Benchmarking Async Functions

To benchmark an asynchronous function, wrap it in a synchronous function that runs the coroutine with `asyncio.run()`, and pass that wrapper to the `benchmark` fixture.
Here's an example:

```python tests/test_async.py icon="python" theme={null}
import asyncio
import pytest


async def simple_async_task() -> int:
    """A simple async task that simulates work"""
    await asyncio.sleep(0.1)  # simulates async work for 100 ms
    return 42


@pytest.mark.benchmark
def test_simple_async_task(benchmark):
    """Benchmark a simple async task"""
    result = benchmark(lambda: asyncio.run(simple_async_task()))
    assert result == 42
```

Here is the output when you run this benchmark:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 1 item

tests/test_async.py . [100%]

                            Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃              Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_simple_async_task │     100.8ms │        1.0% │    2.95s │    29 │
└────────────────────────┴─────────────┴─────────────┴──────────┴───────┘
================================== 1 benchmarked ==================================
================================ 1 passed in 4.12s ================================
```

Keep in mind that `asyncio.run()` creates and closes an event loop on every call, so that overhead is part of the measurement.

Asynchronous functions usually involve I/O operations, so their execution time can vary significantly based on external factors like network latency or disk speed. When benchmarking async code, consider running more iterations or rounds to obtain reliable measurements.

If you are using CodSpeed in your CI to run your benchmarks, be sure to use the [Walltime instrument](/instruments/walltime) to get accurate timing for async operations.

## Best Practices

### Use Assertions to Verify Correctness

Since benchmarks are regular `pytest` tests, they should include assertions to verify correctness:

```python icon="python" theme={null}
# ❌ BAD: No verification
@pytest.mark.benchmark
def test_computation():
    result = expensive_computation()
    # Oops, forgot to check if result is correct!


# ✅ GOOD: Verify the result without measuring the assertion
def test_computation(benchmark):
    result = benchmark(expensive_computation)
    assert result == expected_value
```

This ensures you're benchmarking correct code, not broken code that happens to be fast.

As mentioned in the introduction, you can also turn existing tests into benchmarks, either by adding the `@pytest.mark.benchmark` decorator or, as shown here, by using the `benchmark` fixture so that the assertion stays outside the measurement:
```python icon="python" theme={null}
# Existing correctness test
def test_sorting_algorithm():
    data = [5, 2, 9, 1]
    result = sorting_algorithm(data)
    assert result == [1, 2, 5, 9]


# Turn it into a benchmark using the benchmark fixture
def test_sorting_algorithm(benchmark):
    data = [5, 2, 9, 1]
    result = benchmark(sorting_algorithm, data)
    assert result == [1, 2, 5, 9]
```

### Keep Benchmarks Deterministic

Your benchmarks should produce consistent results across runs:

```python icon="python" theme={null}
# ❌ BAD: Non-deterministic due to random data
def test_sort_random(benchmark):
    import random
    data = [random.randint(1, 1000) for _ in range(100)]
    benchmark(sorted, data)


# ✅ GOOD: Use a fixed seed or deterministic data
def test_sort_deterministic(benchmark):
    import random
    random.seed(42)  # Fixed seed for reproducibility
    data = [random.randint(1, 1000) for _ in range(100)]
    benchmark(sorted, data)


# ✅ EVEN BETTER: Use deterministic data
def test_sort_worst_case(benchmark):
    data = list(range(100, 0, -1))  # Always the same
    benchmark(sorted, data)
```

### Benchmarking Your Own Package

Following Python best practices, your source code should live in a `src/` directory.
Here's a typical project structure:

```shellsession title=terminal icon="square-terminal" theme={null}
my_project/
├── pyproject.toml
├── src/
│   └── mylib/
│       ├── __init__.py
│       └── algorithms.py
└── tests/
    ├── test_algorithms.py          # Regular unit tests
    └── benchmarks/                 # Performance benchmarks
        └── test_algorithm_performance.py
```

Your source code in `src/mylib/algorithms.py`:

```python src/mylib/algorithms.py icon="python" theme={null}
def quick_sort(arr: list[int]) -> list[int]:
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
```

Then benchmark it in your tests:

```python tests/benchmarks/test_algorithm_performance.py icon="python" theme={null}
from mylib.algorithms import quick_sort
import pytest


@pytest.mark.parametrize("size", [10, 100, 1000])
def test_quick_sort_performance(benchmark, size):
    # NOT MEASURED: Create test data
    data = list(range(size, 0, -1))

    # MEASURED: The sorting algorithm
    result = benchmark(quick_sort, data)

    # NOT MEASURED: Verify correctness
    assert result == list(range(1, size + 1))
```

Make sure your package is installed in development mode:

```bash icon="square-terminal" theme={null}
uv pip install -e .
```

## Running Benchmarks Continuously with CodSpeed

So far, you've been running benchmarks locally. But local benchmarking has limitations:

* **Inconsistent hardware**: Different developers get different results
* **Manual process**: Easy to forget to run benchmarks before merging
* **No historical tracking**: Hard to spot gradual performance degradation
* **No PR context**: Can't see performance impact during code review

This is where **CodSpeed** comes in.
It runs your benchmarks automatically in CI and provides:

* Automated performance regression detection in PRs
* Consistent metrics with reliable measurements across all runs
* Historical tracking to see performance over time with detailed charts
* Flamegraph profiles to see exactly what changed in your code's execution

For the full CodSpeed integration reference, see [Writing Benchmarks in Python](/benchmarks/python).

### How to set up CodSpeed with pytest-codspeed

Here's how to integrate CodSpeed with your `pytest-codspeed` benchmarks:

Create a workflow file to run benchmarks on every push and pull request.

**Important**: Use `actions/setup-python` to set up Python rather than letting `uv` install it (for example with `uv python install`). This is required for CodSpeed's CPU simulation to work correctly.

Once the workflow runs, your pull requests will receive a performance report comment.

After your benchmarks run in CI, head over to your CodSpeed dashboard to see detailed performance reports, historical trends, and flamegraph profiles for deeper analysis.

Profiling works out of the box, with no extra configuration needed! [Learn more about flamegraphs and how to use them to optimize your code](/features/profiling).

## Next Steps

Check out these resources to continue your Python benchmarking journey:

* Sign up and start tracking your Python performance in CI
* Explore the full pytest-codspeed API reference
* Learn when to use different Python benchmarking approaches
* Learn how to use flamegraphs to optimize your code