
# How to Benchmark Python with pytest?

> Learn how to measure the performance of your Python code by writing and running benchmarks locally and continuously in CI to catch regressions.

export const CIWorkflow = ({minimal = false, enableWorkflowDispatch = true, runsOn = "ubuntu-latest", highlight = [], mode, modes, submodules = false, preSteps = [], buildSteps = ["# ...", "# Setup your environment here:", "#  - Configure your Python/Rust/Node version", "#  - Install your dependencies", "#  - Build your benchmarks (if using a compiled language)", "# ..."], benchmarkCommand = ["<Insert your benchmark command here>"], jobName = "Run benchmarks", env = {}}) => {
  const modeList = modes || (mode ? [mode] : undefined);
  if (!modeList || modeList.length === 0) {
    throw new Error("mode or modes is required");
  }
  const indent = (lines, depth) => {
    const reindentedLines = lines.map(l => l.length === 0 ? l : (" ").repeat(depth) + l);
    return reindentedLines.join("\n");
  };
  const workflowDispatchSection = enableWorkflowDispatch ? "  # `workflow_dispatch` allows CodSpeed to trigger backtest\n" + "  # performance analysis in order to generate initial data.\n" + "  workflow_dispatch:\n" : "";
  let yaml = "";
  if (!minimal) {
    yaml += `
name: CodSpeed Benchmarks

on:
  push:
    branches:
      - "main" # or "master"
  pull_request:
`;
    yaml += workflowDispatchSection;
  }
  yaml += `
jobs:
  benchmarks:
    name: ${jobName}
    runs-on: ${runsOn}`;
  if (!minimal) {
    yaml += `
    permissions: # optional for public repositories
      contents: read # required for actions/checkout
      id-token: write # required for OIDC authentication with CodSpeed`;
  }
  if (preSteps.length > 0) yaml += "\n" + indent(preSteps, 4);
  yaml += `
    steps:
      - uses: actions/checkout@v5`;
  if (submodules) {
    const value = typeof submodules === "string" ? submodules : "true";
    yaml += `\n        with:\n          submodules: ${value}`;
  }
  yaml += "\n" + indent(buildSteps, 6);
  const modeValue = modeList.join(",");
  yaml += `
      - name: Run the benchmarks
        uses: CodSpeedHQ/action@v4
        with:
          mode: ${modeValue}`;
  if (benchmarkCommand.length > 0) {
    const indentedBenchCommand = benchmarkCommand.length > 1 ? benchmarkCommand[0] + "\n" + indent(benchmarkCommand.slice(1), 12) : benchmarkCommand;
    const runLine = indent(["run: "], 10) + indentedBenchCommand;
    yaml += `\n${runLine}`;
  }
  const envEntries = Object.entries(env);
  if (envEntries.length > 0) {
    const envLines = ["env:", ...envEntries.map(([k, v]) => `  ${k}: ${v}`)];
    yaml += "\n" + indent(envLines, 8);
  }
  return <CodeBlock language="yaml" highlight={JSON.stringify(highlight)} {...minimal || ({
    filename: ".github/workflows/codspeed.yml",
    icon: "github"
  })}>
      {yaml}
    </CodeBlock>;
};

export const PythonIcon = props => <svg xmlns="http://www.w3.org/2000/svg" className="h-6 w-6" viewBox="0 0 32 31" width={32} height={31} fill="none" {...props}>
    <g clipPath="url(#a)">
      <g clipPath="url(#b)">
        <path fill="url(#c)" d="M15.612.207a19.07 19.07 0 0 0-3.214.276C9.552.99 9.035 2.05 9.035 4.005v2.582h6.726v.86h-9.25c-1.955 0-3.666 1.184-4.202 3.435-.617 2.58-.645 4.19 0 6.885.479 2.006 1.62 3.435 3.575 3.435h2.313v-3.095c0-2.236 1.92-4.209 4.201-4.209h6.718c1.87 0 3.363-1.55 3.363-3.442V4.005c0-1.836-1.537-3.215-3.363-3.522a20.83 20.83 0 0 0-3.504-.276Zm-3.637 2.076c.695 0 1.262.581 1.262 1.295 0 .712-.567 1.287-1.262 1.287a1.274 1.274 0 0 1-1.262-1.287c0-.714.565-1.295 1.262-1.295Z" />
        <path fill="url(#d)" d="M23.318 7.447v3.009c0 2.332-1.963 4.295-4.202 4.295h-6.718c-1.84 0-3.363 1.586-3.363 3.443v6.45c0 1.837 1.585 2.917 3.363 3.443 2.13.63 4.17.745 6.718 0 1.693-.494 3.363-1.487 3.363-3.442v-2.582h-6.718v-.861h10.081c1.955 0 2.683-1.373 3.363-3.435.702-2.122.672-4.163 0-6.885-.483-1.96-1.406-3.435-3.363-3.435h-2.524Zm-3.779 16.337c.698 0 1.263.575 1.263 1.287 0 .714-.565 1.295-1.262 1.295-.695 0-1.263-.58-1.263-1.295 0-.712.568-1.287 1.262-1.287Z" />
      </g>
    </g>
    <defs>
      <linearGradient id="c" x1={1.836} x2={17.439} y1={0.207} y2={13.407} gradientUnits="userSpaceOnUse">
        <stop stopColor="#5A9FD4" />
        <stop offset={1} stopColor="#306998" />
      </linearGradient>
      <linearGradient id="d" x1={19.378} x2={13.76} y1={24.854} y2={17.039} gradientUnits="userSpaceOnUse">
        <stop stopColor="#FFD43B" />
        <stop offset={1} stopColor="#FFE873" />
      </linearGradient>
      <clipPath id="a">
        <path fill="#fff" d="M.89.207h30.272v30.27H.89z" />
      </clipPath>
      <clipPath id="b">
        <path fill="#fff" d="M1.836.207h28.379v29.8H1.835z" />
      </clipPath>
    </defs>
  </svg>;

## Why pytest-codspeed?

This guide uses
[`pytest-codspeed`](https://github.com/CodSpeedHQ/pytest-codspeed) because it
integrates directly with [`pytest`](https://docs.pytest.org/), the most
popular Python testing framework. Your benchmarks live right alongside your
tests and use the same familiar syntax, with no separate infrastructure to
maintain.

The whole `pytest` ecosystem (parametrization, fixtures, plugins) also works
with your benchmarks. You can even turn existing tests into benchmarks by
adding a single decorator.

<Info>
  If you're wondering whether to use command-line tools like `time` or
  `hyperfine` versus integrated frameworks like `pytest-codspeed`, check out our
  [Choosing the Right Python Benchmarking Strategy
  guide](/guides/choosing-the-correct-python-benchmarking-strategy) for a
  detailed comparison.
</Info>

## Your First Benchmark

Let's start by creating a simple benchmark for a recursive Fibonacci function.

### Installation

First, add `pytest-codspeed` to your project's dependencies using
[`uv`](https://docs.astral.sh/uv/):

```bash icon="square-terminal" theme={null}
uv add --dev pytest-codspeed
```

<Note>
  **Don't have `uv`?**

  You can use `pip install pytest-codspeed` instead. `uv` is a modern, fast Python
  package manager that we recommend for new projects, but any package manager
  works fine.
</Note>

### Writing the Benchmark

Create a new file `tests/test_benchmarks.py`:

```python tests/test_benchmarks.py icon="python" theme={null}
import pytest

# Define the function we want to benchmark
def fibonacci(n: int) -> int:
    if n <= 1:
        return n
    else:
        return fibonacci(n - 2) + fibonacci(n - 1)

# Register a simple benchmark using the pytest marker
@pytest.mark.benchmark
def test_fib_bench():
    result = fibonacci(30)
    assert result == 832040
```

A few things to note: `@pytest.mark.benchmark` is a standard
[`pytest` marker](https://docs.pytest.org/en/stable/how-to/mark.html) that marks
this test as a benchmark. The entire test function is measured, including both
the computation and the assertion. It's a regular `pytest` test, so you can run
it with `pytest` as usual. The test validates correctness (via assertions) and
tracks performance at the same time.

### Running the Benchmark

Now run your benchmark:

```bash icon="square-terminal" theme={null}
uv run pytest tests/ --codspeed
```

<Note>
  **What does `--codspeed` do?**

  This flag activates CodSpeed's benchmarking engine to collect performance
  measurements. Without it, pytest runs your tests normally without gathering
  performance data. If you're not using `uv`, run `pytest tests/ --codspeed`
  instead.
</Note>

You should see output like this:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 1 item

tests/test_benchmarks.py .                                                  [100%]

                        Benchmark Results
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃      Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_fib_bench │      73.1ms │        2.1% │    2.96s │    40 │
└────────────────┴─────────────┴─────────────┴──────────┴───────┘

================================== 1 benchmarked ==================================

================================ 1 passed in 4.09s ================================
```

The output shows that `test_fib_bench` takes about 73 milliseconds to compute
`fibonacci(30)`. It ran 40 times in 2.96 seconds to get a reliable measurement.

<Tip>
  **Understanding the results:**

  * **Time (best)**: The fastest single iteration. This is your function's
    performance (lower is better).
  * **Rel. StdDev**: Relative standard deviation. It measures consistency
    between runs (lower means more reliable results).
  * **Run time**: Total time spent running the benchmark.
  * **Iters**: How many times your code ran. This is automatically adjusted
    based on speed (fast code runs more times for accuracy).
</Tip>
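
To make these columns concrete, here is a minimal sketch of how similar statistics could be derived from raw timings using only the standard library. It is an illustration under simple assumptions, not how `pytest-codspeed` computes its results; `time_callable` and `summarize` are hypothetical helper names.

```python
import statistics
import time


def time_callable(fn, iterations: int) -> list[float]:
    """Run fn repeatedly and return one wall-clock duration (seconds) per call."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: list[float]) -> dict[str, float]:
    """Derive table-like columns from raw durations (illustrative only)."""
    mean = statistics.fmean(durations)
    return {
        "best": min(durations),  # comparable to "Time (best)"
        "rel_stddev": statistics.stdev(durations) / mean,  # "Rel. StdDev" as a fraction
        "run_time": sum(durations),  # comparable to "Run time"
        "iters": len(durations),  # comparable to "Iters"
    }


if __name__ == "__main__":
    stats = summarize(time_callable(lambda: sum(range(10_000)), iterations=200))
    print(stats)
```

A low `rel_stddev` means the individual durations cluster tightly around the mean, which is why a lower value indicates a more trustworthy measurement.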

## Benchmarking with Arguments

So far, we've only tested our function with a single input value (30). But what
if we want to see how performance changes with different input sizes? This is
where `pytest`'s
[`@pytest.mark.parametrize`](https://docs.pytest.org/en/stable/how-to/parametrize.html)
comes in, and it works seamlessly with benchmarks.

Let's update our benchmark to test multiple input sizes:

```python tests/test_benchmarks.py icon="python" theme={null}
@pytest.mark.benchmark
@pytest.mark.parametrize("n", [5, 10, 15, 20, 30])
def test_fib_parametrized(n):
    result = fibonacci(n)
    assert result > 0
```

When you run this benchmark, pytest will create separate test instances for each
parameter value, allowing you to compare performance across different inputs:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 5 items

tests/test_benchmarks.py .....                                              [100%]

                               Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┓
┃                 Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃     Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━┩
│  test_fib_parametrized[5] │         0ns │        1.7% │    2.92s │ 1,026,802 │
│ test_fib_parametrized[10] │         1ns │        1.7% │    2.89s │   395,754 │
│ test_fib_parametrized[15] │        76ns │        0.8% │    2.94s │    52,256 │
│ test_fib_parametrized[20] │      8.49µs │        3.6% │    3.00s │     4,970 │
│ test_fib_parametrized[30] │      72.9ms │        0.7% │    2.94s │        40 │
└───────────────────────────┴─────────────┴─────────────┴──────────┴───────────┘

================================== 5 benchmarked ==================================

=============================== 5 passed in 19.88s ================================
```

Notice how parametrization creates five separate benchmarks, one for each input
value. The results reflect the exponential time complexity of our recursive
Fibonacci implementation: `fibonacci(5)` is so fast that its best time is
reported as 0ns and it runs over 1 million iterations, while `fibonacci(30)`
takes 72.9ms and runs only 40 times. This jump from nanoseconds to milliseconds
shows how quickly recursive Fibonacci becomes expensive as the input grows.
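
One way to see where the exponential growth comes from is to count how many calls the recursive function makes. The sketch below uses a hypothetical `count_calls` helper that mirrors the recursion of `fibonacci` from the earlier example; it is not part of the benchmark suite.

```python
def count_calls(n: int) -> int:
    """Number of function calls the naive recursive fibonacci(n) performs."""
    if n <= 1:
        return 1  # base case: a single call, no recursion
    # One call for this frame, plus every call made by the two sub-problems
    return 1 + count_calls(n - 2) + count_calls(n - 1)


if __name__ == "__main__":
    for n in [5, 10, 15, 20]:
        print(f"fibonacci({n}) makes {count_calls(n):,} calls")
```

The count grows from 15 calls for `n=5` to 21,891 calls for `n=20`, which matches the steep increase in the measured times.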

### Multiple Parameters

You can also benchmark across multiple dimensions:

```python tests/test_benchmarks.py icon="python" theme={null}
def fibonacci_iterative(n: int) -> int:
    # Same indexing as fibonacci(): F(0) = 0, F(1) = 1
    if n <= 1:
        return n
    a, b = 0, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return b

@pytest.mark.benchmark
@pytest.mark.parametrize("algorithm, n", [
    ("recursive", 10),
    ("recursive", 20),
    ("iterative", 100),
    ("iterative", 200),
])
def test_fib_algorithms(algorithm, n):
    if algorithm == "recursive":
        result = fibonacci(n)
    else:
        result = fibonacci_iterative(n)
    assert result > 0
```

Then run it:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 4 items

tests/test_benchmarks.py ....                                               [100%]

                                 Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┓
┃                                    ┃     Time ┃      Rel. ┃          ┃          ┃
┃                          Benchmark ┃   (best) ┃    StdDev ┃ Run time ┃    Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━┩
│  test_fib_algorithms[recursive-10] │      1ns │      1.0% │    2.93s │  614,789 │
│  test_fib_algorithms[recursive-20] │   8.49µs │     26.9% │    3.01s │    4,970 │
│ test_fib_algorithms[iterative-100] │      0ns │     42.1% │    3.04s │ 1,474,1… │
│ test_fib_algorithms[iterative-200] │      0ns │      1.3% │    2.29s │  587,099 │
└────────────────────────────────────┴──────────┴───────────┴──────────┴──────────┘

================================== 4 benchmarked ==================================

=============================== 4 passed in 15.40s ================================
```

This benchmark creates four separate test cases, one for each listed pair of
algorithm and input size. The output shows the performance gap between the two
implementations: the iterative version handles much larger inputs (100, 200) in
virtually no time, while the recursive version takes 8.49µs for `n=20`. Notice
how `fibonacci_iterative(200)` completes over 500,000 iterations in a
comparable time budget, while `fibonacci(20)` manages only about 5,000.

Parametrization makes algorithmic trade-offs visible at a glance, helping you
choose the most efficient implementation for your use case.
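
When every algorithm should be tried with every input size, stacking two `parametrize` decorators lets `pytest` generate the full cross-product instead of listing each pair by hand. The following is a minimal sketch; `IMPLEMENTATIONS`, `fib_recursive`, `fib_iterative`, and `test_fib_matrix` are illustrative names, not part of this guide's test file.

```python
import pytest


def fib_recursive(n: int) -> int:
    return n if n <= 1 else fib_recursive(n - 2) + fib_recursive(n - 1)


def fib_iterative(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


# Hypothetical registry mapping a readable name to an implementation
IMPLEMENTATIONS = {"recursive": fib_recursive, "iterative": fib_iterative}


@pytest.mark.benchmark
@pytest.mark.parametrize("n", [10, 20])
@pytest.mark.parametrize("algorithm", sorted(IMPLEMENTATIONS))
def test_fib_matrix(algorithm, n):
    # 2 algorithms x 2 sizes = 4 generated benchmarks
    result = IMPLEMENTATIONS[algorithm](n)
    assert result > 0
```

Using the same input sizes for both algorithms keeps the comparison like-for-like, at the cost of limiting `n` to values the recursive version can handle quickly.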

### Naming Parametrized Cases

By default, `pytest` generates benchmark names from the parameter values. That
works well for primitives like numbers or short strings, e.g.,
`test_fib_parametrized[5]`. With richer parameters such as dictionaries, lists,
or `callable` objects, the auto-generated names degrade into opaque labels like
`test_my_bench[param0-param1]`. They are hard to read, and if the underlying
values change between runs, CodSpeed treats each case as a new benchmark and
loses the historical comparison.

Use the `ids` argument to attach a stable, descriptive label to each case:

```python tests/test_benchmarks.py icon="python" theme={null}
@pytest.mark.benchmark
@pytest.mark.parametrize(
    "n",
    [5, 10, 15, 20, 30],
    ids=["tiny", "small", "medium", "large", "huge"],
)
def test_fib_named(n):
    result = fibonacci(n)
    assert result > 0
```

The benchmark output now reads `test_fib_named[tiny]` through
`test_fib_named[huge]`, which is easier to scan and stays stable even if you
tweak the parameter values later.

For finer-grained control, wrap individual cases in `pytest.param` to attach an
id one at a time:

```python tests/test_benchmarks.py icon="python" theme={null}
@pytest.mark.benchmark
@pytest.mark.parametrize("payload", [
    pytest.param({"users": 100}, id="small-payload"),
    pytest.param({"users": 10_000}, id="large-payload"),
])
def test_serialize(payload):
    serialize(payload)
```

This form is especially useful when parameters are dictionaries, `dataclass`
instances, or other non-trivial objects that `pytest` cannot turn into readable
ids on its own.

`pytest.param` also works with multiple parameters. Pass one positional value
per name in the `parametrize` declaration, then attach a single `id` that
describes the whole case:

```python tests/test_benchmarks.py icon="python" theme={null}
@pytest.mark.benchmark
@pytest.mark.parametrize(
    "algorithm, n",
    [
        pytest.param("recursive", 10, id="recursive-small"),
        pytest.param("recursive", 20, id="recursive-large"),
        pytest.param("iterative", 100, id="iterative-small"),
        pytest.param("iterative", 200, id="iterative-large"),
    ],
)
def test_fib_algorithms(algorithm, n):
    if algorithm == "recursive":
        result = fibonacci(n)
    else:
        result = fibonacci_iterative(n)
    assert result > 0
```

The output now reads `test_fib_algorithms[recursive-small]` instead of the
default `test_fib_algorithms[recursive-10]`, so the benchmark name stays stable
even if you later change `10` to `15` to keep run times in a useful range.

<Tip>
  Pick ids that describe the scenario, not the raw value. `["cold-cache", "warm-cache"]`
  tells you more about what is being measured than `[0, 1]`, and it stays
  meaningful when the underlying values change.
</Tip>
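
`ids` also accepts a callable, which `pytest` invokes once per parameter value to build the label. That can help when there are many cases and the label can be derived from the data. Here is a minimal sketch, assuming a hypothetical `serialize` helper built on `json.dumps` and an illustrative `payload_id` function:

```python
import json

import pytest


def serialize(payload: dict) -> str:
    """Hypothetical function under test."""
    return json.dumps(payload)


def payload_id(payload: dict) -> str:
    """Build a readable id from the payload's size."""
    return f"users-{payload['users']}"


@pytest.mark.benchmark
@pytest.mark.parametrize(
    "payload",
    [{"users": 100}, {"users": 10_000}],
    ids=payload_id,  # called once per payload to produce its id
)
def test_serialize_with_id_function(payload):
    assert serialize(payload)
```

Because these ids are derived from the values, they change whenever the values change. Prefer explicit ids when the long-term stability of the benchmark name matters.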

## Benchmarking Only What Matters

Sometimes you have expensive setup that shouldn't be included in your benchmark
measurements, such as generating large datasets, building complex data
structures, or preparing test data. This is where the `benchmark`
[`fixture`](https://docs.pytest.org/en/stable/how-to/fixtures.html) comes in.

The `benchmark` fixture gives you precise control over what gets measured. Let's
benchmark a data analysis function that identifies outliers in numerical data.
The expensive part is generating the test dataset, but we only want to measure
the outlier detection algorithm:

```python tests/test_outlier_detection.py icon="python" theme={null}
import pytest
import random

def generate_dataset(size: int) -> list[float]:
    """Generate a large dataset with some outliers (expensive operation)."""
    random.seed(42)  # Fixed seed for reproducibility

    data = []
    for _ in range(size):
        # 95% normal values from a normal distribution
        if random.random() < 0.95:
            data.append(random.gauss(100.0, 15.0))
        else:
            # 5% outliers
            data.append(random.uniform(200.0, 300.0))

    return data

def detect_outliers(data: list[float], threshold: float = 2.0) -> list[int]:
    """Detect outliers using z-score method (what we want to benchmark)."""
    # Calculate mean
    mean = sum(data) / len(data)

    # Calculate standard deviation
    variance = sum((x - mean) ** 2 for x in data) / len(data)
    std_dev = variance ** 0.5

    # Find outliers
    outliers = []
    for i, value in enumerate(data):
        z_score = abs((value - mean) / std_dev) if std_dev > 0 else 0
        if z_score > threshold:
            outliers.append(i)

    return outliers

# Benchmark for dataset generation
@pytest.mark.benchmark
@pytest.mark.parametrize("size", [10_000, 100_000, 1_000_000])
def test_generate_dataset(size):
    generate_dataset(size)

# Benchmark for outlier detection only
@pytest.mark.parametrize("size", [10_000, 100_000, 1_000_000])
def test_outlier_detection(benchmark, size):
    # NOT MEASURED: Expensive setup - generate large dataset
    dataset = generate_dataset(size)

    # MEASURED: Only the outlier detection algorithm
    result = benchmark(detect_outliers, dataset)

    # NOT MEASURED: Assertions
    assert len(result) > 0  # We should find some outliers
    assert all(isinstance(idx, int) for idx in result)
```

The setup code (generating the dataset) runs once, and **only** the
`detect_outliers()` call inside `benchmark()` is measured. This gives you
accurate performance data without the noise of test setup.

Run this benchmark by filtering the `pytest` command to this file:

```bash icon="square-terminal" theme={null}
uv run pytest tests/test_outlier_detection.py --codspeed
```

You should see output like this:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 6 items

tests/test_outlier_detection.py ......                                      [100%]

                                Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃                       Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│    test_generate_dataset[10000] │    124.29µs │        4.4% │    2.92s │ 1,278 │
│   test_generate_dataset[100000] │      11.0ms │        6.2% │    2.96s │   130 │
│  test_generate_dataset[1000000] │     225.5ms │       29.6% │    3.04s │    13 │
│   test_outlier_detection[10000] │     46.01µs │       18.4% │    2.89s │ 2,059 │
│  test_outlier_detection[100000] │       3.3ms │        6.1% │    3.04s │   220 │
│ test_outlier_detection[1000000] │     132.4ms │       12.6% │    3.04s │    22 │
└─────────────────────────────────┴─────────────┴─────────────┴──────────┴───────┘

================================== 6 benchmarked ==================================

=============================== 6 passed in 24.84s ================================
```

The results reveal a crucial insight about what we're actually measuring. Notice
the dramatic difference between the two benchmark groups:

**Dataset generation** (`test_generate_dataset`):

* 10k elements: 124.29µs
* 100k elements: 11.0ms (88x slower)
* 1M elements: 225.5ms (1,814x slower than 10k)

**Outlier detection** (`test_outlier_detection`):

* 10k elements: 46.01µs
* 100k elements: 3.3ms (72x slower)
* 1M elements: 132.4ms (2,878x slower than 10k)

This comparison shows that for the 1M element dataset, **dataset generation
takes 225.5ms while outlier detection takes 132.4ms**: the setup is actually
slower than the algorithm we want to measure.

Without using the `benchmark` fixture to exclude the setup, our measurements
would include both operations, making it impossible to understand the true
performance of the outlier detection algorithm.

The `benchmark` fixture ensures we **measure only what matters: the algorithm
itself**, not the test infrastructure around it.
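
To put numbers on that, the short calculation below uses the timings from the table above to estimate how much of a combined measurement would be setup if generation and detection were timed together. It is a back-of-the-envelope sketch that assumes the two timings simply add up; `setup_share` is an illustrative helper.

```python
# Best times from the run above, in milliseconds: (generation, detection)
TIMINGS_MS = {
    10_000: (0.12429, 0.04601),
    100_000: (11.0, 3.3),
    1_000_000: (225.5, 132.4),
}


def setup_share(generation_ms: float, detection_ms: float) -> float:
    """Fraction of a combined measurement that would be spent on setup."""
    return generation_ms / (generation_ms + detection_ms)


if __name__ == "__main__":
    for size, (generation, detection) in TIMINGS_MS.items():
        share = setup_share(generation, detection)
        print(f"{size:>9,} elements: {share:.0%} of the combined time is setup")
```

Under that assumption, setup would account for roughly 63% to 77% of each combined measurement, so most of the reported time would have nothing to do with outlier detection.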

## Additional Techniques

### Marking an Entire Module

If you have a dedicated benchmarks file, you can mark all tests as benchmarks at
once using `pytest`'s module-level marking:

```python tests/benchmarks/test_math_operations.py icon="python" theme={null}
import pytest

# Mark all tests in this module as benchmarks
pytestmark = pytest.mark.benchmark

def test_sum_squares():
    # MEASURED: Everything in this test
    result = sum(i**2 for i in range(1000))
    assert result > 0

def test_sum_cubes():
    # MEASURED: Everything in this test
    result = sum(i**3 for i in range(1000))
    assert result > 0
```

Now all tests in this file are automatically benchmarked without individual
decorators. This is useful for benchmark-specific test files.
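
The same marker can also be applied to a test class, which is a standard `pytest` feature for marking every method in the class. The sketch below assumes `pytest-codspeed` picks up class-level markers the same way it picks up the module-level `pytestmark`; the class and method names are illustrative.

```python
import pytest


@pytest.mark.benchmark
class TestStringOperations:
    # MEASURED: every test method in this class inherits the marker

    def test_join(self):
        result = ",".join(str(i) for i in range(1000))
        assert result.startswith("0,1,2")

    def test_upper(self):
        result = ("benchmark " * 1000).upper()
        assert result.startswith("BENCHMARK")


def test_not_a_benchmark():
    # Not marked: runs as a regular test only
    assert sum(range(10)) == 45
```

This can be handy when benchmarks and regular tests share a file and you only want a subset of them measured.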

### Fine-Grained Control with Pedantic

For maximum control over your benchmarks, use
[`benchmark.pedantic()`](https://pytest-benchmark.readthedocs.io/en/latest/pedantic.html).
This allows you to specify custom setup and teardown functions, control the
number of rounds and iterations, configure warmup behavior, and more:

```python tests/test_advanced.py icon="python" theme={null}
import json
import pytest

def parse_json_data(json_string: str) -> dict:
    """Parse JSON string into a dictionary."""
    return json.loads(json_string)

@pytest.mark.parametrize("size", [10_000, 30_000])
def test_json_parsing(benchmark, size):
    # NOT MEASURED: Setup to create test data
    items = [{"id": i, "name": f"item_{i}", "value": i * 10} for i in range(size)]
    json_string = json.dumps(items)

    # MEASURED: Only the parse_json_data() function
    result = benchmark.pedantic(
        parse_json_data,        # Function to benchmark
        args=(json_string,),    # Arguments to the function
        rounds=100,             # Number of benchmark rounds
        iterations=10,          # Iterations per round
        warmup_rounds=2         # Warmup rounds before measuring
    )

    # NOT MEASURED: The assertion
    assert len(result) == size
```

Here is the output when you run this benchmark:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 2 items

tests/test_advanced.py ..                                                   [100%]

                             Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃                Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_json_parsing[10000] │    294.85µs │        0.9% │    2.99s │ 1,000 │
│ test_json_parsing[30000] │    973.01µs │        0.7% │    9.88s │ 1,000 │
└──────────────────────────┴─────────────┴─────────────┴──────────┴───────┘

================================== 2 benchmarked ==================================

=============================== 2 passed in 13.20s ================================
```

As expected, each benchmark ran 100 rounds of 10 iterations each, for a total
of 1,000 iterations.

Using `benchmark.pedantic()` is especially useful for bigger benchmarks where
you need precise control over rounds, iterations, and warmup behavior.

### Benchmarking Async Functions

To benchmark asynchronous functions, wrap the coroutine in a synchronous
callable that runs it with `asyncio.run()`, and pass that callable to the
`benchmark` fixture. Here's an example:

```python tests/test_async.py icon="python" theme={null}
import asyncio

import pytest


async def simple_async_task() -> int:
    """A simple async task that simulates work"""
    await asyncio.sleep(0.1)  # simulates async work for 100 ms
    return 42


def test_simple_async_task(benchmark):
    """Benchmark a simple async task"""

    result = benchmark(lambda: asyncio.run(simple_async_task()))
    assert result == 42
```

Here is the output when you run this benchmark:

```shellsession title=terminal icon="square-terminal" theme={null}
=============================== test session starts ===============================
platform darwin -- Python 3.13.3, pytest-8.4.2, pluggy-1.6.0
codspeed: 4.2.0 (enabled, mode: walltime, callgraph: not supported, timer_resolution: 41.7ns)
CodSpeed had to disable the following plugins: pytest-benchmark
benchmark: 5.2.1 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/user/projects/CodSpeedHQ/docs-guides/python
configfile: pyproject.toml
plugins: benchmark-5.2.1, codspeed-4.2.0
collected 1 item

tests/test_async.py .                                                        [100%]

                            Benchmark Results
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━┓
┃              Benchmark ┃ Time (best) ┃ Rel. StdDev ┃ Run time ┃ Iters ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━┩
│ test_simple_async_task │     100.8ms │        1.0% │    2.95s │    29 │
└────────────────────────┴─────────────┴─────────────┴──────────┴───────┘

================================== 1 benchmarked ==================================

================================ 1 passed in 4.12s ================================
```

<Tip>
  Since asynchronous functions most likely involve I/O operations, their execution
  time can vary significantly based on external factors like network latency or
  disk speed. When benchmarking async code, consider running more iterations or
  rounds to obtain reliable measurements.

  If you are using CodSpeed in your CI to run your benchmarks, be sure to use the
  [Walltime instrument](/instruments/walltime) to get accurate timing for async
  operations.
</Tip>

## Best Practices

### Use Assertions to Verify Correctness

Since benchmarks are regular `pytest` tests, they should include assertions to
verify correctness:

```python icon="python" theme={null}
# ❌ BAD: No verification
@pytest.mark.benchmark
def test_computation():
    result = expensive_computation()
    # Oops, forgot to check if result is correct!

# ✅ GOOD: Verify the result without measuring the assertion
def test_computation(benchmark):
    result = benchmark(expensive_computation)
    assert result == expected_value
```

This ensures you're benchmarking correct code, not broken code that happens to
be fast.

As mentioned in the introduction, you can also turn existing tests into
benchmarks, either by adding the `@pytest.mark.benchmark` decorator or, to keep
setup and assertions out of the measurement, by using the `benchmark` fixture:

```python icon="python" theme={null}
# Existing correctness test
def test_sorting_algorithm():
    data = [5, 2, 9, 1]
    result = sorting_algorithm(data)
    assert result == [1, 2, 5, 9]

# Turn it into a benchmark using the benchmark fixture
def test_sorting_algorithm(benchmark):
    data = [5, 2, 9, 1]
    result = benchmark(sorting_algorithm, data)
    assert result == [1, 2, 5, 9]
```
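
For comparison, here is what the marker-based variant could look like. It needs only one extra line, at the cost of also measuring the setup and the assertion. `sorting_algorithm` is a hypothetical stand-in implemented with the built-in `sorted`.

```python
import pytest


def sorting_algorithm(data: list[int]) -> list[int]:
    """Hypothetical function under test."""
    return sorted(data)


# Same test body as a plain correctness test; the decorator is the only change
@pytest.mark.benchmark
def test_sorting_algorithm_marked():
    data = [5, 2, 9, 1]
    result = sorting_algorithm(data)
    assert result == [1, 2, 5, 9]
```

For a test this small the difference is negligible, but with heavier setup the fixture form gives a cleaner measurement.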

### Keep Benchmarks Deterministic

Your benchmarks should produce consistent results across runs:

```python icon="python" theme={null}
# ❌ BAD: Non-deterministic due to random data
def test_sort_random(benchmark):
    import random
    data = [random.randint(1, 1000) for _ in range(100)]
    benchmark(sorted, data)

# ✅ GOOD: Use a fixed seed or deterministic data
def test_sort_deterministic(benchmark):
    import random
    random.seed(42)  # Fixed seed for reproducibility
    data = [random.randint(1, 1000) for _ in range(100)]
    benchmark(sorted, data)

# ✅ EVEN BETTER: Use deterministic data
def test_sort_worst_case(benchmark):
    data = list(range(100, 0, -1))  # Always the same
    benchmark(sorted, data)
```

### Benchmarking Your Own Package

A common Python packaging convention, the `src` layout, is to keep your source
code in a `src/` directory. Here's a typical project structure:

```shellsession title=terminal icon="square-terminal" theme={null}
my_project/
├── pyproject.toml
├── src/
│   └── mylib/
│       ├── __init__.py
│       └── algorithms.py
└── tests/
    ├── test_algorithms.py        # Regular unit tests
    └── benchmarks/               # Performance benchmarks
        └── test_algorithm_performance.py
```

Your source code in `src/mylib/algorithms.py`:

```python src/mylib/algorithms.py icon="python" theme={null}
def quick_sort(arr: list[int]) -> list[int]:
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quick_sort(left) + middle + quick_sort(right)
```

Then benchmark it in your tests:

```python tests/benchmarks/test_algorithm_performance.py icon="python" theme={null}
from mylib.algorithms import quick_sort
import pytest

@pytest.mark.parametrize("size", [10, 100, 1000])
def test_quick_sort_performance(benchmark, size):
    # NOT MEASURED: Create test data
    data = list(range(size, 0, -1))

    # MEASURED: The sorting algorithm
    result = benchmark(quick_sort, data)

    # NOT MEASURED: Verify correctness
    assert result == list(range(1, size + 1))
```

Make sure your package is installed in editable (development) mode so that the
benchmarks import the code under `src/`:

```bash icon="square-terminal" theme={null}
uv pip install -e .
```
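
To confirm the install worked, you can check where Python resolves the package from. A minimal sketch using only the standard library, where `locate_package` is a hypothetical helper and `mylib` is the example package from the layout above. With an editable install, the reported path is expected to point into your `src/` directory:

```python
import importlib.util
from typing import Optional


def locate_package(name: str) -> Optional[str]:
    """Return the file a top-level package would be imported from, or None if it cannot be found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Prints the resolved path, or None if `mylib` is not installed.
print(locate_package("mylib"))
```

If this prints `None`, the package is not installed in the environment that runs your benchmarks.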

## Running Benchmarks Continuously with CodSpeed

So far, you've been running benchmarks locally. But local benchmarking has
limitations:

* **Inconsistent hardware**: Different developers get different results
* **Manual process**: Easy to forget to run benchmarks before merging
* **No historical tracking**: Hard to spot gradual performance degradation
* **No PR context**: Can't see performance impact during code review

This is where **CodSpeed** comes in. It runs your benchmarks automatically in CI
and provides:

* Automated performance regression detection in PRs
* Consistent metrics with reliable measurements across all runs
* Historical tracking to see performance over time with detailed charts
* Flamegraph profiles to see exactly what changed in your code's execution

<Tip>
  For the full CodSpeed integration reference, see [Writing Benchmarks in
  Python](/benchmarks/python).
</Tip>

### How to Set Up CodSpeed with pytest-codspeed

Here's how to integrate CodSpeed with your `pytest-codspeed` benchmarks:

<Steps>
  <Step title="Set Up GitHub Actions">
    Create a workflow file to run benchmarks on every push to your main branch
    and on every pull request.

    <CIWorkflow
      mode="simulation"
      buildSteps={[
    "- name: Install uv",
    "  uses: astral-sh/setup-uv@v7",
    "",
    "- name: Set up Python",
    "  uses: actions/setup-python@v6",
    "  with:",
    "    python-version: '3.13'",
    "",
    "- name: Install dependencies",
    "  run: uv sync --all-extras --dev",
  ]}
      benchmarkCommand={["uv run pytest tests/ --codspeed"]}
    />

    <Warning>
      **Important**: Use `actions/setup-python` to set up Python, not `uv python install`.
      This is required for CodSpeed's CPU simulation to work correctly.
    </Warning>
  </Step>

  <Step title="Check the Results">
    Once the workflow runs, your pull requests will receive a performance report
    comment:

    <img src="https://mintcdn.com/codspeed/jKaxX6yy-Kzw1C-0/assets/pr-comment-new-installation.png?fit=max&auto=format&n=jKaxX6yy-Kzw1C-0&q=85&s=4405db6390fe6f80b4f13d5baa2598d1" className="rounded-xl w-full max-w-lg mx-auto" alt="Pull Request Result" width="1744" height="820" data-path="assets/pr-comment-new-installation.png" />

    <img src="https://mintcdn.com/codspeed/jKaxX6yy-Kzw1C-0/assets/pr-status-check-success.png?fit=max&auto=format&n=jKaxX6yy-Kzw1C-0&q=85&s=a74b568e364c0b068623bd31ee869361" className="rounded-xl w-full max-w-md mx-auto" alt="Pull Request Result" width="1408" height="690" data-path="assets/pr-status-check-success.png" />
  </Step>

  <Step title="Access Detailed Reports and Flamegraphs">
    After your benchmarks run in CI, head over to your CodSpeed dashboard to see
    detailed performance reports, historical trends, and flamegraph profiles for
    deeper analysis.

    <Frame caption="Profiling Report on CodSpeed">
      <img src="https://mintcdn.com/codspeed/CInbng288QuXBkrC/features/assets/cover.png?fit=max&auto=format&n=CInbng288QuXBkrC&q=85&s=302d47fea90881b1af8ab6c21148c245" className="p-4 w-full max-w-lg mx-auto" alt="Profiling Report on CodSpeed" width="1171" height="685" data-path="features/assets/cover.png" />
    </Frame>

    <Tip>
      Profiling works out of the box, no extra configuration needed!

      [Learn more about flamegraphs and how to use them to optimize your code](/features/profiling).
    </Tip>
  </Step>
</Steps>

## Next Steps

Check out these resources to continue your Python benchmarking journey:

<CardGroup cols={2}>
  <Card title="Get Started with CodSpeed" href="https://codspeed.io?flow=get-started" icon="rocket">
    Sign up and start tracking your Python performance in CI
  </Card>

  <Card title="CodSpeed Python Benchmarking Docs" href="/benchmarks/python" icon={<PythonIcon />}>
    Explore the full pytest-codspeed API reference
  </Card>

  <Card title="Choosing the Right Strategy" href="/guides/choosing-the-correct-python-benchmarking-strategy" icon={<PythonIcon />}>
    Learn when to use different Python benchmarking approaches
  </Card>

  <Card title="Performance Profiling" href="/features/profiling" icon="fire">
    Learn how to use flamegraphs to optimize your code
  </Card>
</CardGroup>
