# Reducing Variance

> Learn how to reduce variance in your benchmarks.

As explained in the previous chapter on
[Benchmark Variance](/instruments/cpu/regression-causes), there are many
possible reasons why benchmark results can vary. There is no silver bullet, but
several strategies can reduce unexpected variance.

## Variance Categories

Variance can be grouped into categories, which makes it easier to understand and
fix regressions that share a cause. The categories are:

* **Compiler/Linker variance**: Whenever the built binary changes, its layout
  can change too, which can cause the same code to execute differently.
  * **Cache variance**: This describes variance caused by different cache
    behavior. In CI, each benchmark process typically runs once per commit, so
    cold-cache effects can influence results.
* **State-dependent variance**: This describes all the variance that is caused
  by changing the underlying state of the system.
  * **Allocator variance**: Allocators can execute different code paths
    depending on their current state. A change in memory fragmentation at an
    earlier point in time can cause variance in benchmarks that are executed
    later.
* **Environment variance**: Variance caused by the runtime environment.
  * **CPU variance**: If code behaves differently based on the CPU, variance can
    be introduced. This happens in heavily optimized libraries/programs that
    might try to detect cache sizes, CPU features or the number of CPU cores.
  * **Kernel variance**: Syscalls can cause variance in benchmarks, as the
    kernel might execute different code paths depending on the current state of
    the system.

```mermaid theme={null}
flowchart TD
    top["Unrelated code change"] -->|"different binary layout"| compiler["Compiler/Linker variance"]
    top -->|"state changes between runs"| order["State-dependent<br/>Variance"]
    top --> env["Environment variance"]

    compiler --> cold["Cache variance"]
    order --> alloc["Allocator variance"]
    env --> cpuv["CPU variance"]
    env --> kernelv["Kernel variance"]
```
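
To make allocator variance more concrete, here is a minimal sketch of a toy
allocator model in Rust. It is not how any real allocator is implemented:
`ToyAllocator`, `AllocPath`, and the benchmark functions are hypothetical names,
and the model only tracks which code path an allocation would take. It
illustrates how state left behind by an unrelated benchmark can change the path
taken by a later one.

```rust
/// Which code path the toy allocator took for a request.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum AllocPath {
    /// A previously freed block of the right size was reused.
    Fast,
    /// No cached block was available, so fresh memory would be carved out.
    Slow,
}

/// Toy model: remembers the sizes of freed blocks, nothing else.
#[derive(Default)]
struct ToyAllocator {
    free_blocks: Vec<usize>,
}

impl ToyAllocator {
    fn alloc(&mut self, size: usize) -> AllocPath {
        match self.free_blocks.iter().position(|&s| s == size) {
            Some(index) => {
                self.free_blocks.swap_remove(index);
                AllocPath::Fast
            }
            None => AllocPath::Slow,
        }
    }

    fn free(&mut self, size: usize) {
        self.free_blocks.push(size);
    }
}

/// The "benchmark" under test: a single 64-byte allocation.
fn bench_alloc_64(allocator: &mut ToyAllocator) -> AllocPath {
    allocator.alloc(64)
}

/// Runs the benchmark on a fresh allocator.
fn run_in_isolation() -> AllocPath {
    let mut allocator = ToyAllocator::default();
    bench_alloc_64(&mut allocator)
}

/// Runs the same benchmark after an unrelated benchmark freed a 64-byte block.
fn run_after_unrelated_benchmark() -> AllocPath {
    let mut allocator = ToyAllocator::default();
    let _ = allocator.alloc(64);
    allocator.free(64);
    bench_alloc_64(&mut allocator)
}
```

In this model, `run_in_isolation` takes the slow path while
`run_after_unrelated_benchmark` takes the fast path, even though the benchmarked
code is identical in both cases.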

## Strategies

### One benchmark, one binary

Most of these issues come from multiple benchmarks being compiled into and run
from the same binary. Seemingly unrelated changes to the code can cause ripple
effects that are hard to track down.

To fix this, we can compile each benchmark into its own binary. This removes
variance introduced by unrelated changes, as compilers (usually) produce the
same binary when given the same input.

The only downside to this approach is the increased compilation and linking
overhead: for N benchmarks, we have to compile N binaries. We only recommend
this approach for micro-benchmarks that show a significant amount of variance.

#### How to implement in Rust

In Rust, this can be done by adding a feature flag for each benchmark, which
allows us to compile each benchmark into its own binary.

```toml Cargo.toml theme={null}
[features]
bench_foo = []
bench_bar = []
```

Then annotate each benchmark with the feature flag:

```rust highlight={1,8} theme={null}
#[cfg(feature = "bench_foo")]
#[divan::bench]
fn bench_foo() {
    //
    //
}

#[cfg(feature = "bench_bar")]
#[divan::bench]
fn bench_bar() {
    //
    //
}
```

Then build and run each benchmark separately:

```bash theme={null}
$ cargo codspeed build -m simulation --features bench_foo
$ cargo codspeed run -m simulation

$ cargo codspeed build -m simulation --features bench_bar
$ cargo codspeed run -m simulation
```

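For now, it's only possible to build and execute a single benchmark at a time,
but we're exploring how to better integrate this into cargo-codspeed.

In GitHub Actions, a build matrix can run each benchmark in its own job. The
workflow does not show installing `cargo-codspeed`; add that step if your runner
does not already provide it.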

```yaml highlight={17,23} theme={null}
name: CodSpeed Benchmarks

on: [push, pull_request]

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        benchmark: [bench_foo, bench_bar]
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable

      - name: Build benchmark
        run:
          cargo codspeed build -m simulation --features ${{ matrix.benchmark }}

      - name: Run benchmark
        uses: CodSpeedHQ/action@v4
        with:
          mode: simulation
          run: cargo codspeed run -m simulation
```

#### How to implement in C++

When using C++, we can achieve this by wrapping each benchmark and its
`BENCHMARK()` registration in an `#ifdef` guard. This allows us to conditionally
include or exclude benchmarks at build time.

```cpp highlight={1,11} theme={null}
#ifdef BENCHMARK_BM_StringCopy
static void BM_StringCopy(benchmark::State &state) {
    std::string x = "hello";
    for (auto _ : state) {
        std::string copy(x);
        benchmark::DoNotOptimize(copy);
        benchmark::ClobberMemory();
    }
}
BENCHMARK(BM_StringCopy);
#endif
```

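Then build each benchmark into its own binary. Note that `-D<name>=ON` sets a
CMake cache variable, so your `CMakeLists.txt` needs to forward it to the
compiler as a preprocessor definition (for example with
`target_compile_definitions`):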

```bash theme={null}
for define in $(rg -oN 'BENCHMARK_\w+' src | sort -u); do
  cmake -S . -B "build/$define" -DCODSPEED_MODE=simulation -D"$define"=ON
  cmake --build "build/$define"
  mkdir -p codspeed-results && cp "build/$define/<your_binary>" "codspeed-results/$define"
done
```

Then run each benchmark:

```bash theme={null}
for define in $(rg -oN 'BENCHMARK_\w+' src | sort -u); do
  ./codspeed-results/$define
done
```

In GitHub Actions we then do the same:

```yaml highlight={13-17, 24-26} theme={null}
name: CodSpeed Benchmarks

on: [push, pull_request]

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5

      - name: Build all benchmarks
        run: |
          for define in $(rg -oN 'BENCHMARK_\w+' src | sort -u); do
            cmake -S . -B "build/$define" -DCODSPEED_MODE=simulation -D"$define"=ON
            cmake --build "build/$define"
            mkdir -p codspeed-results && cp "build/$define/<your_binary>" "codspeed-results/$define"
          done

      - name: Run benchmarks
        uses: CodSpeedHQ/action@v4
        with:
          mode: simulation
          run: |
            for define in $(rg -oN 'BENCHMARK_\w+' src | sort -u); do
              ./codspeed-results/$define
            done
```

### Reducing allocator variance

#### Tune your allocator

Most allocators expose configuration options that affect determinism. Reducing
the number of arenas, disabling caches, and controlling page purging behavior
can all help stabilize benchmark results. Refer to the allocator documentation
for available options:

* [glibc `mallopt`](https://man7.org/linux/man-pages/man3/mallopt.3.html)
* [jemalloc `MALLOC_CONF`](https://github.com/jemalloc/jemalloc/blob/dev/TUNING.md)
  * `dirty_decay_ms:-1,muzzy_decay_ms:-1`: This disables returning unused pages
    back to the OS, which can otherwise randomly slow down benchmarks.
* [tcmalloc](https://google.github.io/tcmalloc/tuning.html)
  * [`SetProfileSamplingInterval(MAX)`](https://github.com/google/tcmalloc/blob/master/docs/sampling.md):
    Disables heap profile sampling.
  * [`SetGuardedSamplingInterval(-1)`](https://github.com/google/tcmalloc/blob/b90d4ac374850b0bec6bbf9b520e8afcb6496517/tcmalloc/malloc_extension.h#L527C1-L534C60):
    Disables [GWP-ASan](https://google.github.io/tcmalloc/gwp-asan.html) guarded
    sampling, which otherwise probabilistically guards allocations to detect
    buffer overflows and use-after-free.
  * [`SetBackgroundProcessActionsEnabled(false)`](https://github.com/google/tcmalloc/blob/b90d4ac374850b0bec6bbf9b520e8afcb6496517/tcmalloc/malloc_extension.h#L555):
    Disables background memory release actions that can cause timing variance.
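
As an illustration of allocator tuning, the following sketch calls glibc's
`mallopt` from Rust at startup. It assumes a Linux target using glibc, where
Rust's `System` allocator forwards to the C allocator. The function name
`pin_glibc_malloc` and the chosen values are assumptions for illustration, not
recommended settings, and the `unsafe extern` syntax requires a recent Rust
toolchain (1.82 or later). `M_ARENA_MAX` and `M_TRIM_THRESHOLD` are the
parameter numbers from glibc's `<malloc.h>`.

```rust
use std::os::raw::c_int;

// glibc `mallopt` parameter numbers, as defined in <malloc.h>.
const M_TRIM_THRESHOLD: c_int = -1;
const M_ARENA_MAX: c_int = -8;

#[cfg(all(target_os = "linux", target_env = "gnu"))]
unsafe extern "C" {
    // int mallopt(int param, int value); returns 1 on success, 0 on error.
    fn mallopt(param: c_int, value: c_int) -> c_int;
}

/// Hypothetical helper: limits glibc malloc to a single arena and raises the
/// trim threshold so freed memory at the top of the heap is rarely returned
/// to the OS. Returns `true` if both settings were accepted.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
fn pin_glibc_malloc() -> bool {
    let one_arena = unsafe { mallopt(M_ARENA_MAX, 1) } == 1;
    let rarely_trim = unsafe { mallopt(M_TRIM_THRESHOLD, c_int::MAX) } == 1;
    one_arena && rarely_trim
}

/// On other targets there is no glibc `mallopt`, so nothing is changed.
#[cfg(not(all(target_os = "linux", target_env = "gnu")))]
fn pin_glibc_malloc() -> bool {
    false
}
```

A helper like this would be called once at the very start of the benchmark
binary, before other threads are spawned, so that the settings apply to every
allocation made by the benchmarks.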

#### Use a custom allocator

In many cases, variance is caused by `realloc`, which either grows the
allocation in place or creates a new allocation and moves the contents of the
previous allocation into it.

Whether in-place growing succeeds depends on the OS memory state, making it
**completely non-deterministic**. To fix this, we can always take the slow path
that never grows in place.

Here is an example in Rust (adapted from
[oxc](https://github.com/oxc-project/oxc/blob/112580a408843f9f405d8d8acc9d03990a75eaff/tasks/benchmark/src/lib.rs)),
which uses the default implementation of
[`GlobalAlloc::realloc`](https://github.com/rust-lang/rust/blob/d933cf483edf1605142ac6899ff32536c0ad8b22/library/core/src/alloc/global.rs#L286-L303)
that always allocates and then copies the memory.

```rust theme={null}
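use std::alloc::{GlobalAlloc, Layout, System};

#[global_allocator]
static GLOBAL: NeverGrowInPlaceAllocator = NeverGrowInPlaceAllocator;

/// Forwards to the system allocator but never grows an allocation in place.
struct NeverGrowInPlaceAllocator;

unsafe impl GlobalAlloc for NeverGrowInPlaceAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
    }

    // `realloc` is intentionally not overridden: the default implementation
    // allocates a new block, copies the contents, and frees the old block.
}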
```
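
To check that the default `realloc` really takes the allocate-and-copy path, a
small experiment can count calls on a wrapper that, like the allocator above,
does not override `realloc`. This is only a sketch: `CountingAllocator` and
`grow_once` are hypothetical names, and the wrapper is used as a plain value
here rather than being installed as the global allocator.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts calls; `realloc` is not overridden.
struct CountingAllocator {
    allocs: AtomicUsize,
    deallocs: AtomicUsize,
}

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        self.allocs.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        self.deallocs.fetch_add(1, Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }
}

/// Grows a 16-byte block to 64 bytes and reports
/// (block moved, alloc calls, dealloc calls, contents preserved).
fn grow_once() -> (bool, usize, usize, bool) {
    let allocator = CountingAllocator {
        allocs: AtomicUsize::new(0),
        deallocs: AtomicUsize::new(0),
    };
    let small = Layout::from_size_align(16, 8).unwrap();
    let large = Layout::from_size_align(64, 8).unwrap();

    unsafe {
        let old = allocator.alloc(small);
        assert!(!old.is_null());
        old.write_bytes(0xAB, 16);

        // Default `realloc`: allocate the new block, copy, then free the old one.
        let new = allocator.realloc(old, small, 64);
        assert!(!new.is_null());
        let moved = new != old;
        let preserved = (0..16).all(|i| *new.add(i) == 0xAB);

        // Read the counters before the final cleanup `dealloc`.
        let allocs = allocator.allocs.load(Ordering::Relaxed);
        let deallocs = allocator.deallocs.load(Ordering::Relaxed);
        allocator.dealloc(new, large);
        (moved, allocs, deallocs, preserved)
    }
}
```

Because the new block is allocated while the old one is still live, the block
always moves, and a single `realloc` shows up as one extra `alloc` and one
`dealloc`. Every growth therefore costs the same allocate-and-copy work,
regardless of whether the underlying allocator could have grown in place.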

***

We're actively exploring how to implement this in our integrations. If you have
further questions, please reach out to us via
[Discord](https://discord.gg/MxpaCfKSqF) or [email](mailto:contact@codspeed.io).
