As explained in the previous chapter on Benchmark Variance, there are many possible reasons why your benchmark results can vary. There is no silver bullet, but several strategies can be employed to reduce unexpected variance.

Variance Categories

Variance can be separated into different groups, which helps in understanding and fixing regressions. The categories include:
  • Compiler/Linker variance: Whenever the built binary changes, this can cause code to be executed differently.
    • Cache variance: This describes variance caused by different cache behavior. In CI, each benchmark process typically runs once per commit, so cold-cache effects can influence results.
  • State-dependent variance: This describes all the variance that is caused by changing the underlying state of the system.
    • Allocator variance: Allocators can execute different code paths depending on the current state of the allocator. Changing memory fragmentation at an earlier point in time can cause variance in benchmarks that are executed later.
  • Environment variance: Variance caused by the runtime environment.
    • CPU variance: If code behaves differently based on the CPU, variance can be introduced. This happens in heavily optimized libraries/programs that might try to detect cache sizes, CPU features or the number of CPU cores.
    • Kernel variance: Syscalls can cause variance in benchmarks, as the kernel might execute different code paths depending on the current state of the system.
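As an illustration of CPU variance, here is a minimal Rust sketch (the function name and chunking strategy are hypothetical, not taken from any real library) of code that picks a different path depending on the detected core count, so the same binary can behave differently on different machines:

```rust
use std::thread;

// Hypothetical example of CPU-dependent dispatch: the summation strategy
// changes with the detected number of cores, so identical code can take
// different paths (and show different timings) on different machines.
fn sum(data: &[u64]) -> u64 {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    if cores > 1 {
        // "Parallel-friendly" path: sum fixed-size chunks first.
        data.chunks(4).map(|c| c.iter().sum::<u64>()).sum()
    } else {
        // Serial fallback path.
        data.iter().sum()
    }
}

fn main() {
    let data: Vec<u64> = (1..=100).collect();
    println!("{}", sum(&data)); // prints 5050 on either path
}
```

Both paths compute the same result, but they execute different instructions, which is exactly the kind of machine-dependent behavior that shows up as variance between runners.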

Strategies

One benchmark, one binary

Most of these issues come from multiple benchmarks being compiled and run in the same binary. Seemingly unrelated changes to the code can cause ripple effects that are hard to track down. To fix this, we can compile each benchmark into its own binary. This eliminates unrelated variance, as compilers (usually) produce the same binary when given the same input. The only downside to this approach is the increased compilation/linking overhead: for N benchmarks, we have to compile N binaries. We only recommend this approach for micro-benchmarks that observe a significant amount of variance.

How to implement in Rust

In Rust, this can be done by adding a feature flag for each benchmark, which allows us to compile each benchmark into its own binary.
Cargo.toml
[features]
bench_foo = []
bench_bar = []
Then annotate each benchmark with the feature flag:
#[cfg(feature = "bench_foo")]
#[divan::bench]
fn bench_foo() {
    // benchmark body
}

#[cfg(feature = "bench_bar")]
#[divan::bench]
fn bench_bar() {
    // benchmark body
}
Then run like this:
$ cargo codspeed build -m simulation --features bench_foo
$ cargo codspeed run -m simulation

$ cargo codspeed build -m simulation --features bench_bar
$ cargo codspeed run -m simulation
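The build/run sequences above can also be wrapped in a small shell loop, one feature at a time. This is a sketch shown as a dry run that only prints the commands (drop the `echo` prefixes to execute them for real); the feature names match the hypothetical `bench_foo`/`bench_bar` features from the Cargo.toml above:

```shell
# Dry run: print the build/run command pair for each feature-gated benchmark.
# Remove the `echo` prefixes to actually build and run them.
for feature in bench_foo bench_bar; do
  echo cargo codspeed build -m simulation --features "$feature"
  echo cargo codspeed run -m simulation
done
```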
For now, it’s only possible to build and execute a single benchmark at a time, but we’re exploring how to better integrate this into cargo-codspeed. In GitHub Actions, a matrix can be used to build and run each benchmark in a separate job:
name: CodSpeed Benchmarks

on: [push, pull_request]

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        benchmark: [bench_foo, bench_bar]
    steps:
      - uses: actions/checkout@v5
      - uses: dtolnay/rust-toolchain@stable

      - name: Build benchmark
        run: cargo codspeed build -m simulation --features ${{ matrix.benchmark }}

      - name: Run benchmark
        uses: CodSpeedHQ/action@v4
        with:
          mode: simulation
          run: cargo codspeed run -m simulation

How to implement in C++

When using C++, we can achieve this by wrapping each BENCHMARK() in a preprocessor define. This allows us to conditionally include or exclude benchmarks at build time.
#ifdef BENCHMARK_BM_StringCopy
static void BM_StringCopy(benchmark::State &state) {
    std::string x = "hello";
    for (auto _ : state) {
        std::string copy(x);
        benchmark::DoNotOptimize(copy);
        benchmark::ClobberMemory();
    }
}
BENCHMARK(BM_StringCopy);
#endif
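For the `-D"$define"=ON` options used below to take effect, the build system has to forward them to the compiler as definitions. A minimal CMake sketch of that wiring (the target name `bench_runner` is hypothetical; substitute your own benchmark target) could look like:

```cmake
# Hypothetical CMakeLists.txt fragment: expose each BENCHMARK_* switch as a
# cache option and forward it as a compile definition when enabled.
option(BENCHMARK_BM_StringCopy "Enable the BM_StringCopy benchmark" OFF)
if(BENCHMARK_BM_StringCopy)
  target_compile_definitions(bench_runner PRIVATE BENCHMARK_BM_StringCopy)
endif()
```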
Then build each benchmark into its own binary:
for define in $(rg -oN 'BENCHMARK_\w+' src | sort -u); do
  cmake -S . -B "build/$define" -DCODSPEED_MODE=simulation -D"$define"=ON
  cmake --build "build/$define"
  cp "build/$define/<your_binary>" "codspeed-results/$define"
done
Then run each benchmark:
for define in $(rg -oN 'BENCHMARK_\w+' src | sort -u); do
  ./codspeed-results/$define
done
In GitHub Actions we then do the same:
name: CodSpeed Benchmarks

on: [push, pull_request]

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v5

      - name: Build all benchmarks
        run: |
          for define in $(rg -oN 'BENCHMARK_\w+' src | sort -u); do
            cmake -S . -B "build/$define" -DCODSPEED_MODE=simulation -D"$define"=ON
            cmake --build "build/$define"
            cp "build/$define/<your_binary>" "codspeed-results/$define"
          done

      - name: Run benchmarks
        uses: CodSpeedHQ/action@v4
        with:
          mode: simulation
          run: |
            for define in $(rg -oN 'BENCHMARK_\w+' src | sort -u); do
              ./codspeed-results/$define
            done

We’re actively exploring how to implement this in our integrations. If you have further questions, please reach out to us via Discord or email.