# How to Benchmark C++ with Google Benchmark

> Learn how to measure the performance of your C++ code by writing and running benchmarks locally and continuously in CI to catch regressions.

export const CIWorkflow = ({minimal = false, enableWorkflowDispatch = true, runsOn = "ubuntu-latest", highlight = [], mode, modes, submodules = false, preSteps = [], buildSteps = ["# ...", "# Setup your environment here:", "#  - Configure your Python/Rust/Node version", "#  - Install your dependencies", "#  - Build your benchmarks (if using a compiled language)", "# ..."], benchmarkCommand = ["<Insert your benchmark command here>"], jobName = "Run benchmarks", env = {}}) => {
  const modeList = modes || (mode ? [mode] : undefined);
  if (!modeList || modeList.length === 0) {
    throw new Error("mode or modes is required");
  }
  const indent = (lines, depth) => {
    const reindentedLines = lines.map(l => l.length === 0 ? l : (" ").repeat(depth) + l);
    return reindentedLines.join("\n");
  };
  const workflowDispatchSection = enableWorkflowDispatch ? "  # `workflow_dispatch` allows CodSpeed to trigger backtest\n" + "  # performance analysis in order to generate initial data.\n" + "  workflow_dispatch:\n" : "";
  let yaml = "";
  if (!minimal) {
    yaml += `
name: CodSpeed Benchmarks

on:
  push:
    branches:
      - "main" # or "master"
  pull_request:
`;
    yaml += workflowDispatchSection;
  }
  yaml += `
jobs:
  benchmarks:
    name: ${jobName}
    runs-on: ${runsOn}`;
  if (!minimal) {
    yaml += `
    permissions: # optional for public repositories
      contents: read # required for actions/checkout
      id-token: write # required for OIDC authentication with CodSpeed`;
  }
  if (preSteps.length > 0) yaml += "\n" + indent(preSteps, 4);
  yaml += `
    steps:
      - uses: actions/checkout@v5`;
  if (submodules) {
    const value = typeof submodules === "string" ? submodules : "true";
    yaml += `\n        with:\n          submodules: ${value}`;
  }
  yaml += "\n" + indent(buildSteps, 6);
  const modeValue = modeList.join(",");
  yaml += `
      - name: Run the benchmarks
        uses: CodSpeedHQ/action@v4
        with:
          mode: ${modeValue}`;
  if (benchmarkCommand.length > 0) {
    const indentedBenchCommand = benchmarkCommand.length > 1 ? benchmarkCommand[0] + "\n" + indent(benchmarkCommand.slice(1), 12) : benchmarkCommand;
    const runLine = indent(["run: "], 10) + indentedBenchCommand;
    yaml += `\n${runLine}`;
  }
  const envEntries = Object.entries(env);
  if (envEntries.length > 0) {
    const envLines = ["env:", ...envEntries.map(([k, v]) => `  ${k}: ${v}`)];
    yaml += "\n" + indent(envLines, 8);
  }
  return <CodeBlock language="yaml" highlight={JSON.stringify(highlight)} {...minimal || ({
    filename: ".github/workflows/codspeed.yml",
    icon: "github"
  })}>
      {yaml}
    </CodeBlock>;
};

export const CppIcon = props => <svg xmlns="http://www.w3.org/2000/svg" className="h-6 w-6" viewBox="0 0 31 31" width={31} height={31} fill="none" {...props}>
    <g clipPath="url(#a)">
      <path fill="#00599C" d="M27.994 22.981c.206-.356.333-.757.333-1.117V9.36c0-.36-.127-.76-.333-1.117l-12.764 7.37 12.764 7.368Z" />
      <path fill="#004482" d="m16.363 30.08 10.828-6.252c.312-.18.596-.49.801-.847l-12.764-7.369-12.763 7.37c.205.355.489.666.8.846l10.83 6.252c.623.36 1.644.36 2.268 0Z" />
      <path fill="#659AD2" d="M27.993 8.243c-.205-.356-.489-.667-.8-.847l-10.83-6.252c-.623-.36-1.644-.36-2.268 0L3.267 7.396c-.624.36-1.134 1.244-1.134 1.964v12.504c0 .36.127.761.333 1.117l12.764-7.369 12.763-7.369Z" />
      <path fill="#fff" d="M15.227 24.343c-4.814 0-8.73-3.916-8.73-8.73 0-4.815 3.916-8.732 8.73-8.732a8.762 8.762 0 0 1 7.561 4.363L19.01 13.43a4.384 4.384 0 0 0-3.783-2.184 4.37 4.37 0 0 0-4.365 4.366 4.37 4.37 0 0 0 4.365 4.366 4.384 4.384 0 0 0 3.783-2.184l3.779 2.186a8.763 8.763 0 0 1-7.562 4.363Z" />
      <path fill="#fff" d="M23.961 15.127h-.97v-.97h-.97v.97h-.97v.97h.97v.97h.97v-.97h.97v-.97ZM27.598 15.127h-.97v-.97h-.97v.97h-.97v.97h.97v.97h.97v-.97h.97v-.97Z" />
    </g>
    <defs>
      <clipPath id="a">
        <rect width={30.271} height={30.271} x={0.219} y={0.207} fill="#fff" rx={3.784} />
      </clipPath>
    </defs>
  </svg>;

## Choosing our Benchmarking Strategy

We are going to use [`google_benchmark`](https://github.com/google/benchmark),
the standard C++ benchmarking library maintained by Google. It's widely adopted
across the C++ ecosystem, supports fixtures and parameterized benchmarks with
statistical analysis, and works with CMake, Bazel, and other build systems.

<Info>
  This guide uses [CMake](https://cmake.org/) as the build system. If you're
  using [Bazel](https://bazel.build/), check out the [Bazel integration
  documentation](/benchmarks/cpp#bazel) for build instructions.
</Info>

## Your First Benchmark

Let's start by creating a benchmark for a recursive Fibonacci function to see
how we can measure computational performance.

### Project Setup

First, create a basic project structure:

```bash icon="square-terminal" theme={null}
mkdir my_project && cd my_project
mkdir benchmarks
```

### Writing the Benchmark

Create a new file `benchmarks/main.cpp`:

```cpp benchmarks/main.cpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
#include <benchmark/benchmark.h>

// Recursive Fibonacci function to benchmark
static long long fibonacci(int n) {
  if (n <= 1)
    return n;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

// Define the benchmark
static void BM_Fibonacci(benchmark::State &state) {
  // Use a volatile variable to prevent compile-time optimization
  volatile int n = 30;

  // This loop runs multiple times to get accurate measurements
  for (auto _ : state) {
    // Prevent compiler from optimizing away the computation
    auto result = fibonacci(n);
    benchmark::DoNotOptimize(result);
  }
}

// Register the benchmark, specifying the time unit as milliseconds for better
// readability
BENCHMARK(BM_Fibonacci)->Unit(benchmark::kMillisecond);

// Entrypoint that runs all registered benchmarks
BENCHMARK_MAIN();
```

A few things to note:

* `volatile int n = 30` prevents the compiler from computing the result at
  compile time
* `benchmark::State& state` provides the benchmark loop that runs your code
  multiple times
* `for (auto _ : state)` is where your actual benchmark code goes - this loop is
  timed
* `benchmark::DoNotOptimize()` prevents the compiler from optimizing away the
  result
* `BENCHMARK()` registers your function as a benchmark
  * `->Unit(benchmark::kMillisecond)` displays results in milliseconds for
    better readability (the default unit is nanoseconds)
* `BENCHMARK_MAIN()` provides the entry point that discovers and runs all
  benchmarks

<Tip>
  To learn more about preventing compiler optimizations, check out the
  [Prevent Compiler Optimizations](#prevent-compiler-optimizations) section below.
</Tip>
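
To build some intuition for what the timed loop does, here is a minimal sketch
of a hand-rolled timing loop using only the standard library. It is not how
Google Benchmark is implemented: the helper name `mean_ns_per_iteration` is made
up for this illustration, and the real library also chooses the iteration count
for you.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not part of Google Benchmark): runs `fn` a fixed number
// of times and returns the mean wall-clock cost of one call in nanoseconds.
template <typename Fn>
double mean_ns_per_iteration(Fn&& fn, std::int64_t iterations) {
  using clock = std::chrono::steady_clock;

  const auto start = clock::now();
  for (std::int64_t i = 0; i < iterations; ++i) {
    fn(); // the code under test, equivalent to the body of `for (auto _ : state)`
  }
  const auto stop = clock::now();

  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(iterations);
}
```

Averaging over many iterations is what smooths out the cost of reading the
clock, which is one reason the benchmark body sits inside a loop rather than
being timed once.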

### Configuration with CMake

Create a `CMakeLists.txt` file in the `benchmarks/` folder:

```cmake benchmarks/CMakeLists.txt theme={null}
cmake_minimum_required(VERSION 3.14)
project(my_benchmarks VERSION 0.1.0 LANGUAGES CXX)

# Use C++17 (or your preferred version)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Enable optimizations with debug symbols for profiling
set(CMAKE_BUILD_TYPE RelWithDebInfo)

# Fetch google_benchmark from CodSpeed's repository
include(FetchContent)
FetchContent_Declare(
  google_benchmark
  GIT_REPOSITORY https://github.com/CodSpeedHQ/codspeed-cpp
  SOURCE_SUBDIR google_benchmark
  GIT_TAG main
)

set(BENCHMARK_DOWNLOAD_DEPENDENCIES ON)
FetchContent_MakeAvailable(google_benchmark)

# Create the benchmark executable
add_executable(bench main.cpp)

# Link against google_benchmark
target_link_libraries(bench benchmark::benchmark)
```

Key configuration points:

* `CMAKE_BUILD_TYPE RelWithDebInfo` enables optimizations with debug symbols for
  accurate profiling
* We use CodSpeed's fork of `google_benchmark` which adds performance
  measurement capabilities and CI integration
* `BENCHMARK_DOWNLOAD_DEPENDENCIES ON` allows google\_benchmark to download its
  dependencies

### Building and Running the Benchmark

Build your benchmark:

```bash icon="square-terminal" theme={null}
cd benchmarks
mkdir build && cd build
cmake ..
make
```

You should see output like:

```shellsession title=terminal icon="square-terminal" theme={null}
-- The CXX compiler identification is GNU 14.2.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Configuring done (8.6s)
-- Generating done (0.1s)
-- Build files have been written to: /home/user/my_project/benchmarks/build
[  1%] Building CXX object ...
...
[100%] Built target bench
```

Now run your benchmark:

```bash icon="square-terminal" theme={null}
./bench
```

You should see output like this:

```shellsession title=terminal icon="square-terminal" theme={null}
2025-12-01T17:24:27+01:00
Running ./bench
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 8.47, 7.96, 7.04
-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Fibonacci       2.74 ms         2.65 ms          271
```

Congratulations! You've created your first C++ benchmark. The output shows that
computing `fibonacci(30)` takes about 2.74 milliseconds on average.

<Tip>
  **Understanding the results:**

  * **Time**: Wall-clock time per iteration (lower is better)
  * **CPU**: CPU time per iteration, that is, the time the benchmark thread
    actually spent running on a CPU (time spent sleeping or blocked is not
    counted)
  * **Iterations**: How many times the benchmark ran to get reliable measurements
</Tip>
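
The gap between the **Time** and **CPU** columns is easier to read once you have
seen the two clocks diverge. The sketch below assumes a POSIX-like system where
`std::clock()` reports processor time; it sleeps for a moment and measures both
clocks. `Timings` and `measure_sleep` are hypothetical names used only here.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Hypothetical result type for this illustration.
struct Timings {
  double wall_ms; // elapsed real time
  double cpu_ms;  // processor time consumed by the process
};

// Sleeps for `pause` and reports how much wall-clock and CPU time went by.
Timings measure_sleep(std::chrono::milliseconds pause) {
  const auto wall_start = std::chrono::steady_clock::now();
  const std::clock_t cpu_start = std::clock();

  std::this_thread::sleep_for(pause); // waits without doing any computation

  const std::clock_t cpu_stop = std::clock();
  const auto wall_stop = std::chrono::steady_clock::now();

  Timings t;
  t.wall_ms =
      std::chrono::duration<double, std::milli>(wall_stop - wall_start).count();
  t.cpu_ms = 1000.0 * static_cast<double>(cpu_stop - cpu_start) /
             static_cast<double>(CLOCKS_PER_SEC);
  return t;
}
```

Sleeping consumes wall-clock time but almost no CPU time, so `wall_ms` ends up
far larger than `cpu_ms`. In a benchmark, a similar gap usually means the code
is waiting on something (I/O, locks, the scheduler) rather than computing.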

## Benchmarking with Parameters

So far, we've only tested our function with a single input (n=30). But what if
we want to see how performance changes with different input sizes? This is where
`DenseRange` comes in.

Let's add a parameterized benchmark to test Fibonacci with various input sizes.
Update your `main.cpp` to include:

```cpp benchmarks/main.cpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
// Define the benchmark with a parameter
static void BM_Fibonacci_DenseRange(benchmark::State &state) {
  // Get the input value from the benchmark parameter
  volatile int n = state.range(0);

  for (auto _ : state) {
    auto result = fibonacci(n);
    benchmark::DoNotOptimize(result);
  }
}

// Test Fibonacci with inputs from 15 to 35 in steps of 5
BENCHMARK(BM_Fibonacci_DenseRange)
    ->DenseRange(15, 35, 5) // Test inputs 15, 20, 25, 30, 35
    ->Unit(benchmark::kMillisecond);
```

Now `state.range(0)` gives us the input parameter, and `DenseRange(15, 35, 5)`
tells the benchmark to run with inputs 15, 20, 25, 30, and 35.

Rebuild and run:

```bash icon="square-terminal" theme={null}
make
./bench --benchmark_filter=Fibonacci_DenseRange
```

<Tip>
  We used the `--benchmark_filter` flag to only run benchmarks matching
  `Fibonacci_DenseRange`. This is useful when you have many benchmarks and want to
  focus on a subset.

  Learn more about
  [running a subset of benchmarks](https://google.github.io/benchmark/user_guide.html#running-a-subset-of-benchmarks).
</Tip>

You should see output like:

```shellsession title=terminal icon="square-terminal" theme={null}
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
BM_Fibonacci_DenseRange/15      0.002 ms        0.002 ms       380948
BM_Fibonacci_DenseRange/20      0.022 ms        0.021 ms        33413
BM_Fibonacci_DenseRange/25      0.276 ms        0.234 ms         3050
BM_Fibonacci_DenseRange/30       2.62 ms         2.59 ms          278
BM_Fibonacci_DenseRange/35       28.1 ms         28.0 ms           25
```

Notice how the execution time grows exponentially with the input size, clearly
demonstrating the O(2^n) complexity of the recursive Fibonacci algorithm. This
is the power of parameterized benchmarks – they help you understand how your
code scales with different inputs.
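
One way to sanity-check these numbers is to count how many calls the naive
recursion makes. The helper below is a sketch written for this guide
(`fibonacci_call_count` is not part of any library): it evaluates the recurrence
`calls(n) = 1 + calls(n - 1) + calls(n - 2)` iteratively.

```cpp
#include <cstdint>

// Hypothetical helper: number of function calls made by the recursive
// fibonacci(n) shown earlier, including the initial call.
std::uint64_t fibonacci_call_count(int n) {
  std::uint64_t prev = 1; // calls(0): a single call that returns immediately
  std::uint64_t curr = 1; // calls(1): same
  for (int i = 2; i <= n; ++i) {
    // One call for this level plus everything spawned by its two children
    const std::uint64_t next = 1 + curr + prev;
    prev = curr;
    curr = next;
  }
  return curr;
}
```

`fibonacci_call_count(30)` is 2,692,537 and `fibonacci_call_count(35)` is
29,860,703, a ratio of about 11.1. The measured times above grow by a similar
factor (28.1 ms / 2.62 ms ≈ 10.7), which is what you would expect if every call
costs roughly the same. Per single step of `n`, that is a growth factor of about
1.618 (the golden ratio), a tighter figure than the 2 suggested by O(2^n).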

### Multiple Arguments

What if your function takes multiple parameters? For example, let's benchmark
the performance of `std::string::find()` with varying text and pattern sizes.

Let's add a new benchmark to `main.cpp`:

```cpp benchmarks/main.cpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
// ... (previous code) ...
#include <string>

static void BM_StringFind(benchmark::State& state) {
  size_t string_size = state.range(0);
  size_t pattern_size = state.range(1);

  // Setup
  std::string text(string_size, 'a');
  std::string pattern(pattern_size, 'b');
  // Place pattern near the end for worst-case scenario
  text.replace(string_size - pattern_size, pattern_size, pattern);

  // Benchmark
  for (auto _ : state) {
    auto pos = text.find(pattern);
    benchmark::DoNotOptimize(pos);
  }
}

// Benchmark different combinations of text and pattern sizes using ArgsProduct
BENCHMARK(BM_StringFind)
    ->ArgsProduct({
        {1000, 10000, 100000}, // Text sizes
        {50, 500}              // Pattern sizes
    });
```

The `ArgsProduct()` function creates benchmarks for all combinations of the
provided argument lists. In this case, it generates 6 benchmarks (3 text sizes ×
2 pattern sizes), letting you analyze how both parameters affect performance.

Here is the output when you run this benchmark:

```shellsession title=terminal icon="square-terminal" theme={null}
./bench --benchmark_filter=StringFind
...
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
BM_StringFind/1000/50          28.7 ns         28.0 ns     25077651
BM_StringFind/10000/50          337 ns          237 ns      3123341
BM_StringFind/100000/50        2157 ns         2066 ns       287731
BM_StringFind/1000/500         30.3 ns         28.6 ns     24820407
BM_StringFind/10000/500         248 ns          243 ns      2987100
BM_StringFind/100000/500       2075 ns         2031 ns       348384
```
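
To make the "all combinations" behavior concrete, here is a small
standard-library sketch that enumerates a Cartesian product in the same order as
the output above, with the first argument varying fastest. `cartesian_product`
is a hypothetical helper for illustration and is not how `ArgsProduct()` is
implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns every combination that takes one value from
// each list. The first list varies fastest, matching the output shown above.
std::vector<std::vector<std::int64_t>> cartesian_product(
    const std::vector<std::vector<std::int64_t>>& lists) {
  std::vector<std::vector<std::int64_t>> result;
  if (lists.empty()) {
    return result;
  }

  // Total number of combinations is the product of the list sizes
  std::size_t total = 1;
  for (const auto& list : lists) {
    total *= list.size();
  }
  result.reserve(total);

  for (std::size_t i = 0; i < total; ++i) {
    std::vector<std::int64_t> combo;
    combo.reserve(lists.size());
    std::size_t rest = i;
    // Decode `i` as a mixed-radix number, one digit per list
    for (const auto& list : lists) {
      combo.push_back(list[rest % list.size()]);
      rest /= list.size();
    }
    result.push_back(combo);
  }
  return result;
}
```

With the lists from the benchmark above, this yields six combinations, starting
with (1000, 50) and ending with (100000, 500).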

<Tip>
  There are more ways to define parameterized benchmarks. Check out the
  [`google_benchmark` documentation on parameterized benchmarks](https://google.github.io/benchmark/user_guide.html#passing-arguments).
</Tip>

## Benchmarking Only What Matters

Sometimes your benchmark needs expensive setup, such as loading data from a file
or creating large data structures, that shouldn't be included in the
measurements. Google Benchmark provides several ways to handle this.

### Fresh Setup per Iteration

Let's benchmark a sorting algorithm where we need fresh data for each iteration.
We do not want the data generation time to be included in the benchmark. We can
exclude it using `PauseTiming()` and `ResumeTiming()`:

```cpp benchmarks/main.cpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
// ... (previous code) ...
#include <algorithm>
#include <random>
#include <vector>

static void BM_SortVector(benchmark::State &state) {
  size_t size = state.range(0);
  std::mt19937 gen(42); // Fixed seed for reproducibility

  for (auto _ : state) {
    // Pause timing during setup
    state.PauseTiming();

    // Generate random data (NOT measured)
    std::vector<int> data(size);
    std::uniform_int_distribution<> dis(1, 10000);
    for (size_t i = 0; i < size; ++i) {
      data[i] = dis(gen);
    }

    // Resume timing for the actual work
    state.ResumeTiming();

    // Sort the vector (MEASURED)
    std::sort(data.begin(), data.end());
    benchmark::DoNotOptimize(data.data());
    benchmark::ClobberMemory();
  }
}

BENCHMARK(BM_SortVector)->Range(100, 100000)->Unit(benchmark::kMicrosecond);
```

The setup code (generating random data) runs before each iteration but isn't
included in the timing. Only the `std::sort()` call is measured.

<Warning>
  **Use PauseTiming/ResumeTiming sparingly**

  While `PauseTiming()` and `ResumeTiming()` are useful, they add overhead to your
  benchmarks. If your setup can be done once before all iterations (like loading a
  file), use fixtures instead (see next section) for better performance and
  cleaner code.
</Warning>
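
Conceptually, pausing and resuming the timer amounts to summing only the
intervals you care about. The sketch below shows that idea with a plain
`std::chrono` stopwatch; `sort_only_ns` is a hypothetical helper, and the real
`PauseTiming()` and `ResumeTiming()` do more bookkeeping than this.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: total nanoseconds spent inside std::sort across
// `rounds` rounds, with the data generation left out of the measurement.
double sort_only_ns(std::size_t size, int rounds) {
  using clock = std::chrono::steady_clock;

  std::mt19937 gen(42); // Fixed seed for reproducibility
  std::uniform_int_distribution<> dis(1, 10000);
  std::chrono::duration<double, std::nano> measured{0};

  for (int r = 0; r < rounds; ++r) {
    // Setup: runs every round but is never timed
    std::vector<int> data(size);
    for (auto& value : data) {
      value = dis(gen);
    }

    const auto start = clock::now();  // "resume" the stopwatch
    std::sort(data.begin(), data.end());
    measured += clock::now() - start; // "pause" it again
  }
  return measured.count();
}
```

Each round reads the clock twice. For a fast operation, those reads can cost as
much as the work being measured, which is the overhead the warning above refers
to.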

### Shared Setup for All Iterations

When you can reuse the same data across iterations, fixtures are more efficient.
A fixture is a class that defines `SetUp()` and `TearDown()` methods, which run
before and after the iteration loop rather than inside it. Neither method is
included in the timing.

Here is an example where we set up a sorted vector once for all iterations and
benchmark binary search on it:

```cpp benchmarks/main.cpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
// Define a fixture class that sets up a sorted random vector for searching
class VectorFixture : public benchmark::Fixture {
public:
  std::vector<int> data;

  // Setup runs once before all iterations
  void SetUp(const ::benchmark::State &state) {
    size_t size = state.range(0);
    std::mt19937 gen(42); // Fixed seed for reproducibility
    std::uniform_int_distribution<> dis(1, size);
    data.resize(size);
    for (size_t i = 0; i < size; ++i) {
      data[i] = dis(gen);
    }
    std::sort(data.begin(), data.end());
  }

  // TearDown runs once after all iterations
  void TearDown(const ::benchmark::State &) { data.clear(); }
};

// Define the BinarySearch benchmark using VectorFixture
BENCHMARK_DEFINE_F(VectorFixture, BinarySearch)(benchmark::State &state) {
  int target = data.size() / 2;

  for (auto _ : state) {
    // Only this is measured
    bool found = std::binary_search(data.begin(), data.end(), target);
    benchmark::DoNotOptimize(found);
  }
}

// Register the fixture benchmark with different vector sizes
BENCHMARK_REGISTER_F(VectorFixture, BinarySearch)->Range(1000, 100000);
```

In this example, the `SetUp()` method initializes a sorted vector once before
all iterations, and `TearDown()` cleans up afterward. The benchmark only
measures the `std::binary_search()` calls. Fixtures use different macros:
`BENCHMARK_DEFINE_F` to define and `BENCHMARK_REGISTER_F` to register with
parameters.
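
The lifecycle described above can be modeled in a few lines of standard C++.
This is a simplified stand-in (`ToyFixture` and `run_toy_fixture` are invented
for this sketch), not the real `benchmark::Fixture` class: it only records the
order in which the hooks and the loop body run.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a fixture: each hook logs its own name.
struct ToyFixture {
  std::vector<std::string> events;

  void SetUp() { events.push_back("SetUp"); }
  void Body() { events.push_back("iteration"); }
  void TearDown() { events.push_back("TearDown"); }
};

// Simplified model of one benchmark run: set up once, loop, tear down once.
std::vector<std::string> run_toy_fixture(int iterations) {
  ToyFixture fixture;
  fixture.SetUp(); // not timed in a real benchmark
  for (int i = 0; i < iterations; ++i) {
    fixture.Body(); // the only part a real benchmark would time
  }
  fixture.TearDown(); // not timed in a real benchmark
  return fixture.events;
}
```

`run_toy_fixture(3)` records `SetUp`, three `iteration` entries, then
`TearDown`, so the cost of preparing the data is paid once no matter how many
iterations the loop runs.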

## Best Practices

### Prevent Compiler Optimizations

C++ compilers optimize aggressively and can remove code whose result is never
used. Always protect your benchmarks:

```cpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
// ❌ BAD: Compiler might optimize everything away
static void BM_Bad(benchmark::State& state) {
  for (auto _ : state) {
    int x = 42;
    int y = x * 2; // Compiler knows this is 84 at compile time
  }
}

// ✅ GOOD: Use DoNotOptimize for values
static void BM_Good(benchmark::State& state) {
  for (auto _ : state) {
    int x = 42;
    benchmark::DoNotOptimize(x);
    int y = x * 2;
    benchmark::DoNotOptimize(y);
  }
}

// ✅ BETTER: Use DoNotOptimize and ClobberMemory
static void BM_Better(benchmark::State& state) {
  for (auto _ : state) {
    int x = 42;
    benchmark::DoNotOptimize(x);
    int y = x * 2;
    benchmark::DoNotOptimize(y);
    benchmark::ClobberMemory();
  }
}
```

<Tip>
  **Important**: Always use `benchmark::DoNotOptimize()` to prevent the compiler
  from optimizing away your benchmarks. Without it, the compiler might eliminate
  the code you're trying to measure, giving you inaccurate results.

  **Understanding DoNotOptimize vs ClobberMemory:**

  * `DoNotOptimize(value)` forces the result of a computation to be stored in
    memory or a register, preventing the compiler from eliminating the computation
    entirely
  * `ClobberMemory()` forces the compiler to flush all pending writes to memory,
    preventing operations with memory side effects from being optimized away
  * Use `DoNotOptimize()` for return values and computed results
  * Add `ClobberMemory()` when benchmarking operations that modify memory (like
    filling vectors or copying data)

  Learn more in the
  [Google Benchmark guide on preventing optimization](https://google.github.io/benchmark/user_guide.html#preventing-optimization).
</Tip>

### Keep Benchmarks Deterministic

Use fixed seeds for random number generators:

```cpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
// ❌ BAD: Non-deterministic results
static void BM_NonDeterministic(benchmark::State& state) {
  std::random_device rd;
  std::mt19937 gen(rd()); // Different every run!

  for (auto _ : state) {
    // ...
  }
}

// ✅ GOOD: Deterministic with fixed seed
static void BM_Deterministic(benchmark::State& state) {
  std::mt19937 gen(42); // Fixed seed

  for (auto _ : state) {
    // ...
  }
}
```
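
If several benchmarks need the same input, it can help to put the seeded
generation in one place. The sketch below uses a hypothetical `make_raw_data`
helper and takes the raw output of `std::mt19937`, whose sequence for a given
seed is fixed by the C++ standard. Distributions such as
`std::uniform_int_distribution` are not specified that precisely, so their
output may differ between standard library implementations even with the same
seed.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: returns `size` pseudo-random values that depend only
// on `seed`, so every run of the benchmark sees identical input.
std::vector<std::uint32_t> make_raw_data(std::size_t size, std::uint32_t seed) {
  std::mt19937 gen(seed);
  std::vector<std::uint32_t> data(size);
  for (auto& value : data) {
    value = static_cast<std::uint32_t>(gen()); // raw engine output, no distribution
  }
  return data;
}
```

Two calls with the same seed return identical vectors, and changing the seed
changes the data, which makes it easy to check that a benchmark result does not
depend on one particular input.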

### Benchmark Real-World Code

In real projects, you'll benchmark functions from your library. Here's a typical
structure for a C++ project with benchmarks:

```shellsession title=terminal icon="square-terminal" theme={null}
my_project/
├── CMakeLists.txt
├── include/
│   └── mylib/
│       └── algorithms.hpp
├── src/
│   └── algorithms.cpp
└── benchmarks/
    └── bench_algorithms.cpp
```

The header `include/mylib/algorithms.hpp` defines your library's API:

```cpp include/mylib/algorithms.hpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
#pragma once
#include <vector>

namespace mylib {

std::vector<int> bubble_sort(std::vector<int> arr);

} // namespace mylib
```

The implementation `src/algorithms.cpp` contains the actual algorithm:

```cpp src/algorithms.cpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
#include "mylib/algorithms.hpp"
#include <utility> // std::swap

namespace mylib {

std::vector<int> bubble_sort(std::vector<int> arr) {
  size_t n = arr.size();
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n - 1 - i; ++j) {
      if (arr[j] > arr[j + 1]) {
        std::swap(arr[j], arr[j + 1]);
      }
    }
  }
  return arr;
}

} // namespace mylib
```

The benchmark `benchmarks/bench_algorithms.cpp` tests the bubble sort function:

```cpp benchmarks/bench_algorithms.cpp icon="https://mintcdn.com/codspeed/GDLcp8Ny8u4pFbNX/assets/icons/cpp.svg?fit=max&auto=format&n=GDLcp8Ny8u4pFbNX&q=85&s=420e72f7613b61e7f1961ccdd2e4b9bb" theme={null}
#include "mylib/algorithms.hpp"
#include <benchmark/benchmark.h>
#include <random>
#include <utility>
#include <vector>

// Define a fixture class that sets up random data for sorting
class SortFixture : public benchmark::Fixture {
public:
  std::vector<int> original_data;

  // Setup runs once before all iterations
  void SetUp(const ::benchmark::State &state) {
    size_t size = state.range(0);
    std::mt19937 gen(42); // Fixed seed for reproducibility
    std::uniform_int_distribution<> dis(1, size);
    original_data.resize(size);
    for (size_t i = 0; i < size; ++i) {
      original_data[i] = dis(gen);
    }
  }

  // TearDown runs once after all iterations
  void TearDown(const ::benchmark::State &) { original_data.clear(); }
};

// Define the BubbleSort benchmark using SortFixture
BENCHMARK_DEFINE_F(SortFixture, BubbleSort)(benchmark::State &state) {
  for (auto _ : state) {
    // Make a copy of the original data for each iteration
    // Only the sorting is measured, not the copy
    state.PauseTiming();
    std::vector<int> data = original_data;
    state.ResumeTiming();

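    // Move the copy in: bubble_sort takes its argument by value, so passing
    // `data` directly would make a second copy inside the timed region
    auto sorted = mylib::bubble_sort(std::move(data));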
    benchmark::DoNotOptimize(sorted.data());
    benchmark::ClobberMemory();
  }
}

// Register the fixture benchmark with different data sizes
BENCHMARK_REGISTER_F(SortFixture, BubbleSort)
    ->Range(1000, 100000)
    ->Unit(benchmark::kMillisecond);

BENCHMARK_MAIN();
```

Update your `CMakeLists.txt` to build both your library and benchmarks:

```cmake CMakeLists.txt theme={null}
cmake_minimum_required(VERSION 3.14)
project(mylib VERSION 0.1.0 LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Enable optimizations with debug symbols for profiling
set(CMAKE_BUILD_TYPE RelWithDebInfo)

# Your library
add_library(mylib src/algorithms.cpp)
target_include_directories(mylib PUBLIC include)

# Fetch google_benchmark
include(FetchContent)
FetchContent_Declare(
  google_benchmark
  GIT_REPOSITORY https://github.com/CodSpeedHQ/codspeed-cpp
  SOURCE_SUBDIR google_benchmark
  GIT_TAG main
)

set(BENCHMARK_DOWNLOAD_DEPENDENCIES ON)
FetchContent_MakeAvailable(google_benchmark)

# Benchmark executable
add_executable(bench_algorithms benchmarks/bench_algorithms.cpp)
target_link_libraries(bench_algorithms mylib benchmark::benchmark)
```

You can now build and run your benchmarks with the following commands:

```bash icon="square-terminal" theme={null}
mkdir build && cd build
cmake ..
make
./bench_algorithms
```

This will yield an output similar to:

```shellsession title=terminal icon="square-terminal" theme={null}
2025-12-02T16:50:44+01:00
Running ./bench_algorithms
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 9.83, 10.83, 8.99
------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations
------------------------------------------------------------------------
SortFixture/BubbleSort/1000        0.381 ms        0.321 ms         2219
SortFixture/BubbleSort/4096         5.80 ms         4.97 ms          136
SortFixture/BubbleSort/32768         732 ms          718 ms            1
SortFixture/BubbleSort/100000      10848 ms         9529 ms            1
```
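
The sizes in this output (1000, 4096, 32768, 100000) are not evenly spaced:
`Range(1000, 100000)` keeps both bounds and fills the gap with powers of the
range multiplier, which defaults to 8. The sketch below approximates that
selection with a hypothetical `range_arguments` helper. It is written to match
the output above, not copied from the library, and it assumes
`0 <= lo <= hi` and a multiplier of at least 2.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper approximating the arguments picked by Range(lo, hi):
// the lower bound, every power of `multiplier` strictly between the bounds,
// then the upper bound.
std::vector<std::int64_t> range_arguments(std::int64_t lo, std::int64_t hi,
                                          std::int64_t multiplier = 8) {
  std::vector<std::int64_t> args;
  args.push_back(lo);

  for (std::int64_t power = 1; power < hi; power *= multiplier) {
    if (power > lo) {
      args.push_back(power);
    }
    // Stop before the multiplication could overshoot `hi` (and overflow)
    if (power > hi / multiplier) {
      break;
    }
  }

  if (hi != lo) {
    args.push_back(hi);
  }
  return args;
}
```

`range_arguments(1000, 100000)` yields 1000, 4096, 32768, and 100000. If you
want denser coverage, Google Benchmark lets you change the multiplier by calling
`->RangeMultiplier(2)` before `->Range(...)`.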

## Running Benchmarks Continuously with CodSpeed

So far, you've been running benchmarks locally. But local benchmarking has
limitations:

* **Inconsistent hardware**: Different developers get different results
* **Manual process**: Easy to forget to run benchmarks before merging
* **No historical tracking**: Hard to spot gradual performance degradation
* **No PR context**: Can't see performance impact during code review

This is where **CodSpeed** comes in. It runs your benchmarks automatically in CI
and provides:

* Automated performance regression detection in PRs
* Consistent metrics with reliable measurements across all runs
* Historical tracking to see performance over time with detailed charts
* Flamegraph profiles to see exactly what changed in your code's execution

<Tip>
  For the full CodSpeed integration reference, see [Writing Benchmarks in
  C++](/benchmarks/cpp).
</Tip>

### How to set up CodSpeed with google\_benchmark

Here's how to integrate CodSpeed with your `google_benchmark` benchmarks using
CMake:

<Steps>
  <Step title="Build and run the benchmarks locally with CodSpeed enabled">
    CodSpeed provides a special build mode that instruments your benchmarks for performance tracking.

    This is controlled with the `CODSPEED_MODE` CMake flag, which can be set to:

    * `off`: (default) Regular benchmarking without CodSpeed
    * `simulation`: CodSpeed CPU simulation mode for CI
    * `walltime`: Walltime measurements (see [walltime docs](/instruments/walltime))

    Build your benchmarks with CodSpeed mode enabled:

    ```bash icon="square-terminal" theme={null}
    cd benchmarks
    mkdir build && cd build
    cmake -DCODSPEED_MODE=simulation ..
    make
    ```

    Run the benchmarks to verify everything works:

    ```bash icon="square-terminal" theme={null}
    ./bench_algorithms
    ```

    You should see output indicating CodSpeed is enabled:

    ```shellsession title=terminal icon="square-terminal" theme={null}
    Codspeed mode: simulation
    2025-12-02T17:21:57+01:00
    Running ./bench_algorithms
    Run on (8 X 24 MHz CPU s)
    CPU Caches:
      L1 Data 64 KiB
      L1 Instruction 128 KiB
      L2 Unified 4096 KiB (x8)
    Load Average: 9.22, 7.26, 6.71
    NOTICE: codspeed is enabled, but no performance measurement will be made since it's running in an unknown environment.
    Checked: cpp/benchmarks/bench_algorithms.cpp::BubbleSort[SortFixture][1000]
    Checked: cpp/benchmarks/bench_algorithms.cpp::BubbleSort[SortFixture][4096]
    Checked: cpp/benchmarks/bench_algorithms.cpp::BubbleSort[SortFixture][32768]
    Checked: cpp/benchmarks/bench_algorithms.cpp::BubbleSort[SortFixture][100000]
    ```

    <Info>
      Notice there are no timing measurements in the local output. CodSpeed only
      captures actual performance data when running in CI.
    </Info>
  </Step>

  <Step title="Set Up GitHub Actions">
    Create a workflow file to run benchmarks on every push and pull request:

    <CIWorkflow
      mode="simulation"
      buildSteps={[
    "- name: Build the benchmark target(s)",
    "  run: |",
    "    cd benchmarks",
    "    mkdir build",
    "    cd build",
    "    cmake -DCODSPEED_MODE=simulation ..",
    "    make -j",
  ]}
      benchmarkCommand={["./benchmarks/build/bench_algorithms"]}
    />
  </Step>

  <Step title="Check the Results">
    Once the workflow runs, your pull requests will receive a performance report
    comment:

    <img src="https://mintcdn.com/codspeed/jKaxX6yy-Kzw1C-0/assets/pr-comment-new-installation.png?fit=max&auto=format&n=jKaxX6yy-Kzw1C-0&q=85&s=4405db6390fe6f80b4f13d5baa2598d1" className="rounded-xl w-full max-w-lg mx-auto" alt="Pull Request Result" width="1744" height="820" data-path="assets/pr-comment-new-installation.png" />

    <img src="https://mintcdn.com/codspeed/jKaxX6yy-Kzw1C-0/assets/pr-status-check-success.png?fit=max&auto=format&n=jKaxX6yy-Kzw1C-0&q=85&s=a74b568e364c0b068623bd31ee869361" className="rounded-xl w-full max-w-md mx-auto" alt="Pull Request Status Check" width="1408" height="690" data-path="assets/pr-status-check-success.png" />
  </Step>

  <Step title="Access Detailed Reports and Flamegraphs">
    After your benchmarks run in CI, head over to your CodSpeed dashboard to see
    detailed performance reports, historical trends, and flamegraph profiles for
    deeper analysis.

    <Frame caption="Profiling Report on CodSpeed">
      <img src="https://mintcdn.com/codspeed/CInbng288QuXBkrC/features/assets/cover.png?fit=max&auto=format&n=CInbng288QuXBkrC&q=85&s=302d47fea90881b1af8ab6c21148c245" className="p-4 w-full max-w-lg mx-auto" alt="Profiling Report on CodSpeed" width="1171" height="685" data-path="features/assets/cover.png" />
    </Frame>

    <Tip>
      Profiling works out of the box, no extra configuration needed!

      [Learn more about flamegraphs and how to use them to optimize your code](/features/profiling).
    </Tip>
  </Step>
</Steps>

<Info>
  **Using Bazel?**

  If you're using Bazel as your build system, check out the
  [Bazel integration documentation](/benchmarks/cpp#bazel) for detailed setup
  instructions with CodSpeed.
</Info>

## Next Steps

Check out these resources to continue your C++ benchmarking journey:

<CardGroup cols={2}>
  <Card title="Get Started with CodSpeed" href="https://codspeed.io?flow=get-started" icon="rocket">
    Sign up and start tracking your C++ performance in CI
  </Card>

  <Card title="CodSpeed C++ Benchmarking Docs" href="/benchmarks/cpp" icon={<CppIcon />}>
    Explore the full google\_benchmark API reference
  </Card>

  <Card title="Performance Profiling" href="/features/profiling" icon="fire">
    Learn how to use flamegraphs to optimize your code
  </Card>

  <Card title="Google Benchmark User Guide" href="https://google.github.io/benchmark/user_guide.html" icon="book">
    Explore all of google\_benchmark's features in depth
  </Card>
</CardGroup>
