# How to Benchmark C++ with Google Benchmark?

> Learn how to measure the performance of your C++ code by writing and running benchmarks locally and continuously in CI to catch regressions.

## Choosing our Benchmarking Strategy

We are going to use [`google_benchmark`](https://github.com/google/benchmark), the standard C++ benchmarking library maintained by Google. It's widely adopted across the C++ ecosystem, supports fixtures and parameterized benchmarks with statistical analysis, and works with CMake, Bazel, and other build systems.

This guide uses [CMake](https://cmake.org/) as the build system. If you're using [Bazel](https://bazel.build/), check out the [Bazel integration documentation](/benchmarks/cpp#bazel) for build instructions.

## Your First Benchmark

Let's start by creating a benchmark for a recursive Fibonacci function to see how we can measure computational performance.
### Project Setup

First, create a basic project structure:

```bash
mkdir my_project && cd my_project
mkdir benchmarks
```

### Writing the Benchmark

Create a new file `benchmarks/main.cpp`:

```cpp benchmarks/main.cpp
#include <benchmark/benchmark.h>

// Recursive Fibonacci function to benchmark
static long long fibonacci(int n) {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

// Define the benchmark
static void BM_Fibonacci(benchmark::State &state) {
  // Use a volatile variable to prevent compile-time optimization
  volatile int n = 30;

  // This loop runs multiple times to get accurate measurements
  for (auto _ : state) {
    // Prevent compiler from optimizing away the computation
    auto result = fibonacci(n);
    benchmark::DoNotOptimize(result);
  }
}

// Register the benchmark, specifying the time unit as milliseconds for better
// readability
BENCHMARK(BM_Fibonacci)->Unit(benchmark::kMillisecond);

// Entrypoint that runs all registered benchmarks
BENCHMARK_MAIN();
```

A few things to note:

* `volatile int n = 30` prevents the compiler from computing the result at compile time
* `benchmark::State& state` provides the benchmark loop that runs your code multiple times
* `for (auto _ : state)` is where your actual benchmark code goes; the body of this loop is what gets timed
* `benchmark::DoNotOptimize()` prevents the compiler from optimizing away the result
* `BENCHMARK()` registers your function as a benchmark
* `->Unit(benchmark::kMillisecond)` displays results in milliseconds for readability (the default unit is nanoseconds)
* `BENCHMARK_MAIN()` provides the entry point that discovers and runs all benchmarks

To learn more about preventing compiler optimizations, check out the [Prevent Compiler Optimizations](#prevent-compiler-optimizations) section below.
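To build intuition for what a timed benchmark loop measures, here is a minimal hand-rolled sketch that uses only `std::chrono`. It is not how Google Benchmark is implemented (the library also chooses the iteration count for you and reports CPU time); the helper name `average_ns_per_call` is hypothetical and exists only for this illustration.

```cpp
#include <chrono>
#include <cstdint>

// Same recursive Fibonacci as in the benchmark.
static long long fibonacci(int n) {
  if (n <= 1) return n;
  return fibonacci(n - 1) + fibonacci(n - 2);
}

// Hypothetical helper (not part of Google Benchmark): calls fibonacci(n)
// `iterations` times and returns the average wall-clock time per call in
// nanoseconds. `iterations` must be positive.
static double average_ns_per_call(int n, std::int64_t iterations) {
  volatile int input = n;       // keep the argument opaque to the optimizer
  volatile long long sink = 0;  // keep every result observable

  const auto start = std::chrono::steady_clock::now();
  for (std::int64_t i = 0; i < iterations; ++i) {
    sink = fibonacci(input);
  }
  const auto stop = std::chrono::steady_clock::now();

  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(iterations);
}
```

Calling `average_ns_per_call(30, 100)` would play roughly the role of the "Time" column in the benchmark output. In practice, prefer the library: it handles warm-up, iteration count selection, and reporting.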
### Configuration with CMake

Create a `CMakeLists.txt` file in the `benchmarks/` folder:

```cmake benchmarks/CMakeLists.txt
cmake_minimum_required(VERSION 3.14)
project(my_benchmarks VERSION 0.1.0 LANGUAGES CXX)

# Use C++17 (or your preferred version)
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Enable optimizations with debug symbols for profiling
set(CMAKE_BUILD_TYPE RelWithDebInfo)

# Fetch google_benchmark from CodSpeed's repository
include(FetchContent)
FetchContent_Declare(
  google_benchmark
  GIT_REPOSITORY https://github.com/CodSpeedHQ/codspeed-cpp
  SOURCE_SUBDIR google_benchmark
  GIT_TAG main
)
set(BENCHMARK_DOWNLOAD_DEPENDENCIES ON)
FetchContent_MakeAvailable(google_benchmark)

# Create the benchmark executable
add_executable(bench main.cpp)

# Link against google_benchmark
target_link_libraries(bench benchmark::benchmark)
```

Key configuration points:

* `CMAKE_BUILD_TYPE RelWithDebInfo` enables optimizations with debug symbols for accurate profiling
* We use CodSpeed's fork of `google_benchmark`, which adds performance measurement capabilities and CI integration
* `BENCHMARK_DOWNLOAD_DEPENDENCIES ON` allows google\_benchmark to download its dependencies

### Building and Running the Benchmark

Build your benchmark:

```bash
cd benchmarks
mkdir build && cd build
cmake ..
make
```

You should see output like:

```shellsession title=terminal
-- The CXX compiler identification is GNU 14.2.1
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Configuring done (8.6s)
-- Generating done (0.1s)
-- Build files have been written to: /home/user/my_project/benchmarks/build
[  1%] Building CXX object ...
...
[100%] Built target bench
```

Now run your benchmark:

```bash
./bench
```

You should see output like this:

```shellsession title=terminal
2025-12-01T17:24:27+01:00
Running ./bench
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 8.47, 7.96, 7.04
-------------------------------------------------------
Benchmark             Time             CPU   Iterations
-------------------------------------------------------
BM_Fibonacci       2.74 ms         2.65 ms          271
```

Congratulations! You've created your first C++ benchmark. The output shows that computing `fibonacci(30)` takes about 2.74 milliseconds on average.

**Understanding the results:**

* **Time**: Wall-clock time per iteration (lower is better)
* **CPU**: CPU time per iteration, meaning the time the processor actually spent executing the benchmark, excluding time spent waiting or descheduled
* **Iterations**: How many times the benchmark ran to get reliable measurements

## Benchmarking with Parameters

So far, we've only tested our function with a single input (n=30). But what if we want to see how performance changes with different input sizes? This is where `DenseRange` comes in.

Let's add a parameterized benchmark to test Fibonacci with various input sizes.
Update your `main.cpp` to include:

```cpp benchmarks/main.cpp
// Define the benchmark with a parameter
static void BM_Fibonacci_DenseRange(benchmark::State &state) {
  // Get the input value from the benchmark parameter
  volatile int n = state.range(0);

  for (auto _ : state) {
    auto result = fibonacci(n);
    benchmark::DoNotOptimize(result);
  }
}

// Test Fibonacci with inputs from 15 to 35 in steps of 5
BENCHMARK(BM_Fibonacci_DenseRange)
    ->DenseRange(15, 35, 5) // Test inputs 15, 20, 25, 30, 35
    ->Unit(benchmark::kMillisecond);
```

Now `state.range(0)` gives us the input parameter, and `DenseRange(15, 35, 5)` tells the benchmark to run with inputs 15, 20, 25, 30, and 35.

Rebuild and run:

```bash
make
./bench --benchmark_filter=Fibonacci_DenseRange
```

We used the `--benchmark_filter` flag to only run benchmarks matching `Fibonacci_DenseRange`. This is useful when you have many benchmarks and want to focus on a subset. Learn more about [running a subset of benchmarks](https://google.github.io/benchmark/user_guide.html#running-a-subset-of-benchmarks).

You should see output like:

```shellsession title=terminal
---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
BM_Fibonacci_DenseRange/15      0.002 ms        0.002 ms       380948
BM_Fibonacci_DenseRange/20      0.022 ms        0.021 ms        33413
BM_Fibonacci_DenseRange/25      0.276 ms        0.234 ms         3050
BM_Fibonacci_DenseRange/30       2.62 ms         2.59 ms          278
BM_Fibonacci_DenseRange/35       28.1 ms         28.0 ms           25
```

Notice how the execution time grows exponentially with the input size (roughly 10x for every 5 added to `n`), clearly demonstrating the O(2^n) complexity of the recursive Fibonacci algorithm.
This is the power of parameterized benchmarks: they help you understand how your code scales with different inputs.

#### Multiple Arguments

What if your function takes multiple parameters? For example, let's benchmark the performance of `std::string::find()` with varying text and pattern sizes.

Let's add a new benchmark to `main.cpp`:

```cpp benchmarks/main.cpp
// ... (previous code) ...
#include <string>

static void BM_StringFind(benchmark::State& state) {
  size_t string_size = state.range(0);
  size_t pattern_size = state.range(1);

  // Setup
  std::string text(string_size, 'a');
  std::string pattern(pattern_size, 'b');
  // Place pattern near the end for worst-case scenario
  text.replace(string_size - pattern_size, pattern_size, pattern);

  // Benchmark
  for (auto _ : state) {
    auto pos = text.find(pattern);
    benchmark::DoNotOptimize(pos);
  }
}

// Benchmark different combinations of text and pattern sizes using ArgsProduct
BENCHMARK(BM_StringFind)
    ->ArgsProduct({
        {1000, 10000, 100000}, // Text sizes
        {50, 500}              // Pattern sizes
    });
```

The `ArgsProduct()` function creates benchmarks for all combinations of the provided argument lists. In this case, it generates 6 benchmarks (3 text sizes × 2 pattern sizes), letting you analyze how both parameters affect performance.

Here is the output when you run this benchmark:

```shellsession title=terminal
./bench --benchmark_filter=StringFind
...
-------------------------------------------------------------------
Benchmark                         Time             CPU   Iterations
-------------------------------------------------------------------
BM_StringFind/1000/50          28.7 ns         28.0 ns     25077651
BM_StringFind/10000/50          337 ns          237 ns      3123341
BM_StringFind/100000/50        2157 ns         2066 ns       287731
BM_StringFind/1000/500         30.3 ns         28.6 ns     24820407
BM_StringFind/10000/500         248 ns          243 ns      2987100
BM_StringFind/100000/500       2075 ns         2031 ns       348384
```

There are more ways to define parameterized benchmarks. Check out the [`google_benchmark` documentation on parameterized benchmarks](https://google.github.io/benchmark/user_guide.html#passing-arguments).

## Benchmarking Only What Matters

Sometimes you have expensive setup that shouldn't be included in your benchmark measurements, such as loading data from a file or creating large data structures. Google Benchmark provides several ways to handle this.

### Fresh Setup per Iteration

Let's benchmark a sorting algorithm where we need fresh data for each iteration. We do not want the data generation time to be included in the benchmark. We can exclude it using `PauseTiming()` and `ResumeTiming()`:

```cpp benchmarks/main.cpp
// ... (previous code) ...
#include <algorithm>
#include <random>
#include <vector>

static void BM_SortVector(benchmark::State &state) {
  size_t size = state.range(0);
  std::mt19937 gen(42); // Fixed seed for reproducibility

  for (auto _ : state) {
    // Pause timing during setup
    state.PauseTiming();

    // Generate random data (NOT measured)
    std::vector<int> data(size);
    std::uniform_int_distribution<> dis(1, 10000);
    for (size_t i = 0; i < size; ++i) {
      data[i] = dis(gen);
    }

    // Resume timing for the actual work
    state.ResumeTiming();

    // Sort the vector (MEASURED)
    std::sort(data.begin(), data.end());
    benchmark::DoNotOptimize(data.data());
    benchmark::ClobberMemory();
  }
}

BENCHMARK(BM_SortVector)->Range(100, 100000)->Unit(benchmark::kMicrosecond);
```

The setup code (generating random data) runs before each iteration but isn't included in the timing. Only the `std::sort()` call is measured.

**Use PauseTiming/ResumeTiming sparingly**

While `PauseTiming()` and `ResumeTiming()` are useful, they add overhead to your benchmarks. If your setup can be done once before all iterations (like loading a file), use fixtures instead (see next section) for better performance and cleaner code.

### Shared Setup for All Iterations

When you can reuse the same data across iterations, fixtures are more efficient. A fixture is a class that defines setup and teardown methods that run outside the timed loop, once for all iterations rather than once per iteration. Neither method is included in the timing.
Here is an example where we set up a sorted vector once for all iterations and benchmark binary search on it:

```cpp benchmarks/main.cpp
// Define a fixture class that sets up a random vector for searching
class VectorFixture : public benchmark::Fixture {
public:
  std::vector<int> data;

  // Setup runs once before all iterations
  void SetUp(const ::benchmark::State &state) override {
    size_t size = state.range(0);
    std::mt19937 gen(42); // Fixed seed for reproducibility
    std::uniform_int_distribution<> dis(1, size);
    data.resize(size);
    for (size_t i = 0; i < size; ++i) {
      data[i] = dis(gen);
    }
    std::sort(data.begin(), data.end());
  }

  // TearDown runs once after all iterations
  void TearDown(const ::benchmark::State &) override { data.clear(); }
};

// Define the BinarySearch benchmark using VectorFixture
BENCHMARK_DEFINE_F(VectorFixture, BinarySearch)(benchmark::State &state) {
  int target = data.size() / 2;
  for (auto _ : state) {
    // Only this is measured
    bool found = std::binary_search(data.begin(), data.end(), target);
    benchmark::DoNotOptimize(found);
  }
}

// Register the fixture benchmark with different vector sizes
BENCHMARK_REGISTER_F(VectorFixture, BinarySearch)->Range(1000, 100000);
```

In this example, the `SetUp()` method initializes a sorted vector once before all iterations, and `TearDown()` cleans up afterward. The benchmark only measures the `std::binary_search()` calls.

Fixtures use different macros: `BENCHMARK_DEFINE_F` to define the benchmark and `BENCHMARK_REGISTER_F` to register it with parameters.

## Best Practices

### Prevent Compiler Optimizations

C++ compilers optimize aggressively, and they will happily remove work whose result is never used.
Always protect your benchmarks:

```cpp
// ❌ BAD: Compiler might optimize everything away
static void BM_Bad(benchmark::State& state) {
  for (auto _ : state) {
    int x = 42;
    int y = x * 2; // Compiler knows this is 84 at compile time
  }
}

// ✅ GOOD: Use DoNotOptimize for values
static void BM_Good(benchmark::State& state) {
  for (auto _ : state) {
    int x = 42;
    benchmark::DoNotOptimize(x);
    int y = x * 2;
    benchmark::DoNotOptimize(y);
  }
}

// ✅ BETTER: Use DoNotOptimize and ClobberMemory
static void BM_Better(benchmark::State& state) {
  for (auto _ : state) {
    int x = 42;
    benchmark::DoNotOptimize(x);
    int y = x * 2;
    benchmark::DoNotOptimize(y);
    benchmark::ClobberMemory();
  }
}
```

**Important**: Always use `benchmark::DoNotOptimize()` to prevent the compiler from optimizing away your benchmarks. Without it, the compiler might eliminate the code you're trying to measure, giving you inaccurate results.

**Understanding DoNotOptimize vs ClobberMemory:**

* `DoNotOptimize(value)` forces the result of a computation to be stored in memory or a register, preventing the compiler from eliminating the computation entirely
* `ClobberMemory()` forces the compiler to flush all pending writes to memory, preventing operations with memory side effects from being optimized away
* Use `DoNotOptimize()` for return values and computed results
* Add `ClobberMemory()` when benchmarking operations that modify memory (like filling vectors or copying data)

Learn more in the [Google Benchmark guide on preventing optimization](https://google.github.io/benchmark/user_guide.html#preventing-optimization).
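To make the difference between the two barriers more concrete, here is a rough sketch of how such helpers can be written with an empty inline assembly statement. It assumes GCC or Clang (extended `asm` is a compiler extension), and the names `do_not_optimize`, `clobber_memory`, and `guarded_sum` are hypothetical stand-ins for this illustration, not the library's API. In real benchmarks, use `benchmark::DoNotOptimize()` and `benchmark::ClobberMemory()`.

```cpp
// Assumption: GCC or Clang. Extended inline asm is not standard C++.

// Hypothetical stand-in for benchmark::DoNotOptimize: the empty asm
// statement claims to read `value` (from a register or memory), so the
// compiler has to actually compute it before this point.
template <typename T>
inline void do_not_optimize(const T& value) {
  asm volatile("" : : "r,m"(value) : "memory");
}

// Hypothetical stand-in for benchmark::ClobberMemory: the "memory" clobber
// tells the compiler that any memory may be read or written here, so
// pending stores cannot be dropped or moved across this point.
inline void clobber_memory() {
  asm volatile("" : : : "memory");
}

// Sums 1..n. The asm statement must execute on every iteration with the
// current value of `total`, so the loop cannot simply be deleted. The
// barriers do not change the result.
inline long long guarded_sum(int n) {
  long long total = 0;
  for (int i = 1; i <= n; ++i) {
    total += i;
    do_not_optimize(total);
  }
  clobber_memory();
  return total;
}
```

Neither helper emits any instructions of its own; both only constrain what the optimizer is allowed to assume, which is why they are cheap enough to leave inside a timed loop.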
### Keep Benchmarks Deterministic

Use fixed seeds for random number generators:

```cpp
// ❌ BAD: Non-deterministic results
static void BM_NonDeterministic(benchmark::State& state) {
  std::random_device rd;
  std::mt19937 gen(rd()); // Different every run!
  for (auto _ : state) {
    // ...
  }
}

// ✅ GOOD: Deterministic with fixed seed
static void BM_Deterministic(benchmark::State& state) {
  std::mt19937 gen(42); // Fixed seed
  for (auto _ : state) {
    // ...
  }
}
```

### Benchmark Real-World Code

In real projects, you'll benchmark functions from your library. Here's a typical structure for a C++ project with benchmarks:

```shellsession title=terminal
my_project/
├── CMakeLists.txt
├── include/
│   └── mylib/
│       └── algorithms.hpp
├── src/
│   └── algorithms.cpp
└── benchmarks/
    └── bench_algorithms.cpp
```

The header `include/mylib/algorithms.hpp` defines your library's API:

```cpp include/mylib/algorithms.hpp
#pragma once

#include <vector>

namespace mylib {
std::vector<int> bubble_sort(std::vector<int> arr);
} // namespace mylib
```

The implementation `src/algorithms.cpp` contains the actual algorithm:

```cpp src/algorithms.cpp
#include "mylib/algorithms.hpp"

#include <cstddef> // size_t
#include <utility> // std::swap

namespace mylib {
std::vector<int> bubble_sort(std::vector<int> arr) {
  size_t n = arr.size();
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n - 1 - i; ++j) {
      if (arr[j] > arr[j + 1]) {
        std::swap(arr[j], arr[j + 1]);
      }
    }
  }
  return arr;
}
} // namespace mylib
```

The benchmark `benchmarks/bench_algorithms.cpp` tests the bubble
sort function:

```cpp benchmarks/bench_algorithms.cpp
#include "mylib/algorithms.hpp"

#include <benchmark/benchmark.h>
#include <random>
#include <utility>

// Define a fixture class that sets up random data for sorting
class SortFixture : public benchmark::Fixture {
public:
  std::vector<int> original_data;

  // Setup runs once before all iterations
  void SetUp(const ::benchmark::State &state) override {
    size_t size = state.range(0);
    std::mt19937 gen(42); // Fixed seed for reproducibility
    std::uniform_int_distribution<> dis(1, size);
    original_data.resize(size);
    for (size_t i = 0; i < size; ++i) {
      original_data[i] = dis(gen);
    }
  }

  // TearDown runs once after all iterations
  void TearDown(const ::benchmark::State &) override { original_data.clear(); }
};

// Define the BubbleSort benchmark using SortFixture
BENCHMARK_DEFINE_F(SortFixture, BubbleSort)(benchmark::State &state) {
  for (auto _ : state) {
    // Make a copy of the original data for each iteration
    // Only the sorting is measured, not the copy
    state.PauseTiming();
    std::vector<int> data = original_data;
    state.ResumeTiming();

    // Move the copy in: bubble_sort takes its argument by value, so passing
    // `data` directly would copy it again inside the timed region
    auto sorted = mylib::bubble_sort(std::move(data));
    benchmark::DoNotOptimize(sorted.data());
    benchmark::ClobberMemory();
  }
}

// Register the fixture benchmark with different data sizes
BENCHMARK_REGISTER_F(SortFixture, BubbleSort)
    ->Range(1000, 100000)
    ->Unit(benchmark::kMillisecond);

BENCHMARK_MAIN();
```

Update your `CMakeLists.txt` to build both your library and benchmarks:

```cmake CMakeLists.txt
cmake_minimum_required(VERSION 3.14)
project(mylib VERSION 0.1.0 LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Enable optimizations with debug symbols for profiling
set(CMAKE_BUILD_TYPE RelWithDebInfo)

# Your library
add_library(mylib src/algorithms.cpp)
target_include_directories(mylib PUBLIC include)

# Fetch google_benchmark
include(FetchContent)
FetchContent_Declare(
  google_benchmark
  GIT_REPOSITORY https://github.com/CodSpeedHQ/codspeed-cpp
  SOURCE_SUBDIR google_benchmark
  GIT_TAG main
)
set(BENCHMARK_DOWNLOAD_DEPENDENCIES ON)
FetchContent_MakeAvailable(google_benchmark)

# Benchmark executable
add_executable(bench_algorithms benchmarks/bench_algorithms.cpp)
target_link_libraries(bench_algorithms mylib benchmark::benchmark)
```

You can now build and run your benchmarks with the following commands:

```bash
mkdir build && cd build
cmake ..
make
./bench_algorithms
```

This will yield an output similar to:

```shellsession title=terminal
2025-12-02T16:50:44+01:00
Running ./bench_algorithms
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 9.83, 10.83, 8.99
------------------------------------------------------------------------
Benchmark                              Time             CPU   Iterations
------------------------------------------------------------------------
SortFixture/BubbleSort/1000        0.381 ms        0.321 ms         2219
SortFixture/BubbleSort/4096         5.80 ms         4.97 ms          136
SortFixture/BubbleSort/32768         732 ms          718 ms            1
SortFixture/BubbleSort/100000      10848 ms         9529 ms            1
```

## Running Benchmarks Continuously with CodSpeed

So far, you've been running benchmarks locally. But local benchmarking has limitations:

* **Inconsistent hardware**: Different developers get different results
* **Manual process**: Easy to forget to run benchmarks before merging
* **No historical tracking**: Hard to spot gradual performance degradation
* **No PR context**: Can't see performance impact during code review

This is where **CodSpeed** comes in.
It runs your benchmarks automatically in CI and provides:

* Automated performance regression detection in PRs
* Consistent metrics with reliable measurements across all runs
* Historical tracking to see performance over time with detailed charts
* Flamegraph profiles to see exactly what changed in your code's execution

For the full CodSpeed integration reference, see [Writing Benchmarks in C++](/benchmarks/cpp).

### How to set up CodSpeed with google\_benchmark

Here's how to integrate CodSpeed with your `google_benchmark` benchmarks using CMake.

CodSpeed provides a special build mode that instruments your benchmarks for performance tracking. This is controlled with the `CODSPEED_MODE` CMake flag, which can be set to:

* `off`: (default) Regular benchmarking without CodSpeed
* `simulation`: CodSpeed CPU simulation mode for CI
* `walltime`: Walltime measurements (see [walltime docs](/instruments/walltime))

From the directory that contains your `CMakeLists.txt` (the project root in the layout above), build your benchmarks with CodSpeed mode enabled:

```bash
mkdir build && cd build
cmake -DCODSPEED_MODE=simulation ..
make
```

Run the benchmarks to verify everything works:

```bash
./bench_algorithms
```

You should see output indicating CodSpeed is enabled:

```shellsession title=terminal
Codspeed mode: simulation
2025-12-02T17:21:57+01:00
Running ./bench_algorithms
Run on (8 X 24 MHz CPU s)
CPU Caches:
  L1 Data 64 KiB
  L1 Instruction 128 KiB
  L2 Unified 4096 KiB (x8)
Load Average: 9.22, 7.26, 6.71
NOTICE: codspeed is enabled, but no performance measurement will be made since it's running in an unknown environment.
Checked: cpp/benchmarks/bench_algorithms.cpp::BubbleSort[SortFixture][1000]
Checked: cpp/benchmarks/bench_algorithms.cpp::BubbleSort[SortFixture][4096]
Checked: cpp/benchmarks/bench_algorithms.cpp::BubbleSort[SortFixture][32768]
Checked: cpp/benchmarks/bench_algorithms.cpp::BubbleSort[SortFixture][100000]
```

Notice there are no timing measurements in the local output. CodSpeed only captures actual performance data when running in CI.

Create a GitHub Actions workflow file that builds your benchmarks and runs them through the `CodSpeedHQ/action` step (with its `mode` input set to the mode you built with) on every push to your main branch and on every pull request.

Once the workflow runs, your pull requests will receive a performance report comment.

After your benchmarks run in CI, head over to your CodSpeed dashboard to see detailed performance reports, historical trends, and flamegraph profiles for deeper analysis.

Profiling works out of the box with no extra configuration. [Learn more about flamegraphs and how to use them to optimize your code](/features/profiling).

**Using Bazel?**

If you're using Bazel as your build system, check out the [Bazel integration documentation](/benchmarks/cpp#bazel) for detailed setup instructions with CodSpeed.

## Next Steps

Check out these resources to continue your C++ benchmarking journey:

* Sign up and start tracking your C++ performance in CI
* Explore the full google\_benchmark API reference
* Learn how to use flamegraphs to optimize your code
* Explore all of google\_benchmark's features in depth