# How to Benchmark Go with the testing Package?

> Learn how to measure the performance of your Go code using the standard testing package by writing and running benchmarks locally and continuously in CI to catch regressions.

## Why the `testing` Package?

Go has first-class benchmarking built into its standard library, so no external framework is needed. The `testing` package provides `testing.B`, which handles iteration control, timing, memory tracking, sub-benchmarks, and parallel execution out of the box.

Every Go developer already has the tools installed. Benchmarks live in `_test.go` files alongside your code, run with `go test`, and integrate with the Go ecosystem's profiling and analysis tools (`pprof`, `benchstat`).

## Your First Benchmark

Let's start with the simplest possible Go benchmark: measuring a recursive Fibonacci function.

### Setting Up

Create a module and two files, one for the function and one for its benchmark:

```sh title=terminal icon="square-terminal" theme={null}
mkdir my-benchmarks && cd my-benchmarks
go mod init example.com/bench
```

```go title=fib.go icon="golang" theme={null}
package bench

func Fibonacci(n int) int {
	if n <= 1 {
		return n
	}
	return Fibonacci(n-1) + Fibonacci(n-2)
}
```

```go title=fib_test.go icon="golang" theme={null}
package bench

import "testing"

func BenchmarkFibonacci(b *testing.B) {
	for b.Loop() {
		Fibonacci(20)
	}
}
```

A few things to note:

* Benchmark functions must start with `Benchmark` and accept `*testing.B`.
* `b.Loop()` (Go 1.24+) controls iteration. Iteration count and timing are handled automatically.
* The function lives in a `_test.go` file, just like unit tests.

`b.Loop()` was introduced in Go 1.24. It replaces the older `for i := 0; i < b.N; i++` pattern and is more precise: it automatically resets the timer, and the compiler is prevented from optimizing away the loop body.

If you are on an older Go version, use the `b.N` pattern instead:

```go title="Legacy pattern (before Go 1.24)" icon="golang" theme={null}
func BenchmarkFibonacci(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Fibonacci(20)
	}
}
```

### Running the benchmarks

```shellsession title=terminal icon="square-terminal" highlight={6} theme={null}
$ go test -bench=.
goos: linux
goarch: amd64
pkg: example.com/bench
cpu: Intel(R) Xeon(R) Platinum 8488C
BenchmarkFibonacci-8       38594       31076 ns/op
PASS
ok      example.com/bench   1.207s
```

The highlighted line breaks down as follows:

```text theme={null}
BenchmarkFibonacci-8       38594       31076 ns/op
^                          ^           ^
|                          |           |
|                          |           time per iteration
|                          number of iterations
benchmark name
```

The framework automatically adjusts the iteration count to run for at least 1 second by default.

The `-8` suffix on `BenchmarkFibonacci-8` is the [`GOMAXPROCS` value](https://pkg.go.dev/runtime#GOMAXPROCS), defaulting to the number of available CPUs.
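To make the column layout concrete, here is a minimal sketch that splits one result line into its parts. `BenchLine` and `parseBenchLine` are hypothetical names introduced for illustration; they are not part of the `testing` package. The sketch reads only the first three columns and assumes that a trailing `-<number>` on the name is the `GOMAXPROCS` suffix, which `go test` omits when `GOMAXPROCS` is 1.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// BenchLine holds the leading fields of one `go test -bench` result line.
// It is a hypothetical helper type used only for this illustration.
type BenchLine struct {
	Name       string  // benchmark name without the GOMAXPROCS suffix
	Procs      int     // GOMAXPROCS value from the suffix (1 if absent)
	Iterations int     // number of iterations that were run
	NsPerOp    float64 // time per iteration in nanoseconds
}

// parseBenchLine parses a line such as
// "BenchmarkFibonacci-8   38594   31076 ns/op".
func parseBenchLine(line string) (BenchLine, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 || fields[3] != "ns/op" {
		return BenchLine{}, fmt.Errorf("unexpected benchmark line: %q", line)
	}

	// Split "Name-8" into "Name" and 8. This is a simplification: a
	// sub-benchmark whose own name ends in "-<number>" would be misread
	// when GOMAXPROCS is 1.
	name, procs := fields[0], 1
	if i := strings.LastIndex(name, "-"); i >= 0 {
		if p, err := strconv.Atoi(name[i+1:]); err == nil {
			name, procs = name[:i], p
		}
	}

	iters, err := strconv.Atoi(fields[1])
	if err != nil {
		return BenchLine{}, fmt.Errorf("iteration count: %w", err)
	}
	ns, err := strconv.ParseFloat(fields[2], 64)
	if err != nil {
		return BenchLine{}, fmt.Errorf("ns/op value: %w", err)
	}
	return BenchLine{Name: name, Procs: procs, Iterations: iters, NsPerOp: ns}, nil
}
```

This is only meant to show how the columns map to values; for real comparisons, `benchstat` already understands this format.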

## Configuring Your Benchmarks

### Benchmark Duration

Control how long each benchmark runs with `-benchtime`:

```sh title=terminal icon="square-terminal" theme={null}
go test -bench=. -benchtime=5s
```

You can also specify an exact iteration count:

```sh title=terminal icon="square-terminal" theme={null}
go test -bench=. -benchtime=1000x
```

### Memory Allocation Tracking

Add `-benchmem` to report allocation stats, or call `b.ReportAllocs()` inside the benchmark:

```sh title=terminal icon="square-terminal" theme={null}
go test -bench=. -benchmem
```

```shellsession title=terminal icon="square-terminal" theme={null}
BenchmarkFibonacci-8       38594       31076 ns/op       0 B/op       0 allocs/op
```

The two extra columns show bytes allocated per operation and number of allocations per operation. These are essential for catching allocation regressions: even if latency stays flat, increased allocations put pressure on the garbage collector.

### Filtering and Skipping Tests

Run only benchmarks (skip unit tests) with a regex:

```sh title=terminal icon="square-terminal" theme={null}
go test -run='^$' -bench=.
```

Filter to specific benchmarks:

```sh title=terminal icon="square-terminal" theme={null}
go test -run='^$' -bench=BenchmarkFibonacci
```

Run benchmarks in a specific package, or recursively across all packages:

```sh title=terminal icon="square-terminal" theme={null}
go test -bench=. ./pkg/foo
go test -bench=. ./...
```

### Key CLI Flags

* `-bench`: Run benchmarks matching the regular expression. Use `-bench=.` for all.
* `-benchtime`: Minimum time per benchmark. Accepts a duration (`5s`, `100ms`) or an exact iteration count (`1000x`).
* `-benchmem`: Report memory allocation statistics (`B/op`, `allocs/op`).
* `-count`: Run each benchmark n times. Use `-count=10` or higher for statistical analysis with `benchstat`.
* `-cpu`: Comma-separated `GOMAXPROCS` values to test with (e.g., `-cpu=1,2,4,8`).
* `-run`: Filter tests. Use `-run='^$'` to skip unit tests when benchmarking.
* `-timeout`: Maximum total time for all tests and benchmarks.
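The memory tracking section mentions `b.ReportAllocs()` without showing it in use. The sketch below compares allocations for two ways of joining strings. The names `joinNaive`, `joinBuilder`, `allocsPerOp`, and `sink` are hypothetical and chosen for this example. It drives the benchmark through `testing.Benchmark` so it can run as an ordinary program; in a `_test.go` file you would put the same loop in a `BenchmarkXxx` function. It uses the `b.N` loop so that it also compiles on Go versions before 1.24.

```go
package main

import (
	"strings"
	"testing"
)

// sink is package-level so results escape to the heap and the compiler
// cannot discard the work being measured.
var sink string

// joinNaive builds the result with +=, which typically allocates a new
// string each time the result grows.
func joinNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinBuilder sizes a strings.Builder up front so the result can be
// built in a single buffer.
func joinBuilder(parts []string) string {
	total := 0
	for _, p := range parts {
		total += len(p)
	}
	var sb strings.Builder
	sb.Grow(total)
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

// allocsPerOp benchmarks join and returns the allocations per iteration.
func allocsPerOp(join func([]string) string) int64 {
	parts := make([]string, 8)
	for i := range parts {
		parts[i] = strings.Repeat("x", 16)
	}
	res := testing.Benchmark(func(b *testing.B) {
		// Under `go test`, this makes B/op and allocs/op appear for this
		// benchmark even without the -benchmem flag.
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = join(parts)
		}
	})
	return res.AllocsPerOp()
}
```

Calling `allocsPerOp(joinNaive)` and `allocsPerOp(joinBuilder)` should show the naive version allocating more per operation; the exact counts depend on the Go version and are not asserted here.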

## Sub-benchmarks and Table-Driven Patterns

### Sub-benchmarks with `b.Run`

Use `b.Run()` to create sub-benchmarks, the standard way to test different inputs or configurations:

```go title=fib_test.go icon="golang" theme={null}
// Add "fmt" to the imports of fib_test.go for fmt.Sprintf.
func BenchmarkFibonacciSizes(b *testing.B) {
	sizes := []int{5, 10, 15, 20, 30}
	for _, n := range sizes {
		b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
			for b.Loop() {
				Fibonacci(n)
			}
		})
	}
}
```

```shellsession title=terminal icon="square-terminal" theme={null}
BenchmarkFibonacciSizes/n=5-8       58634750          21.27 ns/op
BenchmarkFibonacciSizes/n=10-8       4858176         248.2 ns/op
BenchmarkFibonacciSizes/n=15-8        431430        2781 ns/op
BenchmarkFibonacciSizes/n=20-8         38892       30986 ns/op
BenchmarkFibonacciSizes/n=30-8           312     3823289 ns/op
```

The exponential growth of recursive Fibonacci is clearly visible: n=5 takes 21ns, n=30 takes 3.8ms, a factor of roughly 180,000x.

You can filter sub-benchmarks from the command line:

```sh title=terminal icon="square-terminal" theme={null}
go test -bench=BenchmarkFibonacciSizes/n=20
```

### Comparing Algorithms

Sub-benchmarks make algorithm comparison straightforward:

```go title=fib_test.go icon="golang" theme={null}
func FibonacciIterative(n int) int {
	if n <= 1 {
		return n
	}
	a, b := 0, 1
	for i := 2; i <= n; i++ {
		a, b = b, a+b
	}
	return b
}

func BenchmarkAlgorithms(b *testing.B) {
	for _, n := range []int{10, 20, 30} {
		b.Run(fmt.Sprintf("recursive/n=%d", n), func(b *testing.B) {
			for b.Loop() {
				Fibonacci(n)
			}
		})
		b.Run(fmt.Sprintf("iterative/n=%d", n), func(b *testing.B) {
			for b.Loop() {
				FibonacciIterative(n)
			}
		})
	}
}
```

```shellsession title=terminal icon="square-terminal" theme={null}
BenchmarkAlgorithms/recursive/n=10-8        4782566         251.9 ns/op
BenchmarkAlgorithms/iterative/n=10-8      226538564           5.375 ns/op
BenchmarkAlgorithms/recursive/n=20-8          38359       31287 ns/op
BenchmarkAlgorithms/iterative/n=20-8      159598522           7.423 ns/op
BenchmarkAlgorithms/recursive/n=30-8            313     3814719 ns/op
BenchmarkAlgorithms/iterative/n=30-8      100000000          10.10 ns/op
```

At n=30, the
iterative version is roughly **377,000x faster** than the recursive one (10ns vs 3.8ms). The hierarchical sub-benchmark naming makes it easy to compare across both dimensions.

## Benchmarking Only What Matters

### Excluding Setup with `b.ResetTimer`

When your benchmark has expensive one-time setup, use `b.ResetTimer()` to exclude it from measurements:

```go title="Excluding setup" icon="golang" theme={null}
func BenchmarkProcess(b *testing.B) {
	data := expensiveSetup() // NOT measured
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		process(data) // MEASURED
	}
}
```

With `b.Loop()` (Go 1.24+), the timer is automatically reset on the first iteration, so `b.ResetTimer()` is no longer required for one-time setup placed before the loop. Setup performed inside the loop body is still measured unless you stop the timer around it.

### Per-Iteration Setup with Timer Control

When each iteration needs fresh data (e.g., sorting an unsorted slice), use `b.StopTimer()` and `b.StartTimer()`:

```go title=sort_test.go icon="golang" theme={null}
func BenchmarkSort(b *testing.B) {
	for _, size := range []int{100, 1000, 10000} {
		b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
			original := make([]int, size)
			for i := range original {
				original[i] = size - i // reverse-sorted input
			}
			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				b.StopTimer()
				data := make([]int, len(original))
				copy(data, original)
				b.StartTimer()
				sort.Ints(data)
			}
		})
	}
}
```

```shellsession title=terminal icon="square-terminal" theme={null}
BenchmarkSort/size=100-8         5558869         214.7 ns/op
BenchmarkSort/size=1000-8        1000000        1096 ns/op
BenchmarkSort/size=10000-8        124218        9888 ns/op
```

`b.StopTimer()` and `b.StartTimer()` have overhead. If the per-iteration setup is extremely cheap relative to what you are measuring, the timer overhead may distort results. Use this pattern only when the setup cost is significant.
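One practical consequence of the pattern above: time spent while the timer is stopped does not count toward the `-benchtime` target, so a benchmark with heavy per-iteration setup can run much longer in wall-clock time than the configured duration. A fixed iteration count (`-benchtime=200x`) keeps the run bounded. The sketch below shows the same idea as a standalone program; `runSortBenchmark` is a hypothetical helper, and setting the `test.benchtime` flag through `flag.Set` after `testing.Init()` is used here as the programmatic stand-in for passing `-benchtime` on the command line.

```go
package main

import (
	"flag"
	"sort"
	"strconv"
	"testing"
)

// runSortBenchmark runs a sort benchmark with per-iteration setup for a
// fixed number of iterations and returns the result.
func runSortBenchmark(iterations int) testing.BenchmarkResult {
	// testing.Init registers the test.* flags; "<n>x" requests an exact
	// iteration count, like -benchtime=<n>x on the command line.
	testing.Init()
	if err := flag.Set("test.benchtime", strconv.Itoa(iterations)+"x"); err != nil {
		panic(err)
	}

	original := make([]int, 1000)
	for i := range original {
		original[i] = len(original) - i // reverse-sorted input
	}

	return testing.Benchmark(func(b *testing.B) {
		// Reuse one buffer so the setup is a copy, not an allocation.
		data := make([]int, len(original))
		for i := 0; i < b.N; i++ {
			b.StopTimer()
			copy(data, original) // restore the unsorted input; not measured
			b.StartTimer()
			sort.Ints(data) // measured
		}
	})
}
```

For example, `runSortBenchmark(200)` returns a result whose `N` field is 200, and `NsPerOp()` reports the time per sort with the copy excluded.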

### Custom Metrics

Report domain-specific metrics with `b.ReportMetric()`:

```go title="Custom metrics" icon="golang" theme={null}
func BenchmarkCustomMetrics(b *testing.B) {
	var compares int64
	for b.Loop() {
		s := []int{5, 4, 3, 2, 1}
		slices.SortFunc(s, func(a, b int) int {
			compares++
			return cmp.Compare(a, b)
		})
	}
	b.ReportMetric(float64(compares)/float64(b.N), "compares/op")
}
```

Use `b.SetBytes(n)` to report throughput in MB/s for I/O-bound benchmarks:

```go title="Throughput reporting" icon="golang" theme={null}
func BenchmarkRead(b *testing.B) {
	buf := make([]byte, 1024)
	b.SetBytes(1024) // 1024 bytes processed per operation
	for b.Loop() {
		readData(buf) // readData stands in for the I/O routine under test
	}
}
// Output includes a throughput column, e.g. 3125.00 MB/s
```

## Parallel Benchmarks

Use `b.RunParallel()` to benchmark code under concurrent load. This creates `GOMAXPROCS` goroutines and distributes iterations among them:

```go title=fib_test.go icon="golang" theme={null}
func BenchmarkFibonacciParallel(b *testing.B) {
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			Fibonacci(20)
		}
	})
}
```

```shellsession title=terminal icon="square-terminal" theme={null}
BenchmarkFibonacciParallel-8       211784        5754 ns/op
```

Compare this with the sequential result (31076 ns/op): the parallel version runs \~5x faster on 8 cores, showing that this CPU-bound workload scales well across cores.

Do not call `b.StopTimer()`, `b.StartTimer()`, or `b.ResetTimer()` inside `b.RunParallel`. They have global effect and will corrupt measurements. Each goroutine must maintain its own local state.

Use `b.SetParallelism(n)` to increase concurrency beyond `GOMAXPROCS` for I/O-bound workloads:

```go title="High concurrency" icon="golang" theme={null}
func BenchmarkHTTPConcurrent(b *testing.B) {
	b.SetParallelism(4) // 4 * GOMAXPROCS goroutines
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			makeHTTPRequest()
		}
	})
}
```

## Avoiding Common Pitfalls

### Compiler Dead Code Elimination

If a computation's result is unused, the Go compiler may eliminate it entirely.
This is one of the most common sources of misleading benchmark results.

```go title="Dead code elimination" icon="golang" theme={null}
// BAD: result is unused, so the compiler may eliminate the call entirely
func BenchmarkBroken(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Fibonacci(20) // may report ~0 ns/op
	}
}

// GOOD: b.Loop() prevents the compiler from optimizing away the body
func BenchmarkCorrect(b *testing.B) {
	for b.Loop() {
		Fibonacci(20)
	}
}
```

If you must use the `b.N` pattern (Go \< 1.24), pass the result to `runtime.KeepAlive` so the compiler treats it as observed:

```go title="runtime.KeepAlive pattern" icon="golang" theme={null}
func BenchmarkCorrect(b *testing.B) {
	var r int
	for i := 0; i < b.N; i++ {
		r = Fibonacci(20)
	}
	runtime.KeepAlive(r)
}
```

### Do Not Use `b.N` as Input

Feeding `b.N`, or a loop index bounded by it, into the function under test means the workload grows with the iteration count. The benchmark never converges and reports meaningless numbers:

```go title="b.N misuse" icon="golang" theme={null}
// BAD: workload grows with b.N, so the benchmark never converges
func BenchmarkBad(b *testing.B) {
	for i := 0; i < b.N; i++ {
		Fibonacci(i) // i grows, each iteration is slower
	}
}

// GOOD: fixed input
func BenchmarkGood(b *testing.B) {
	for b.Loop() {
		Fibonacci(20)
	}
}
```

### Keep Benchmarks Deterministic

Use fixed seeds for random data:

```go title="Deterministic setup" icon="golang" theme={null}
func BenchmarkSort(b *testing.B) {
	rng := rand.New(rand.NewSource(42)) // fixed seed
	data := make([]int, 1000)
	for i := range data {
		data[i] = rng.Intn(1000)
	}
	b.ResetTimer()
	// ...
}
```

## Comparing Results with `benchstat`

Raw benchmark numbers are noisy. Use [`benchstat`](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat) to compare results with statistical rigor.

### Installation

```sh title=terminal icon="square-terminal" theme={null}
go install golang.org/x/perf/cmd/benchstat@latest
```

### Workflow

Run benchmarks multiple times (at least 10) to collect enough samples:

```sh title=terminal icon="square-terminal" theme={null}
go test -run='^$' -bench=. -count=10 > old.txt
```

Make your changes, then run again:

```sh title=terminal icon="square-terminal" theme={null}
go test -run='^$' -bench=. -count=10 > new.txt
```

Compare with `benchstat`:

```sh title=terminal icon="square-terminal" theme={null}
$ benchstat old.txt new.txt
            │   old.txt   │              new.txt              │
            │   sec/op    │   sec/op     vs base              │
Fibonacci-8   30.96µ ± 0%   30.99µ ± 0%  ~ (p=0.841 n=10)
```

The columns report:

* **± 0%**: the 95% confidence interval. Lower means more stable results.
* **\~ (p=0.841)**: no statistically significant difference. The p-value is from a [Mann-Whitney U-test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test); values below 0.05 indicate a statistically significant change.
* **n=10**: the sample count. Use `-count=10` or higher for reliable results.

## Profiling with `pprof`

Go benchmarks integrate directly with the `pprof` profiler. Generate profiles while benchmarking:

* `-cpuprofile`: Write CPU profile. Shows where time is spent.
* `-memprofile`: Write memory allocation profile. Shows where allocations happen.
* `-blockprofile`: Write goroutine blocking profile. Shows where goroutines wait.
* `-mutexprofile`: Write mutex contention profile. Shows lock contention hotspots.

Generate and analyze a CPU profile:

```sh title=terminal icon="square-terminal" theme={null}
go test -run='^$' -bench=BenchmarkFibonacci -cpuprofile=cpu.prof
go tool pprof -http=:8080 cpu.prof
```

This opens an interactive web UI with flamegraphs, call graphs, and source-level annotations.

Collect only one profile type at a time for accuracy, since profiling itself has overhead that can distort other measurements.

## Best Practices

#### Run on Idle Machines

Close background processes, avoid running on battery, and disable CPU throttling when collecting benchmark data.
Noise from other processes can mask real performance changes.

#### Use `-count` and `benchstat` for Decisions

Never eyeball raw `ns/op` numbers to decide if a change helped. Run with `-count=10` and use `benchstat` to test for statistical significance. With \~20 benchmarks at alpha=0.05, expect \~1 false positive.

#### Track Memory Alongside Latency

Always use `-benchmem` or `b.ReportAllocs()`. Even if latency stays flat, increased allocations put pressure on the garbage collector and cause latency spikes under production load.

#### Use Sub-Benchmarks for Input Variation

Table-driven sub-benchmarks let you test across input sizes, data shapes, and configurations in a single benchmark function. They also enable filtering from the command line.

## Running Benchmarks Continuously with CodSpeed

So far, you've been running benchmarks locally. But local benchmarking has limitations:

* **Inconsistent hardware**: Different developers get different results
* **Manual process**: Easy to forget to run benchmarks before merging
* **No historical tracking**: Hard to spot gradual performance degradation
* **No PR context**: Can't see performance impact during code review

This is where **CodSpeed** comes in. It runs your benchmarks automatically in CI and provides:

* Automated performance regression detection in PRs
* Consistent metrics with reliable measurements across all runs
* Historical tracking to see performance over time with detailed charts
* Flamegraph profiles to see exactly what changed in your code's execution

For the full CodSpeed integration reference, see [Writing Benchmarks in Go](/benchmarks/go).

### How to Set Up CodSpeed

Here's how to integrate CodSpeed with your Go benchmarks:

Create a workflow file to run benchmarks on every push and pull request.

Once the workflow runs, your pull requests will receive a performance report comment.

[Image: Pull Request Result]

After your benchmarks run in CI, head over to your CodSpeed dashboard to see detailed performance reports, historical trends, and flamegraph profiles for deeper analysis.

[Image: Profiling Report on CodSpeed]

Profiling works out of the box, no extra configuration needed! [Learn more about flamegraphs and how to use them to optimize your code](/features/profiling).

## Next Steps

Check out these resources to continue your Go benchmarking journey:

* Sign up and start tracking your Go performance in CI
* CodSpeed's Go integration reference and compatibility notes
* A hands-on guide to benchmarking a real HTTP API with Gin
* Learn how to use flamegraphs to optimize your code