
Why the testing Package?

Go has first-class benchmarking built into its standard library — no external framework needed. The testing package provides testing.B, which handles iteration control, timing, memory tracking, sub-benchmarks, and parallel execution out of the box. Every Go developer already has the tools installed. Benchmarks live in _test.go files alongside your code, run with go test, and integrate with the Go ecosystem’s profiling tools (pprof, benchstat).

Your First Benchmark

Let’s start with the simplest possible Go benchmark: measuring a recursive Fibonacci function.

Setting Up

Create a module and two files — the function and its benchmark:
terminal
mkdir my-benchmarks && cd my-benchmarks
go mod init example.com/bench
fib.go
package bench

func Fibonacci(n int) int {
    if n <= 1 {
        return n
    }
    return Fibonacci(n-1) + Fibonacci(n-2)
}
fib_test.go
package bench

import "testing"

func BenchmarkFibonacci(b *testing.B) {
    for b.Loop() {
        Fibonacci(20)
    }
}
A few things to note:
  • Benchmark functions must start with Benchmark and accept *testing.B.
  • b.Loop() (Go 1.24+) controls iteration. Iteration count and timing are handled automatically.
  • The function lives in a _test.go file, just like unit tests.
b.Loop() was introduced in Go 1.24. It replaces the older for i := 0; i < b.N; i++ pattern and is more precise — it automatically resets the timer, and the compiler is prevented from optimizing away the loop body. If you are on an older Go version, use the b.N pattern instead:
Legacy pattern (before Go 1.24)
func BenchmarkFibonacci(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Fibonacci(20)
    }
}

Running the benchmarks

terminal
$ go test -bench=.
goos: linux
goarch: amd64
pkg: example.com/bench
cpu: Intel(R) Xeon(R) Platinum 8488C
BenchmarkFibonacci-8   	   38594	     31076 ns/op
PASS
ok  	example.com/bench	1.207s
The benchmark result line breaks down as follows:
BenchmarkFibonacci-8         38594        31076 ns/op
        ^                     ^            ^
        |                     |            |
        |                     |     time per iteration
        |              number of iterations
    benchmark name
The framework automatically adjusts the iteration count to run for at least 1 second by default. The -8 suffix on BenchmarkFibonacci-8 is the GOMAXPROCS value, defaulting to the number of available CPUs.

Configuring Your Benchmarks

Benchmark Duration

Control how long each benchmark runs with -benchtime:
terminal
go test -bench=. -benchtime=5s
You can also specify an exact iteration count:
terminal
go test -bench=. -benchtime=1000x

Memory Allocation Tracking

Add -benchmem to report allocation stats, or call b.ReportAllocs() inside the benchmark:
terminal
go test -bench=. -benchmem
terminal
BenchmarkFibonacci-8    38594    31076 ns/op    0 B/op    0 allocs/op
The two extra columns show bytes allocated per operation and number of allocations per operation. These are essential for catching allocation regressions — even if latency stays flat, increased allocations put pressure on the garbage collector.
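If you want allocation stats for a specific benchmark regardless of flags, call b.ReportAllocs() in its body instead. Here is a minimal sketch; the BenchmarkConcat name and its string-building body are illustrative:
Per-benchmark allocation reporting
func BenchmarkConcat(b *testing.B) {
    b.ReportAllocs() // report B/op and allocs/op even without -benchmem
    for b.Loop() {
        s := ""
        for i := 0; i < 10; i++ {
            s += "x" // each concatenation allocates a new string
        }
        _ = s // keep the result observable
    }
}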

Filtering and Skipping Tests

Run only benchmarks (skip unit tests) with a regex:
terminal
go test -run='^$' -bench=.
Filter to specific benchmarks:
terminal
go test -run='^$' -bench=BenchmarkFibonacci
Run benchmarks in a specific package, or recursively across all packages:
terminal
go test -bench=. ./pkg/foo
go test -bench=. ./...

Key CLI Flags

  • -bench (regexp, required): Run benchmarks matching the regular expression. Use -bench=. for all.
  • -benchtime (duration, default 1s): Minimum time per benchmark. Accepts a duration (5s, 100ms) or an exact iteration count (1000x).
  • -benchmem (bool, default false): Report memory allocation statistics (B/op, allocs/op).
  • -count (int, default 1): Run each benchmark n times. Use -count=10 or higher for statistical analysis with benchstat.
  • -cpu (list): Comma-separated GOMAXPROCS values to test with (e.g., -cpu=1,2,4,8).
  • -run (regexp): Filter tests. Use -run='^$' to skip unit tests when benchmarking.
  • -timeout (duration, default 10m): Maximum total time for all tests and benchmarks.
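A typical invocation combines several of these flags, for example:
terminal
go test -run='^$' -bench=. -benchmem -count=10 ./...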

Sub-benchmarks and Table-Driven Patterns

Sub-benchmarks with b.Run

Use b.Run() to create sub-benchmarks — the standard way to test different inputs or configurations:
fib_test.go
func BenchmarkFibonacciSizes(b *testing.B) {
    sizes := []int{5, 10, 15, 20, 30}
    for _, n := range sizes {
        b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
            for b.Loop() {
                Fibonacci(n)
            }
        })
    }
}
terminal
BenchmarkFibonacciSizes/n=5-8  	58634750	     21.27 ns/op
BenchmarkFibonacciSizes/n=10-8 	 4858176	       248.2 ns/op
BenchmarkFibonacciSizes/n=15-8 	  431430	      2781 ns/op
BenchmarkFibonacciSizes/n=20-8 	   38892	     30986 ns/op
BenchmarkFibonacciSizes/n=30-8 	     312	   3823289 ns/op
The exponential growth of recursive Fibonacci is clearly visible: n=5 takes 21ns, n=30 takes 3.8ms — a factor of roughly 180,000. You can filter sub-benchmarks from the command line:
terminal
go test -bench=BenchmarkFibonacciSizes/n=20

Comparing Algorithms

Sub-benchmarks make algorithm comparison straightforward:
fib_test.go
func FibonacciIterative(n int) int {
    if n <= 1 {
        return n
    }
    a, b := 0, 1
    for i := 2; i <= n; i++ {
        a, b = b, a+b
    }
    return b
}

func BenchmarkAlgorithms(b *testing.B) {
    for _, n := range []int{10, 20, 30} {
        b.Run(fmt.Sprintf("recursive/n=%d", n), func(b *testing.B) {
            for b.Loop() {
                Fibonacci(n)
            }
        })
        b.Run(fmt.Sprintf("iterative/n=%d", n), func(b *testing.B) {
            for b.Loop() {
                FibonacciIterative(n)
            }
        })
    }
}
terminal
BenchmarkAlgorithms/recursive/n=10-8   	 4782566	     251.9 ns/op
BenchmarkAlgorithms/iterative/n=10-8   	226538564	       5.375 ns/op
BenchmarkAlgorithms/recursive/n=20-8   	   38359	     31287 ns/op
BenchmarkAlgorithms/iterative/n=20-8   	159598522	       7.423 ns/op
BenchmarkAlgorithms/recursive/n=30-8   	     313	   3814719 ns/op
BenchmarkAlgorithms/iterative/n=30-8   	100000000	      10.10 ns/op
At n=30, the iterative version is 377,000x faster than the recursive one (10ns vs 3.8ms). The hierarchical sub-benchmark naming makes it easy to compare across both dimensions.

Benchmarking Only What Matters

Excluding Setup with b.ResetTimer

When your benchmark has expensive one-time setup, use b.ResetTimer() to exclude it from measurements:
Excluding setup
func BenchmarkProcess(b *testing.B) {
    data := expensiveSetup() // NOT measured
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        process(data) // MEASURED
    }
}
With b.Loop() (Go 1.24+), the timer is automatically reset on the first iteration, so b.ResetTimer() is no longer required unless the setup happens inside the loop body.
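As a sketch, the same benchmark written with b.Loop() needs no explicit timer call (reusing the hypothetical expensiveSetup and process helpers from above):
b.Loop() equivalent
func BenchmarkProcess(b *testing.B) {
    data := expensiveSetup() // still not measured: b.Loop() resets the timer on its first call
    for b.Loop() {
        process(data) // measured
    }
}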

Per-Iteration Setup with Timer Control

When each iteration needs fresh data (e.g., sorting an unsorted slice), use b.StopTimer() and b.StartTimer():
sort_test.go
func BenchmarkSort(b *testing.B) {
    for _, size := range []int{100, 1000, 10000} {
        b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
            original := make([]int, size)
            for i := range original {
                original[i] = size - i // reverse-sorted = worst case
            }
            b.ResetTimer()
            for i := 0; i < b.N; i++ {
                b.StopTimer()
                data := make([]int, len(original))
                copy(data, original)
                b.StartTimer()
                sort.Ints(data)
            }
        })
    }
}
terminal
BenchmarkSort/size=100-8      	 5558869	     214.7 ns/op
BenchmarkSort/size=1000-8     	 1000000	      1096 ns/op
BenchmarkSort/size=10000-8    	  124218	      9888 ns/op
b.StopTimer() and b.StartTimer() have overhead. If the per-iteration setup is extremely cheap relative to what you are measuring, the timer overhead may distort results. Use this pattern only when the setup cost is significant.

Custom Metrics

Report domain-specific metrics with b.ReportMetric():
Custom metrics
func BenchmarkCustomMetrics(b *testing.B) {
    var compares int64
    for b.Loop() {
        s := []int{5, 4, 3, 2, 1}
        slices.SortFunc(s, func(x, y int) int {
            compares++ // count every comparison the sort performs
            return cmp.Compare(x, y)
        })
    }
    // After b.Loop() finishes, b.N holds the total iteration count.
    b.ReportMetric(float64(compares)/float64(b.N), "compares/op")
}
Use b.SetBytes(n) to report throughput in MB/s for I/O-bound benchmarks:
Throughput reporting
func BenchmarkRead(b *testing.B) {
    buf := make([]byte, 1024)
    b.SetBytes(int64(len(buf))) // 1 KiB processed per iteration
    for b.Loop() {
        readData(buf)
    }
}
// Output includes a throughput column, e.g.: 3125.00 MB/s

Parallel Benchmarks

Use b.RunParallel() to benchmark code under concurrent load. This creates GOMAXPROCS goroutines and distributes iterations among them:
fib_test.go
func BenchmarkFibonacciParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            Fibonacci(20)
        }
    })
}
terminal
BenchmarkFibonacciParallel-8   	  211784	     5754 ns/op
Compare this with the sequential result (31076 ns/op) — the parallel version runs ~5x faster on 8 cores, showing that this CPU-bound workload scales well across threads.
Do not call b.StopTimer(), b.StartTimer(), or b.ResetTimer() inside b.RunParallel. They have global effect and will corrupt measurements. Each goroutine must maintain its own local state.
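One way to keep state local is to give each goroutine its own scratch buffer. A minimal sketch, with a hypothetical BenchmarkEncodeParallel:
Per-goroutine state
func BenchmarkEncodeParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        buf := make([]byte, 0, 32) // local to this goroutine, never shared
        for pb.Next() {
            buf = strconv.AppendInt(buf[:0], 123456789, 10)
        }
    })
}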
Use b.SetParallelism(n) to increase concurrency beyond GOMAXPROCS for I/O-bound workloads:
High concurrency
func BenchmarkHTTPConcurrent(b *testing.B) {
    b.SetParallelism(4) // 4 * GOMAXPROCS goroutines
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            makeHTTPRequest()
        }
    })
}

Avoiding Common Pitfalls

Compiler Dead Code Elimination

If a computation’s result is unused, the Go compiler may eliminate it entirely. This is the single most common source of misleading benchmark results.
Dead code elimination
// BAD: result is unused — compiler may eliminate the call entirely
func BenchmarkBroken(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Fibonacci(20) // may report ~0 ns/op
    }
}

// GOOD: b.Loop() prevents the compiler from optimizing away the body
func BenchmarkCorrect(b *testing.B) {
    for b.Loop() {
        Fibonacci(20)
    }
}
If you must use the b.N pattern (Go < 1.24), pass the result to runtime.KeepAlive so the compiler treats it as observed:
runtime.KeepAlive pattern
func BenchmarkCorrect(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        r = Fibonacci(20)
    }
    runtime.KeepAlive(r)
}

Do Not Use b.N as Input

Using b.N as a function parameter means the workload grows with the iteration count. The benchmark never converges and reports meaningless numbers:
b.N misuse
// BAD: workload grows with b.N — benchmark never converges
func BenchmarkBad(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Fibonacci(i) // i grows, each iteration is slower
    }
}

// GOOD: fixed input
func BenchmarkGood(b *testing.B) {
    for b.Loop() {
        Fibonacci(20)
    }
}

Keep Benchmarks Deterministic

Use fixed seeds for random data:
Deterministic setup
func BenchmarkSort(b *testing.B) {
    rng := rand.New(rand.NewSource(42)) // fixed seed
    data := make([]int, 1000)
    for i := range data {
        data[i] = rng.Intn(1000)
    }
    b.ResetTimer()
    // ...
}

Comparing Results with benchstat

Raw benchmark numbers are noisy. Use benchstat to compare results with statistical rigor.

Installation

terminal
go install golang.org/x/perf/cmd/benchstat@latest

Workflow

Run benchmarks multiple times (at least 10) to collect enough samples:
terminal
go test -run='^$' -bench=. -count=10 > old.txt
Make your changes, then run again:
terminal
go test -run='^$' -bench=. -count=10 > new.txt
Compare with benchstat:
terminal
$ benchstat old.txt new.txt
            │  old.txt   │            new.txt             │
            │   sec/op   │   sec/op    vs base            │
Fibonacci-8   30.96µ ± 0%   30.99µ ± 0%  ~ (p=0.841 n=10)
The columns report:
  • ± 0%: the 95% confidence interval. Lower means more stable results.
  • ~ (p=0.841): no statistically significant difference. The p-value is from a Mann-Whitney U-test; values below 0.05 indicate a real change.
  • n=10: the sample count. Use -count=10 or higher for reliable results.

Profiling with pprof

Go benchmarks integrate directly with the pprof profiler. Generate profiles while benchmarking:
  • -cpuprofile (file): Write a CPU profile. Shows where time is spent.
  • -memprofile (file): Write a memory allocation profile. Shows where allocations happen.
  • -blockprofile (file): Write a goroutine blocking profile. Shows where goroutines wait.
  • -mutexprofile (file): Write a mutex contention profile. Shows lock contention hotspots.
Generate and analyze a CPU profile:
terminal
go test -run='^$' -bench=BenchmarkFibonacci -cpuprofile=cpu.prof
go tool pprof -http=:8080 cpu.prof
This opens an interactive web UI with flamegraphs, call graphs, and source-level annotations.
Collect only one profile type at a time for accuracy — profiling itself has overhead that can distort other measurements.

Best Practices

Run on Idle Machines

Close background processes, avoid running on battery, and disable CPU throttling when collecting benchmark data. Noise from other processes can mask real performance changes.

Use -count and benchstat for Decisions

Never eyeball raw ns/op numbers to decide if a change helped. Run with -count=10 and use benchstat to test for statistical significance. With ~20 benchmarks at alpha=0.05, expect ~1 false positive.

Track Memory Alongside Latency

Always use -benchmem or b.ReportAllocs(). Even if latency stays flat, increased allocations put pressure on the garbage collector and cause latency spikes under production load.

Use Sub-Benchmarks for Input Variation

Table-driven sub-benchmarks let you test across input sizes, data shapes, and configurations in a single benchmark function. They also enable filtering from the command line.

Running Benchmarks Continuously with CodSpeed

So far, you’ve been running benchmarks locally. But local benchmarking has limitations:
  • Inconsistent hardware: Different developers get different results
  • Manual process: Easy to forget to run benchmarks before merging
  • No historical tracking: Hard to spot gradual performance degradation
  • No PR context: Can’t see performance impact during code review
This is where CodSpeed comes in. It runs your benchmarks automatically in CI and provides:
  • Automated performance regression detection in PRs
  • Consistent metrics with reliable measurements across all runs
  • Historical tracking to see performance over time with detailed charts
  • Flamegraph profiles to see exactly what changed in your code’s execution
For the full CodSpeed integration reference, see Writing Benchmarks in Go.

How to Set Up CodSpeed

Here’s how to integrate CodSpeed with your Go benchmarks:
1. Set Up GitHub Actions

Create a workflow file to run benchmarks on every push and pull request.
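A minimal workflow sketch, assuming CodSpeed's GitHub Action (CodSpeedHQ/action) and a CODSPEED_TOKEN repository secret; check the CodSpeed docs for the currently recommended configuration:
.github/workflows/codspeed.yml
name: CodSpeed Benchmarks

on:
  push:
    branches: [main]
  pull_request:

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: stable
      - name: Run benchmarks
        uses: CodSpeedHQ/action@v3
        with:
          token: ${{ secrets.CODSPEED_TOKEN }}
          run: go test -bench=. ./...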
2. Check the Results

Once the workflow runs, your pull requests will receive a performance report comment:
Pull Request Result
3. Access Detailed Reports and Flamegraphs

After your benchmarks run in CI, head over to your CodSpeed dashboard to see detailed performance reports, historical trends, and flamegraph profiles for deeper analysis.
Profiling Report on CodSpeed
Profiling works out of the box, no extra configuration needed! Learn more about flamegraphs and how to use them to optimize your code.

Next Steps

Check out these resources to continue your Go benchmarking journey:

Get Started with CodSpeed

Sign up and start tracking your Go performance in CI

CodSpeed Go Benchmarking Docs

CodSpeed’s Go integration reference and compatibility notes

Benchmarking a Go Gin API

A hands-on guide to benchmarking a real HTTP API with Gin

Performance Profiling

Learn how to use flamegraphs to optimize your code