
Why the testing Package?

Go has first-class benchmarking built into its standard library — no external framework needed. The testing package provides testing.B, which handles iteration control, timing, memory tracking, sub-benchmarks, and parallel execution out of the box. Every Go developer already has the tools installed. Benchmarks live in _test.go files alongside your code, run with go test, and integrate with the Go ecosystem’s profiling tools (pprof, benchstat).

Your First Benchmark

Let’s start with the simplest possible Go benchmark: measuring a recursive Fibonacci function.

Setting Up

Create a module and two files — the function and its benchmark:
terminal
mkdir my-benchmarks && cd my-benchmarks
go mod init example.com/bench
fib.go
package bench

func Fibonacci(n int) int {
    if n <= 1 {
        return n
    }
    return Fibonacci(n-1) + Fibonacci(n-2)
}
fib_test.go
package bench

import "testing"

func BenchmarkFibonacci(b *testing.B) {
    for b.Loop() {
        Fibonacci(20)
    }
}
A few things to note:
  • Benchmark functions must start with Benchmark and accept *testing.B.
  • b.Loop() (Go 1.24+) controls iteration. Iteration count and timing are handled automatically.
  • The function lives in a _test.go file, just like unit tests.
b.Loop() was introduced in Go 1.24. It replaces the older for i := 0; i < b.N; i++ pattern and is more precise — it automatically resets the timer, and the compiler is prevented from optimizing away the loop body. If you are on an older Go version, use the b.N pattern instead:
Legacy pattern (before Go 1.24)
func BenchmarkFibonacci(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Fibonacci(20)
    }
}

Running the benchmarks

terminal
$ go test -bench=.
goos: linux
goarch: amd64
pkg: example.com/bench
cpu: Intel(R) Xeon(R) Platinum 8488C
BenchmarkFibonacci-8   	   38594	     31076 ns/op
PASS
ok  	example.com/bench	1.207s
The benchmark result line breaks down as follows:
BenchmarkFibonacci-8         38594        31076 ns/op
        ^                     ^            ^
        |                     |            |
        |                     |     time per iteration
        |              number of iterations
    benchmark name
The framework automatically adjusts the iteration count to run for at least 1 second by default. The -8 suffix on BenchmarkFibonacci-8 is the GOMAXPROCS value, defaulting to the number of available CPUs.

Configuring Your Benchmarks

Benchmark Duration

Control how long each benchmark runs with -benchtime:
terminal
go test -bench=. -benchtime=5s
You can also specify an exact iteration count:
terminal
go test -bench=. -benchtime=1000x

Memory Allocation Tracking

Add -benchmem to report allocation stats, or call b.ReportAllocs() inside the benchmark:
terminal
go test -bench=. -benchmem
terminal
BenchmarkFibonacci-8    38594    31076 ns/op    0 B/op    0 allocs/op
The two extra columns show bytes allocated per operation and number of allocations per operation. These are essential for catching allocation regressions — even if latency stays flat, increased allocations put pressure on the garbage collector.
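If you want allocation stats for a specific benchmark regardless of flags, call b.ReportAllocs() in its body instead. Here is a minimal sketch; the BenchmarkConcat name and its string-building body are illustrative:
Per-benchmark allocation reporting
func BenchmarkConcat(b *testing.B) {
    b.ReportAllocs() // report B/op and allocs/op even without -benchmem
    for b.Loop() {
        s := ""
        for i := 0; i < 10; i++ {
            s += "x" // each concatenation allocates a new string
        }
        _ = s // keep the result observable
    }
}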

Filtering and Skipping Tests

Run only benchmarks (skip unit tests) with a regex:
terminal
go test -run='^$' -bench=.
Filter to specific benchmarks:
terminal
go test -run='^$' -bench=BenchmarkFibonacci
Run benchmarks in a specific package, or recursively across all packages:
terminal
go test -bench=. ./pkg/foo
go test -bench=. ./...

Key CLI Flags

  • -bench (regexp, required): Run benchmarks matching the regular expression. Use -bench=. for all.
  • -benchtime (duration, default 1s): Minimum time per benchmark. Accepts a duration (5s, 100ms) or an exact iteration count (1000x).
  • -benchmem (bool, default false): Report memory allocation statistics (B/op, allocs/op).
  • -count (int, default 1): Run each benchmark n times. Use -count=10 or higher for statistical analysis with benchstat.
  • -cpu (list): Comma-separated GOMAXPROCS values to test with (e.g., -cpu=1,2,4,8).
  • -run (regexp): Filter tests. Use -run='^$' to skip unit tests when benchmarking.
  • -timeout (duration, default 10m): Maximum total time for all tests and benchmarks.
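A typical invocation combines several of these flags, for example:
terminal
go test -run='^$' -bench=. -benchmem -count=10 ./...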

Sub-benchmarks and Table-Driven Patterns

Sub-benchmarks with b.Run

Use b.Run() to create sub-benchmarks — the standard way to test different inputs or configurations:
fib_test.go
func BenchmarkFibonacciSizes(b *testing.B) {
    sizes := []int{5, 10, 15, 20, 30}
    for _, n := range sizes {
        b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
            for b.Loop() {
                Fibonacci(n)
            }
        })
    }
}
terminal
BenchmarkFibonacciSizes/n=5-8  	58634750	     21.27 ns/op
BenchmarkFibonacciSizes/n=10-8 	 4858176	       248.2 ns/op
BenchmarkFibonacciSizes/n=15-8 	  431430	      2781 ns/op
BenchmarkFibonacciSizes/n=20-8 	   38892	     30986 ns/op
BenchmarkFibonacciSizes/n=30-8 	     312	   3823289 ns/op
The exponential growth of recursive Fibonacci is clearly visible: n=5 takes 21ns, n=30 takes 3.8ms — a factor of roughly 180,000. You can filter sub-benchmarks from the command line:
terminal
go test -bench=BenchmarkFibonacciSizes/n=20

Comparing Algorithms

Sub-benchmarks make algorithm comparison straightforward:
fib_test.go
func FibonacciIterative(n int) int {
    if n <= 1 {
        return n
    }
    a, b := 0, 1
    for i := 2; i <= n; i++ {
        a, b = b, a+b
    }
    return b
}

func BenchmarkAlgorithms(b *testing.B) {
    for _, n := range []int{10, 20, 30} {
        b.Run(fmt.Sprintf("recursive/n=%d", n), func(b *testing.B) {
            for b.Loop() {
                Fibonacci(n)
            }
        })
        b.Run(fmt.Sprintf("iterative/n=%d", n), func(b *testing.B) {
            for b.Loop() {
                FibonacciIterative(n)
            }
        })
    }
}
terminal
BenchmarkAlgorithms/recursive/n=10-8   	 4782566	     251.9 ns/op
BenchmarkAlgorithms/iterative/n=10-8   	226538564	       5.375 ns/op
BenchmarkAlgorithms/recursive/n=20-8   	   38359	     31287 ns/op
BenchmarkAlgorithms/iterative/n=20-8   	159598522	       7.423 ns/op
BenchmarkAlgorithms/recursive/n=30-8   	     313	   3814719 ns/op
BenchmarkAlgorithms/iterative/n=30-8   	100000000	      10.10 ns/op
At n=30, the iterative version is 377,000x faster than the recursive one (10ns vs 3.8ms). The hierarchical sub-benchmark naming makes it easy to compare across both dimensions.

Benchmarking Only What Matters

Excluding Setup with b.ResetTimer

When your benchmark has expensive one-time setup, use b.ResetTimer() to exclude it from measurements:
Excluding setup
func BenchmarkProcess(b *testing.B) {
    data := expensiveSetup() // NOT measured
    b.ResetTimer()
    for i := 0; i < b.N; i++ {
        process(data) // MEASURED
    }
}
With b.Loop() (Go 1.24+), the timer is automatically reset on the first iteration, so b.ResetTimer() is no longer required unless the setup happens inside the loop body.
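As a sketch, the same benchmark written with b.Loop() needs no explicit timer call (reusing the hypothetical expensiveSetup and process helpers from above):
b.Loop() equivalent
func BenchmarkProcess(b *testing.B) {
    data := expensiveSetup() // still not measured: b.Loop() resets the timer on its first call
    for b.Loop() {
        process(data) // measured
    }
}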

Per-Iteration Setup with Timer Control

When each iteration needs fresh data (e.g., sorting an unsorted slice), use b.StopTimer() and b.StartTimer():
sort_test.go
func BenchmarkSort(b *testing.B) {
    for _, size := range []int{100, 1000, 10000} {
        b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
            original := make([]int, size)
            for i := range original {
                original[i] = size - i // reverse-sorted = worst case
            }
            b.ResetTimer()
            for i := 0; i < b.N; i++ {
                b.StopTimer()
                data := make([]int, len(original))
                copy(data, original)
                b.StartTimer()
                sort.Ints(data)
            }
        })
    }
}
terminal
BenchmarkSort/size=100-8      	 5558869	     214.7 ns/op
BenchmarkSort/size=1000-8     	 1000000	      1096 ns/op
BenchmarkSort/size=10000-8    	  124218	      9888 ns/op
b.StopTimer() and b.StartTimer() have overhead. If the per-iteration setup is extremely cheap relative to what you are measuring, the timer overhead may distort results. Use this pattern only when the setup cost is significant.

Custom Metrics

Report domain-specific metrics with b.ReportMetric():
Custom metrics
func BenchmarkCustomMetrics(b *testing.B) {
    var compares int64
    for b.Loop() {
        s := []int{5, 4, 3, 2, 1}
        slices.SortFunc(s, func(x, y int) int {
            compares++ // count every comparison the sort performs
            return cmp.Compare(x, y)
        })
    }
    // After b.Loop() finishes, b.N holds the total iteration count.
    b.ReportMetric(float64(compares)/float64(b.N), "compares/op")
}
Use b.SetBytes(n) to report throughput in MB/s for I/O-bound benchmarks:
Throughput reporting
func BenchmarkRead(b *testing.B) {
    buf := make([]byte, 1024)
    b.SetBytes(int64(len(buf))) // 1 KiB processed per iteration
    for b.Loop() {
        readData(buf)
    }
}
// Output includes a throughput column, e.g.: 3125.00 MB/s

Parallel Benchmarks

Use b.RunParallel() to benchmark code under concurrent load. This creates GOMAXPROCS goroutines and distributes iterations among them:
fib_test.go
func BenchmarkFibonacciParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            Fibonacci(20)
        }
    })
}
terminal
BenchmarkFibonacciParallel-8   	  211784	     5754 ns/op
Compare this with the sequential result (31076 ns/op) — the parallel version runs ~5x faster on 8 cores, showing that this CPU-bound workload scales well across threads.
Do not call b.StopTimer(), b.StartTimer(), or b.ResetTimer() inside b.RunParallel. They have global effect and will corrupt measurements. Each goroutine must maintain its own local state.
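One way to keep state local is to give each goroutine its own scratch buffer. A minimal sketch, with a hypothetical BenchmarkEncodeParallel:
Per-goroutine state
func BenchmarkEncodeParallel(b *testing.B) {
    b.RunParallel(func(pb *testing.PB) {
        buf := make([]byte, 0, 32) // local to this goroutine, never shared
        for pb.Next() {
            buf = strconv.AppendInt(buf[:0], 123456789, 10)
        }
    })
}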
Use b.SetParallelism(n) to increase concurrency beyond GOMAXPROCS for I/O-bound workloads:
High concurrency
func BenchmarkHTTPConcurrent(b *testing.B) {
    b.SetParallelism(4) // 4 * GOMAXPROCS goroutines
    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            makeHTTPRequest()
        }
    })
}

Avoiding Common Pitfalls

Compiler Dead Code Elimination

If a computation’s result is unused, the Go compiler may eliminate it entirely. This is the single most common source of misleading benchmark results.
Dead code elimination
// BAD: result is unused — compiler may eliminate the call entirely
func BenchmarkBroken(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Fibonacci(20) // may report ~0 ns/op
    }
}

// GOOD: b.Loop() prevents the compiler from optimizing away the body
func BenchmarkCorrect(b *testing.B) {
    for b.Loop() {
        Fibonacci(20)
    }
}
If you must use the b.N pattern (Go < 1.24), pass the result to runtime.KeepAlive so the compiler treats it as observed:
runtime.KeepAlive pattern
func BenchmarkCorrect(b *testing.B) {
    var r int
    for i := 0; i < b.N; i++ {
        r = Fibonacci(20)
    }
    runtime.KeepAlive(r)
}

Do Not Use b.N as Input

Using b.N as a function parameter means the workload grows with the iteration count. The benchmark never converges and reports meaningless numbers:
b.N misuse
// BAD: workload grows with b.N — benchmark never converges
func BenchmarkBad(b *testing.B) {
    for i := 0; i < b.N; i++ {
        Fibonacci(i) // i grows, each iteration is slower
    }
}

// GOOD: fixed input
func BenchmarkGood(b *testing.B) {
    for b.Loop() {
        Fibonacci(20)
    }
}

Keep Benchmarks Deterministic

Use fixed seeds for random data:
Deterministic setup
func BenchmarkSort(b *testing.B) {
    rng := rand.New(rand.NewSource(42)) // fixed seed
    data := make([]int, 1000)
    for i := range data {
        data[i] = rng.Intn(1000)
    }
    b.ResetTimer()
    // ...
}

Comparing Results with benchstat

Raw benchmark numbers are noisy. Use benchstat to compare results with statistical rigor.

Installation

terminal
go install golang.org/x/perf/cmd/benchstat@latest

Workflow

Run benchmarks multiple times (at least 10) to collect enough samples:
terminal
go test -run='^$' -bench=. -count=10 > old.txt
Make your changes, then run again:
terminal
go test -run='^$' -bench=. -count=10 > new.txt
Compare with benchstat:
terminal
$ benchstat old.txt new.txt
            │  old.txt   │            new.txt             │
            │   sec/op   │   sec/op    vs base            │
Fibonacci-8   30.96µ ± 0%   30.99µ ± 0%  ~ (p=0.841 n=10)
The columns report:
  • ± 0%: the 95% confidence interval. Lower means more stable results.
  • ~ (p=0.841): no statistically significant difference. The p-value is from a Mann-Whitney U-test; values below 0.05 indicate a real change.
  • n=10: the sample count. Use -count=10 or higher for reliable results.

Profiling with pprof

Go benchmarks integrate directly with the pprof profiler. Generate profiles while benchmarking:
  • -cpuprofile (file): Write a CPU profile. Shows where time is spent.
  • -memprofile (file): Write a memory allocation profile. Shows where allocations happen.
  • -blockprofile (file): Write a goroutine blocking profile. Shows where goroutines wait.
  • -mutexprofile (file): Write a mutex contention profile. Shows lock contention hotspots.
Generate and analyze a CPU profile:
terminal
go test -run='^$' -bench=BenchmarkFibonacci -cpuprofile=cpu.prof
go tool pprof -http=:8080 cpu.prof
This opens an interactive web UI with flamegraphs, call graphs, and source-level annotations.
Collect only one profile type at a time for accuracy — profiling itself has overhead that can distort other measurements.

Best Practices

Run on Idle Machines

Close background processes, avoid running on battery, and disable CPU throttling when collecting benchmark data. Noise from other processes can mask real performance changes.

Use -count and benchstat for Decisions

Never eyeball raw ns/op numbers to decide if a change helped. Run with -count=10 and use benchstat to test for statistical significance. With ~20 benchmarks at alpha=0.05, expect ~1 false positive.

Track Memory Alongside Latency

Always use -benchmem or b.ReportAllocs(). Even if latency stays flat, increased allocations put pressure on the garbage collector and cause latency spikes under production load.

Use Sub-Benchmarks for Input Variation

Table-driven sub-benchmarks let you test across input sizes, data shapes, and configurations in a single benchmark function. They also enable filtering from the command line.

Running Benchmarks Continuously with CodSpeed

So far, you’ve been running benchmarks locally. But local benchmarking has limitations:
  • Inconsistent hardware: Different developers get different results
  • Manual process: Easy to forget to run benchmarks before merging
  • No historical tracking: Hard to spot gradual performance degradation
  • No PR context: Can’t see performance impact during code review
This is where CodSpeed comes in. It runs your benchmarks automatically in CI and provides:
  • Automated performance regression detection in PRs
  • Consistent metrics with reliable measurements across all runs
  • Historical tracking to see performance over time with detailed charts
  • Flamegraph profiles to see exactly what changed in your code’s execution
For the full CodSpeed integration reference, see Writing Benchmarks in Go.

How to Set Up CodSpeed

Here’s how to integrate CodSpeed with your Go benchmarks:
1. Set Up GitHub Actions

Create a workflow file to run benchmarks on every push and pull request.
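A minimal workflow sketch, assuming CodSpeed's GitHub Action (CodSpeedHQ/action) and a CODSPEED_TOKEN repository secret; check the CodSpeed docs for the currently recommended configuration:
.github/workflows/codspeed.yml
name: CodSpeed Benchmarks

on:
  push:
    branches: [main]
  pull_request:

jobs:
  benchmarks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: stable
      - name: Run benchmarks
        uses: CodSpeedHQ/action@v3
        with:
          token: ${{ secrets.CODSPEED_TOKEN }}
          run: go test -bench=. ./...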
2. Check the Results

Once the workflow runs, your pull requests will receive a performance report comment:
Pull Request Result
3. Access Detailed Reports and Flamegraphs

After your benchmarks run in CI, head over to your CodSpeed dashboard to see detailed performance reports, historical trends, and flamegraph profiles for deeper analysis.
Profiling Report on CodSpeed
Profiling works out of the box, no extra configuration needed! Learn more about flamegraphs and how to use them to optimize your code.

Next Steps

Check out these resources to continue your Go benchmarking journey:

Get Started with CodSpeed

Sign up and start tracking your Go performance in CI

CodSpeed Go Benchmarking Docs

CodSpeed’s Go integration reference and compatibility notes

Benchmarking a Go Gin API

A hands-on guide to benchmarking a real HTTP API with Gin

Performance Profiling

Learn how to use flamegraphs to optimize your code