
Why JMH?

This guide uses JMH (Java Microbenchmark Harness), the standard benchmarking framework for the JVM. JMH is developed as part of the OpenJDK project by the same engineers who build the JVM itself, so it understands JVM internals like JIT compilation, dead code elimination, and constant folding that can silently invalidate naive benchmarks. It handles warmup, fork isolation, and statistical analysis out of the box so you can focus on writing the code you want to measure.
This guide covers Maven and Gradle. JMH also works with SBT.

Your First Benchmark

Let’s start with the simplest possible JMH benchmark: a single method that measures how fast a recursive Fibonacci function runs.

Project Setup

The recommended way to use JMH with Maven is through its archetype, which generates a project pre-configured with the annotation processor and uber-JAR packaging:
terminal
mvn archetype:generate \
  -DinteractiveMode=false \
  -DarchetypeGroupId=org.openjdk.jmh \
  -DarchetypeArtifactId=jmh-java-benchmark-archetype \
  -DgroupId=com.example \
  -DartifactId=my-benchmarks \
  -Dversion=1.0
This creates a my-benchmarks/ directory with the following structure:
my-benchmarks/
  pom.xml
  src/main/java/com/example/
    MyBenchmark.java
The generated pom.xml includes jmh-core (the runtime library), jmh-generator-annprocess (the annotation processor that generates benchmark harness code at compile time), and maven-shade-plugin (packages everything into a single executable benchmarks.jar).
Do not add jmh-core to an existing project without the annotation processor. JMH needs to generate synthetic benchmark code at compile time. The archetype (Maven) and plugin (Gradle) handle this correctly.

Writing the Benchmark

The archetype generates a stub MyBenchmark.java with an empty @Benchmark method. Open src/main/java/com/example/MyBenchmark.java and replace its contents with:
src/main/java/com/example/MyBenchmark.java
package com.example;

import org.openjdk.jmh.annotations.Benchmark;

public class MyBenchmark {

    @Benchmark
    public long fibonacci() {
        return fibonacci(30);
    }

    static long fibonacci(int n) {
        if (n <= 1) return n;
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
}
That’s it. @Benchmark is the only annotation you need. JMH generates the measurement harness around it. The method returns its result, which prevents the JVM from eliminating the computation as dead code (more on this in avoiding common pitfalls).

Building and Running

Build the uber-JAR and run the benchmark:
terminal
cd my-benchmarks
mvn clean verify
java -jar target/benchmarks.jar
This will take about 8 minutes. JMH defaults are thorough: 5 forked JVMs, each running 5 warmup + 5 measurement iterations of 10 seconds. For a faster first run, add flags to reduce the iteration count:
terminal
java -jar target/benchmarks.jar -f 1 -wi 3 -i 5 -w 1 -r 1
These flags and annotations are explained in Configuring Your Benchmark.
You should see output like this:
terminal
# JMH version: 1.37
# VM version: JDK 17.0.18, OpenJDK 64-Bit Server VM, 17.0.18+8-Debian-1deb12u1
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Benchmark: com.example.MyBenchmark.fibonacci

# Run progress: 0.00% complete, ETA 00:08:20
# Fork: 1 of 5
# Warmup Iteration   1: 320.348 ops/s
# Warmup Iteration   2: 321.605 ops/s
# Warmup Iteration   3: 323.393 ops/s
# Warmup Iteration   4: 323.038 ops/s
# Warmup Iteration   5: 321.964 ops/s
Iteration   1: 320.996 ops/s
Iteration   2: 320.143 ops/s
Iteration   3: 323.586 ops/s
Iteration   4: 322.946 ops/s
Iteration   5: 321.108 ops/s

# Run progress: 20.00% complete, ETA 00:06:40
# Fork: 2 of 5
...

Benchmark               Mode  Cnt    Score   Error  Units
MyBenchmark.fibonacci  thrpt   25  320.479 ± 1.013  ops/s
Without any configuration, JMH automatically warmed up the JIT compiler across 5 separate JVM processes, collected 25 measurement iterations (5 per fork), and computed a tight 99.9% confidence interval. The default mode is Throughput (thrpt), measured in operations per second.
Understanding the results:
  • Mode: The benchmark mode (thrpt = throughput, operations per second).
  • Cnt: Total measurement iterations across all forks (5 forks x 5 iterations = 25).
  • Score: The measured value (higher is better for thrpt).
  • Error: The 99.9% confidence interval margin. The true value lies within Score ± Error with 99.9% confidence.
  • Units: ops/s = operations per second.

Configuring Your Benchmark

The previous benchmark used all JMH defaults. In practice, you want to embed settings into your benchmark class using annotations. This makes benchmarks self-describing and reproducible regardless of how they are invoked. Update MyBenchmark.java:
src/main/java/com/example/MyBenchmark.java
package com.example;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
public class MyBenchmark {

    private int n = 30;

    @Benchmark
    public long fibonacci() {
        return fibonacci(n);
    }

    static long fibonacci(int n) {
        if (n <= 1) return n;
        return fibonacci(n - 1) + fibonacci(n - 2);
    }
}
Rebuild and run. No flags are needed; everything is in the annotations:
terminal
mvn clean verify
java -jar target/benchmarks.jar
terminal
# Benchmark mode: Average time, time/op
# Benchmark: com.example.MyBenchmark.fibonacci

# Fork: 1 of 1
# Warmup Iteration   1: 3.117 ms/op
# Warmup Iteration   2: 3.131 ms/op
# Warmup Iteration   3: 3.088 ms/op
Iteration   1: 3.091 ms/op
Iteration   2: 3.097 ms/op
Iteration   3: 3.100 ms/op
Iteration   4: 3.096 ms/op
Iteration   5: 3.097 ms/op

Benchmark              Mode  Cnt  Score   Error  Units
MyBenchmark.fibonacci  avgt    5  3.096 ± 0.012  ms/op
The output now shows avgt (average time) in ms/op. A single fork completed in seconds instead of minutes. Computing fibonacci(30) takes about 3.1 milliseconds. The following sections break down each annotation.

Benchmark Mode

@BenchmarkMode controls what JMH measures. It can be placed on a class (applies to all methods) or on individual methods.
Mode.Throughput (default)
Measures operations per second. Use this to quantify system capacity and compare throughput across implementations.
Mode.AverageTime
Measures average time per operation. The general-purpose choice for latency benchmarking when you care about typical performance.
Mode.SampleTime
Samples individual operation times and reports percentiles (p50, p90, p99, p99.9) directly in the output. Use this to understand tail latency, not just the average:
terminal
MyBenchmark.fibonacci          sample  177816     41.340 ±  0.936  ns/op
MyBenchmark.fibonacci:p0.50    sample             38.000           ns/op
MyBenchmark.fibonacci:p0.90    sample             44.000           ns/op
MyBenchmark.fibonacci:p0.99    sample             58.000           ns/op
MyBenchmark.fibonacci:p0.999   sample            279.183           ns/op
MyBenchmark.fibonacci:p0.9999  sample           3199.859           ns/op
This reveals that while the median latency is 38ns, the p99.99 is 3.2 microseconds, an 84x spike. Percentile data like this is invaluable for understanding real-world latency characteristics.
Mode.SingleShotTime
Measures the time for a single invocation with no warmup. Use this to benchmark cold-start performance and one-shot initialization costs.
You can pass an array to run multiple modes in one benchmark run, e.g., @BenchmarkMode({Mode.Throughput, Mode.AverageTime}). Use Mode.All to run every mode at once, which is useful for exploratory benchmarking.
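For instance, here is a minimal sketch (reusing the recursive fibonacci helper from earlier) that reports both throughput and average time for one method. In JMH, a method-level @BenchmarkMode overrides any class-level setting:
Multiple modes on one method
@Benchmark
@BenchmarkMode({Mode.Throughput, Mode.AverageTime})
public long fibonacciBothModes() {
    return fibonacci(30); // reported once as ops/time and once as time/op
}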

State and Scope

@State marks a class as a holder for benchmark data. Without it, you cannot use instance fields in benchmark methods. The Scope parameter controls how state is shared:
Scope.Thread (default)
Creates one state instance per thread with no sharing between threads. The right choice for most benchmarks.
Scope.Benchmark
Shares one state instance across all threads. Use this when measuring contention and thread-safety overhead.
Scope.Group
Shares one state instance per thread group. Use this for asymmetric benchmarks such as producer/consumer patterns (see the sketch below).
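Here is a minimal sketch of an asymmetric producer/consumer benchmark using Scope.Group together with @Group and @GroupThreads. The queue choice, group name, and thread counts are illustrative assumptions:
Scope.Group sketch
package com.example;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.ArrayBlockingQueue;

@State(Scope.Group)
public class ProducerConsumerBenchmark {

    private ArrayBlockingQueue<Integer> queue;

    @Setup(Level.Iteration)
    public void setUp() {
        // Fresh queue per iteration so no backlog accumulates across iterations
        queue = new ArrayBlockingQueue<>(1024);
    }

    @Benchmark
    @Group("pingPong")
    @GroupThreads(1) // one producer thread per group
    public void produce() throws InterruptedException {
        queue.put(42);
    }

    @Benchmark
    @Group("pingPong")
    @GroupThreads(1) // one consumer thread per group
    public Integer consume() throws InterruptedException {
        return queue.take();
    }
}
JMH reports a combined score for the group plus a per-method breakdown, so you can see how the producer and consumer sides behave individually.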
The benchmark class itself can be the state (as in our example), or you can define separate state classes:
Separate state class
@State(Scope.Benchmark)
public static class SharedState {
    ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
}

@Benchmark
public void concurrentPut(SharedState state) {
    state.map.put("key", "value");
}

Fork, Warmup, Measurement, and Output Unit

These annotations control the execution strategy and output formatting:
@Fork
Controls how many separate JVM processes to run. Forks run sequentially, not in parallel. Each fork starts a fresh JVM, isolating profile-guided optimizations and JIT compilation state. Use jvmArgs to control heap size, GC settings, and other JVM flags; use jvmArgsPrepend or jvmArgsAppend to add flags without replacing the defaults.
@Fork(value = 3, jvmArgs = {"-Xms2G", "-Xmx2G"})
@Fork(value = 1, jvmArgsPrepend = {"-XX:+UseG1GC"})
@Warmup
Controls how many iterations run before measurement begins, giving the JIT compiler time to optimize your code to steady state. Parameters: iterations, time, timeUnit.
@Warmup(iterations = 5, time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement
Controls how many iterations are recorded and included in the results. Accepts the same parameters as @Warmup: iterations, time, timeUnit.
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@OutputTimeUnit
Controls the time unit displayed in results. Accepts any java.util.concurrent.TimeUnit value (e.g., TimeUnit.NANOSECONDS, TimeUnit.MILLISECONDS).
@OutputTimeUnit(TimeUnit.MICROSECONDS)
Configuration examples
// Quick feedback during development
@Fork(1)
@Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)

// Reliable measurements for CI
@Fork(value = 3, jvmArgs = {"-Xms2G", "-Xmx2G"})
@Warmup(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 5, timeUnit = TimeUnit.SECONDS)

Threads

@Threads controls how many threads run the benchmark concurrently. The default is 1. Combined with Scope.Benchmark, this is how you measure contention:
Multi-threaded benchmark
@Threads(4)
@State(Scope.Benchmark)
public class ConcurrencyBenchmark {

    private ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();

    @Benchmark
    public Integer concurrentPut() {
        return map.put(Thread.currentThread().hashCode(), 42);
    }
}
Use @Threads(Threads.MAX) to use all available processors.

Benchmarking with Parameters

The previous examples all used a single input value (30). But what if you want to see how performance changes with different input sizes? This is where @Param comes in.

Single Parameter

Add a new benchmark class to test multiple input sizes:
src/main/java/com/example/FibonacciParameterized.java
package com.example;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
public class FibonacciParameterized {

    @Param({"5", "10", "15", "20", "30"})
    private int n;

    static long fibonacci(int n) {
        if (n <= 1) return n;
        return fibonacci(n - 1) + fibonacci(n - 2);
    }

    @Benchmark
    public long fibRecursive() {
        return fibonacci(n);
    }
}
@Param tells JMH to run the benchmark once for each value. Rebuild and run:
terminal
mvn clean verify
java -jar target/benchmarks.jar FibonacciParameterized
terminal
Benchmark                            (n)  Mode  Cnt     Score    Error  Units
FibonacciParameterized.fibRecursive    5  avgt    5     0.013 ±  0.001  us/op
FibonacciParameterized.fibRecursive   10  avgt    5     0.202 ±  0.001  us/op
FibonacciParameterized.fibRecursive   15  avgt    5     2.277 ±  0.030  us/op
FibonacciParameterized.fibRecursive   20  avgt    5    25.213 ±  0.295  us/op
FibonacciParameterized.fibRecursive   30  avgt    5  3122.539 ± 55.028  us/op
The results clearly show the exponential O(2^n) growth of recursive Fibonacci: going from n=5 (13 nanoseconds) to n=30 (3.1 milliseconds), a factor of 240,000x.
You can override @Param values from the command line without recompiling:
java -jar target/benchmarks.jar -p n=25,35

Multiple Parameters

Each @Param annotation applies to a single field, but you can use multiple @Param fields to benchmark across several dimensions. JMH runs all combinations automatically:
Multiple @Param fields
@Param({"1000", "10000"})
private int size;

@Param({"ArrayList", "LinkedList"})
private String listType;
This produces four benchmark runs: 1000/ArrayList, 1000/LinkedList, 10000/ArrayList, 10000/LinkedList.
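To make that concrete, here is a sketch of a complete benchmark built on those two fields. The contains-the-last-element workload is an illustrative choice, not part of the original example:
Combining two @Param fields
package com.example;

import org.openjdk.jmh.annotations.*;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
public class ListSearchBenchmark {

    @Param({"1000", "10000"})
    private int size;

    @Param({"ArrayList", "LinkedList"})
    private String listType;

    private List<Integer> list;

    @Setup(Level.Trial)
    public void setUp() {
        // NOT MEASURED: build the list implementation named by the parameter
        list = listType.equals("ArrayList") ? new ArrayList<>() : new LinkedList<>();
        for (int i = 0; i < size; i++) {
            list.add(i);
        }
    }

    @Benchmark
    public boolean containsLast() {
        // Linear scan to the final element: the worst case for both list types
        return list.contains(size - 1);
    }
}
JMH runs this once for every size/listType combination and lists each combination as its own row in the results table.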

Comparing Algorithms

Parameters are powerful for comparing different implementations side-by-side. Let’s benchmark recursive vs. iterative Fibonacci:
src/main/java/com/example/AlgorithmComparison.java
package com.example;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
public class AlgorithmComparison {

    @Param({"10", "20", "30"})
    private int n;

    static long fibRecursive(int n) {
        if (n <= 1) return n;
        return fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    static long fibIterative(int n) {
        if (n <= 1) return n;
        long a = 0, b = 1;
        for (int i = 2; i <= n; i++) {
            long temp = a + b;
            a = b;
            b = temp;
        }
        return b;
    }

    @Benchmark
    public long recursive() {
        return fibRecursive(n);
    }

    @Benchmark
    public long iterative() {
        return fibIterative(n);
    }
}
terminal
Benchmark                      (n)  Mode  Cnt     Score    Error  Units
AlgorithmComparison.iterative   10  avgt    5     0.003 ±  0.001  us/op
AlgorithmComparison.iterative   20  avgt    5     0.004 ±  0.001  us/op
AlgorithmComparison.iterative   30  avgt    5     0.006 ±  0.001  us/op
AlgorithmComparison.recursive   10  avgt    5     0.203 ±  0.001  us/op
AlgorithmComparison.recursive   20  avgt    5    25.596 ±  0.795  us/op
AlgorithmComparison.recursive   30  avgt    5  3122.110 ± 33.573  us/op
The iterative version computes fibonacci(30) in 6 nanoseconds while the recursive version takes 3.1 milliseconds: over 500,000x faster. This is the power of parameterized benchmarks: they make algorithmic trade-offs visible at a glance.

Benchmarking Only What Matters

Sometimes you have expensive setup, such as generating test data or loading files, that should not be included in your benchmark measurements. JMH provides @Setup and @TearDown annotations with different Level options to control when fixture methods run.

Setup and Teardown

Let’s benchmark an outlier detection algorithm where the dataset generation is expensive but should not be measured:
src/main/java/com/example/OutlierDetection.java
package com.example;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
public class OutlierDetection {

    @Param({"10000", "100000", "1000000"})
    private int size;

    private double[] data;

    @Setup(Level.Trial)
    public void setUp() {
        // NOT MEASURED: expensive data generation runs once before all iterations
        Random random = new Random(42);
        data = new double[size];
        for (int i = 0; i < size; i++) {
            if (random.nextDouble() < 0.95) {
                data[i] = 100.0 + random.nextGaussian() * 15.0;
            } else {
                data[i] = 200.0 + random.nextDouble() * 100.0;
            }
        }
    }

    public static List<Integer> detectOutliers(double[] data, double threshold) {
        double sum = 0;
        for (double v : data) sum += v;
        double mean = sum / data.length;

        double variance = 0;
        for (double v : data) variance += (v - mean) * (v - mean);
        variance /= data.length;
        double stdDev = Math.sqrt(variance);

        List<Integer> outliers = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            double zScore = stdDev > 0
                ? Math.abs((data[i] - mean) / stdDev) : 0;
            if (zScore > threshold) {
                outliers.add(i);
            }
        }
        return outliers;
    }

    @Benchmark
    public List<Integer> findOutliers() {
        // MEASURED: only the outlier detection algorithm
        return detectOutliers(data, 2.0);
    }
}
The @Setup(Level.Trial) method runs once before all measurement iterations. Only the findOutliers() method is timed:
terminal
Benchmark                       (size)  Mode  Cnt     Score    Error  Units
OutlierDetection.findOutliers    10000  avgt    5    22.659 ±  0.322  us/op
OutlierDetection.findOutliers   100000  avgt    5   282.683 ±  2.461  us/op
OutlierDetection.findOutliers  1000000  avgt    5  3079.753 ± 49.886  us/op

Fixture Levels

JMH offers three levels for @Setup and @TearDown:
Level.Trial
Runs once per benchmark fork. Use this for loading files and building large datasets that are reused across all iterations.
Level.Iteration
Runs before and after each measurement iteration. Use this to reset mutable state between iterations.
Level.Invocation
Runs before and after each individual method call. Use sparingly.
Level.Invocation adds timing overhead on every call. Only use it when the benchmark method is slow enough (milliseconds or more) that the fixture cost is negligible in comparison.
Here is an example using Level.Iteration to provide fresh unsorted data for each iteration of a sorting benchmark:
src/main/java/com/example/SortBenchmark.java
package com.example;

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
@Fork(1)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
public class SortBenchmark {

    @Param({"1000", "10000", "100000"})
    private int size;

    private List<Integer> data;

    @Setup(Level.Iteration)
    public void setUp() {
        // Regenerate unsorted data before each iteration
        Random random = new Random(42);
        data = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            data.add(random.nextInt(size));
        }
    }

    @Benchmark
    public List<Integer> sortList() {
        List<Integer> copy = new ArrayList<>(data);
        Collections.sort(copy);
        return copy;
    }
}
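One caveat with SortBenchmark: the copy made inside sortList() is part of every measured call. If you want the timing to cover only the sort, a Level.Invocation fixture can hand the method fresh input on each call. The sketch below is an addition, not part of the original class, and it is only defensible here because sorting 100,000 elements takes long enough to dwarf the fixture overhead:
Level.Invocation sketch
private int[] template;
private int[] scratch;

@Setup(Level.Trial)
public void buildTemplate() {
    // NOT MEASURED: generate the reference data once per fork
    Random random = new Random(42);
    template = new int[size];
    for (int i = 0; i < size; i++) {
        template[i] = random.nextInt(size);
    }
}

@Setup(Level.Invocation)
public void refreshScratch() {
    // NOT MEASURED: fresh unsorted copy before every single call
    scratch = template.clone();
}

@Benchmark
public int[] sortArray() {
    java.util.Arrays.sort(scratch); // MEASURED: just the sort
    return scratch;
}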

Running Benchmarks from the Command Line

The benchmarks.jar supports a rich set of command-line options. Here are the most useful ones:

Filtering Benchmarks

Run only benchmarks matching a regex:
terminal
java -jar target/benchmarks.jar "FibonacciParameterized"
Exclude benchmarks matching a pattern:
terminal
java -jar target/benchmarks.jar -e ".*Slow.*"

Overriding Parameters

Override @Param, @Fork, @Warmup, and @Measurement from the command line:
terminal
java -jar target/benchmarks.jar -p n=25,35 -f 3 -wi 5 -i 10
  • -f <int>: Number of forks.
  • -t <int>: Number of threads.
  • -wi <int>: Warmup iterations.
  • -i <int>: Measurement iterations.
  • -w <time>: Warmup iteration time (e.g., 2s).
  • -r <time>: Measurement iteration time.
  • -p <param=v1,v2>: Override @Param values.
  • -bm <mode>: Override benchmark mode (thrpt, avgt, sample, ss).
  • -tu <unit>: Override time unit (ns, us, ms, s).

Exporting Results

JMH can export results in various formats for further analysis or visualization:
  • -rf <format>: Result format. One of text, csv, scsv, json, latex.
  • -rff <path>: Result file path, i.e., where to write the output (e.g., results.json).
For example, to export JSON results:
terminal
java -jar target/benchmarks.jar -rf json -rff results.json

Using Profilers

JMH ships with built-in profilers. List them with:
terminal
java -jar target/benchmarks.jar -lprof
The most useful profilers:
  • -prof stack: Samples hot methods and thread states to show where time is being spent.
  • -prof gc: Reports allocation rate, GC pressure, and bytes allocated per operation.
  • -prof comp: Reports JIT compilation activity during the measurement window.
  • -prof perfnorm: Reports per-operation hardware counters (cache misses, branch mispredictions, CPI). Linux only.
  • -prof async: Generates CPU flamegraphs using async-profiler.
For example, to measure allocation pressure:
terminal
java -jar target/benchmarks.jar OutlierDetection -prof gc
This adds GC metrics to the output, showing bytes allocated per operation (gc.alloc.rate.norm) and GC event counts, essential for understanding allocation-heavy code.

Avoiding Common Pitfalls

The JVM is a sophisticated optimizing runtime. Without care, it can silently eliminate or transform the code you are trying to measure, producing misleading results. JMH is designed to help, but you still need to follow certain patterns.

Dead Code Elimination

If a computation’s result is never used, the JIT compiler may eliminate it entirely:
Dead code elimination
// BAD: result is discarded, JVM may eliminate the entire computation
@Benchmark
public void measureWrong() {
    Math.log(x);
}

// GOOD: returning the result prevents dead code elimination
@Benchmark
public double measureRight() {
    return Math.log(x);
}
JMH automatically consumes the return value of @Benchmark methods through an internal Blackhole, preventing elimination. Always return your computed result.

Blackholes for Multiple Results

When you produce multiple results, you can only return one. Use Blackhole.consume() for the rest:
Blackhole usage
@Benchmark
public void computeMultiple(Blackhole bh) {
    bh.consume(Math.log(x));
    bh.consume(Math.sqrt(x));
}
Import Blackhole from org.openjdk.jmh.infra.Blackhole. JMH injects it automatically as a method parameter.

Constant Folding

If the JVM can determine a computation’s inputs at compile time, it folds the entire computation into a constant:
Constant folding
// BAD: the JVM knows wrongX is always Math.PI, result is precomputed
private final double wrongX = Math.PI;

@Benchmark
public double measureWrong() {
    return Math.log(wrongX);
}

// GOOD: non-final field prevents constant folding
private double x = Math.PI;

@Benchmark
public double measureRight() {
    return Math.log(x);
}
Your IDE may suggest making x final. Do not. Non-final @State fields are essential for preventing constant folding in benchmarks.

Do Not Loop Manually

Never write manual loops inside benchmark methods. The JVM aggressively optimizes loops. It unrolls, pipelines, and hoists invariant computations out of them, producing unrealistically low per-operation numbers:
Manual loops
// BAD: JVM optimizes the loop, results are misleading
@Benchmark
public int measureWrong() {
    int sum = 0;
    for (int i = 0; i < 1000; i++) {
        sum += compute(i);
    }
    return sum;
}

// GOOD: let JMH control the iteration
@Benchmark
public int measureRight() {
    return compute(x);
}
JMH handles iteration internally with proper safeguards. Trust the framework.
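One escape hatch worth knowing: if a batch loop is genuinely unavoidable, JMH's @OperationsPerInvocation annotation normalizes the reported score by the number of logical operations per call. A minimal sketch:
Batched benchmark
// Each invocation performs 1000 logical operations; JMH divides the
// measured time accordingly so the score stays per-operation.
@Benchmark
@OperationsPerInvocation(1000)
public int measureBatch() {
    int sum = 0;
    for (int i = 0; i < 1000; i++) {
        sum += compute(i);
    }
    return sum;
}
This fixes the arithmetic, not the optimizer: the loop can still be unrolled and pipelined, so prefer the single-operation form whenever possible.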

Best Practices

Use Multiple Forks

The JVM is non-deterministic. Profile-guided optimizations, garbage collection, and thread scheduling vary between runs. A single fork can give misleading results. Use multiple forks (see @Fork) to capture this variance:
Fork configuration
// For development, 1 fork is fine for fast feedback
@Fork(1)

// For reliable measurements, use 3-5 forks
@Fork(5)
Each fork starts a fresh JVM, isolating profile-guided optimizations and giving JMH enough data points to compute meaningful confidence intervals.

Keep Benchmarks Deterministic

Use fixed seeds in your @Setup methods for random number generators:
Deterministic setup
// BAD: different data every run, results are not reproducible
@Setup(Level.Trial)
public void setUp() {
    Random rng = new Random(); // non-deterministic seed
    // ...
}

// GOOD: fixed seed, results are reproducible
@Setup(Level.Trial)
public void setUp() {
    Random rng = new Random(42); // deterministic seed
    // ...
}

Verify Correctness Alongside Performance

Include assertions in your setup or dedicated test methods to ensure you are benchmarking correct code, not broken code that happens to be fast:
Correctness check
@Setup(Level.Trial)
public void setUp() {
    // Verify the algorithm is correct before measuring it
    if (fibonacci(10) != 55) {
        throw new IllegalStateException("fibonacci(10) should be 55");
    }
}

Use Realistic Data

Sorted or regular data can exploit hardware optimizations like branch prediction and cache prefetching, giving misleadingly good results. Use representative data that matches your production workload.
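As a sketch, contrast a setup that produces an unrealistically friendly input with one closer to a messy production distribution. The duplication factor here is an arbitrary illustration:
Realistic data
// BAD: sorted, unique keys; branch prediction and prefetching behave
// unrealistically well
for (int i = 0; i < size; i++) keys[i] = i;

// BETTER: unsorted keys with duplicates, closer to many real workloads
Random rng = new Random(42);
for (int i = 0; i < size; i++) keys[i] = rng.nextInt(size / 10);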

Benchmark Your Own Code

In real projects, organize your benchmarks alongside your source code:
my-project/
  pom.xml
  src/main/java/com/example/
    MyAlgorithm.java
  benchmarks/
    pom.xml
    src/main/java/com/example/
      MyAlgorithmBenchmark.java
The benchmark submodule depends on your library and uses the JMH archetype setup. This keeps benchmark infrastructure separate from production code.

Running Benchmarks Continuously with CodSpeed

So far, you’ve been running benchmarks locally. But local benchmarking has limitations:
  • Inconsistent hardware: Different developers get different results
  • Manual process: Easy to forget to run benchmarks before merging
  • No historical tracking: Hard to spot gradual performance degradation
  • No PR context: Can’t see performance impact during code review
This is where CodSpeed comes in. It runs your benchmarks automatically in CI and provides:
  • Automated performance regression detection in PRs
  • Consistent metrics with reliable measurements across all runs
  • Historical tracking to see performance over time with detailed charts
  • Flamegraph profiles to see exactly what changed in your code’s execution

How to Set Up CodSpeed

Here’s how to integrate CodSpeed with your JMH benchmarks:
CodSpeed integrates with JMH through a custom fork. Before configuring CI, follow the Java integration reference to add the fork as a Maven or Gradle dependency.
Step 1: Set Up GitHub Actions

Create a workflow file to run benchmarks on every push and pull request.
Step 2: Check the Results

Once the workflow runs, your pull requests will receive a performance report comment.
Step 3: Access Detailed Reports and Flamegraphs

After your benchmarks run in CI, head over to your CodSpeed dashboard to see detailed performance reports, historical trends, and flamegraph profiles for deeper analysis. Profiling works out of the box, with no extra configuration needed. Learn more about flamegraphs and how to use them to optimize your code.

Next Steps

Check out these resources to continue your Java benchmarking journey:

Get Started with CodSpeed

Sign up and start tracking your Java performance in CI

Java Integration Reference

Set up the CodSpeed JMH fork in your Maven or Gradle project

Performance Profiling

Learn how to use flamegraphs to optimize your code

JMH Source Code

Dive into the JMH source and annotation reference