Build a Gin HTTP API, write Golang benchmarks, and run them with CodSpeed in consistent CI environments
This guide shows how to benchmark a Gin-based HTTP API using Go’s testing package and
CodSpeed. We’ll create a minimal API, design clean benchmarks measuring what matters, and run
them in CI with consistent results.
Let’s start with an API from the official Gin
tutorial. If you have never used Gin before, following that tutorial is a great way to get
familiar with the framework before benchmarking it.
We’ll organize the project so benchmarks target a library package while you still have a runnable server for manual testing.
api.go
package api

import (
	"net/http"

	"github.com/gin-gonic/gin"
)

// album represents data about a record album.
type album struct {
	ID     string  `json:"id"`
	Title  string  `json:"title"`
	Artist string  `json:"artist"`
	Price  float64 `json:"price"`
}

// albums slice to seed record album data.
var albums = []album{
	{ID: "1", Title: "Blue Train", Artist: "John Coltrane", Price: 56.99},
	{ID: "2", Title: "Jeru", Artist: "Gerry Mulligan", Price: 17.99},
	{ID: "3", Title: "Sarah Vaughan and Clifford Brown", Artist: "Sarah Vaughan", Price: 39.99},
}

func main() {
	router := gin.Default()
	router.GET("/albums", getAlbums)
	router.GET("/albums/:id", getAlbumByID)
	router.POST("/albums", postAlbums)

	router.Run("localhost:8080")
}

// getAlbums responds with the list of all albums as JSON.
func getAlbums(c *gin.Context) {
	c.IndentedJSON(http.StatusOK, albums)
}

// postAlbums adds an album from JSON received in the request body.
func postAlbums(c *gin.Context) {
	var newAlbum album

	if err := c.BindJSON(&newAlbum); err != nil {
		return
	}

	albums = append(albums, newAlbum)
	c.IndentedJSON(http.StatusCreated, newAlbum)
}

// getAlbumByID returns the album matching the provided id.
func getAlbumByID(c *gin.Context) {
	id := c.Param("id")

	for _, a := range albums {
		if a.ID == id {
			c.IndentedJSON(http.StatusOK, a)
			return
		}
	}
	c.IndentedJSON(http.StatusNotFound, gin.H{"message": "album not found"})
}
The only difference from the original code is that we’re using the
api package name instead of main, so the handlers live in an importable library package.
As a quick recap, this small HTTP API manages music albums stored in memory and exposes three routes:
GET /albums: Returns all albums
GET /albums/:id: Returns a specific album by ID
POST /albums: Creates a new album
Let’s run it to make sure it works:
$ go mod init github.com/your/repo        # initialize the module
$ go get github.com/gin-gonic/gin@latest  # get the latest version of Gin
$ go mod tidy                             # tidy the module dependencies
$ go run api.go                           # run the server
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)
[GIN-debug] GET    /albums      --> go-gin-benchmarks-example/api.getAlbums (3 handlers)
[GIN-debug] GET    /albums/:id  --> go-gin-benchmarks-example/api.getAlbumByID (3 handlers)
[GIN-debug] POST   /albums      --> go-gin-benchmarks-example/api.postAlbums (3 handlers)
[GIN-debug] Listening and serving HTTP on localhost:8080
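With the server running, you can exercise each route manually, for example with curl (a quick sketch; the album payload is just an example):

$ curl http://localhost:8080/albums
$ curl http://localhost:8080/albums/2
$ curl http://localhost:8080/albums \
    --header "Content-Type: application/json" \
    --request "POST" \
    --data '{"id": "4", "title": "The Modern Sound of Betty Carter", "artist": "Betty Carter", "price": 49.99}'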
Now, let’s write performance tests to actually measure each route of this API.
First, we need to do a bit of refactoring to make the code easier to benchmark.
In the initial code, the router is created and configured inside the main function. This
is not ideal for tests or benchmarks because the router configuration can’t be reused.
Let’s isolate the router creation and configuration in a separate function:
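A minimal sketch of that refactor, reusing the handlers from api.go (main can then simply call setupRouter() and run the returned engine):

func setupRouter() *gin.Engine {
	// Create the router and register the routes previously wired up in main.
	router := gin.Default()
	router.GET("/albums", getAlbums)
	router.GET("/albums/:id", getAlbumByID)
	router.POST("/albums", postAlbums)
	return router
}

With the router isolated, a first benchmark can target the GET /albums route, following the Gin testing example (a sketch; the api_test.go file name is an assumption):

package api

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

func BenchmarkGetAlbums(b *testing.B) {
	// Build the router, the request, and the response recorder once.
	router := setupRouter()
	req, _ := http.NewRequest(http.MethodGet, "/albums", nil)
	w := httptest.NewRecorder()

	// b.Loop() handles the iteration count and timing for us.
	for b.Loop() {
		router.ServeHTTP(w, req)
	}
}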
This benchmark creates a router, a request, and a response recorder, and then
loops over the code under test with b.Loop(), measuring how long each iteration takes.
Let’s run it:
$ go test -bench=.
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)
[GIN-debug] GET    /albums      --> go-gin-benchmarks-example.getAlbums (3 handlers)
[GIN-debug] GET    /albums/:id  --> go-gin-benchmarks-example.getAlbumByID (3 handlers)
[GIN-debug] POST   /albums      --> go-gin-benchmarks-example.postAlbums (3 handlers)
[GIN] 2025/09/19 - 17:56:36 | 200 | 1.959µs | GET "/albums"
[GIN] 2025/09/19 - 17:56:36 | 200 | 2.125µs | GET "/albums"
[GIN] 2025/09/19 - 17:56:36 | 200 | 2µs     | GET "/albums"
[GIN] 2025/09/19 - 17:56:36 | 200 | 2.666µs | GET "/albums"
... A LOT OF THOSE LINES ...
[GIN] 2025/09/19 - 17:56:36 | 200 | 2.084µs | GET "/albums"
[GIN] 2025/09/19 - 17:56:36 | 200 | 2µs     | GET "/albums"
[GIN] 2025/09/19 - 17:56:36 | 200 | 2.042µs | GET "/albums"
[GIN] 2025/09/19 - 17:56:36 | 200 | 1.958µs | GET "/albums"
goos: darwin
goarch: arm64
pkg: go-gin-benchmarks-example
cpu: Apple M1 Pro
BenchmarkGetAlbums-10    54296    21328 ns/op
PASS
ok      go-gin-benchmarks-example    1.460s
It works! The first benchmark runs and reports its results.
Let’s dive into the numbers:
First, the [GIN] logs show that each request takes roughly 2 µs inside the router.
This is an interesting reference point, but it’s not the source of truth we’ll use
for our benchmark results.
The benchmark name BenchmarkGetAlbums-10 has a -10 suffix, which is the GOMAXPROCS value it ran with (10 CPU cores here).
It ran 54,296 times, taking an average of 21,328 ns per operation (which
translates to 21.328 µs per request in our case).
Overall, benchmarking this module took 1.460 seconds.
However, seeing the output of this first run, we can note a few things that are not ideal:
There is significant overhead in our measurement: we end up measuring ~21 µs per
request, but the router itself reports ~2 µs per request. That means roughly 90% of
what we measure is not what’s happening in the router.
As mentioned in the logs, Gin is running in debug mode here. Since we want to measure
something as close as possible to what happens in production, we should run the router in
release mode to get realistic performance.
The router’s output is very verbose. While that’s convenient for
integration or unit tests, it’s harmful in a benchmark since all that logging
means we’re also measuring stdout performance.
To fix those issues, we can create a helper function to set up the router specifically for benchmarking:
func setupBenchmarkRouter() *gin.Engine {
	// Set Gin to release mode for benchmarks
	gin.SetMode(gin.ReleaseMode)
	// Discard all output during benchmarks to only preserve benchmark output
	gin.DefaultWriter = io.Discard
	return setupRouter()
}
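Note that this helper needs the io package imported for io.Discard. The benchmark itself then only changes in one place, swapping setupRouter() for setupBenchmarkRouter(); a sketch:

func BenchmarkGetAlbums(b *testing.B) {
	// Same benchmark as before, but with the quiet, release-mode router.
	router := setupBenchmarkRouter()
	req, _ := http.NewRequest(http.MethodGet, "/albums", nil)
	w := httptest.NewRecorder()

	for b.Loop() {
		router.ServeHTTP(w, req)
	}
}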
And here we go:
$ go test -bench=.
goos: darwin
goarch: arm64
pkg: go-gin-benchmarks-example
cpu: Apple M1 Pro
BenchmarkGetAlbums-10    381921    3060 ns/op
PASS
ok      go-gin-benchmarks-example    1.495s
First, the output is now much cleaner, making it easier to see what’s going on during the benchmark run.
The overhead is also much lower: the reported timing (3.060 µs) is far closer
to the actual time spent in the router, putting us in a much better position to make
decisions about performance improvements.
Still, there’s one last thing we can do to improve the benchmark results.
Since the beginning, we’ve reused httptest.NewRecorder(), inspired by the Gin testing
example, to get a response writer. It’s needed because Gin’s ServeHTTP method
requires an http.ResponseWriter to handle the HTTP response, and
httptest.NewRecorder() provides a concrete implementation that captures the response
for inspection.
This is convenient, but it introduces extra costs that aren’t really worth measuring in our case:
Extra useless allocations
JSON buffering to capture the responses
JSON processing to handle the responses
Let’s replace it with a dummy writer that discards all data:
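Here is a minimal sketch of such a writer, implementing just enough of http.ResponseWriter to satisfy the router (the benchWriter name is a placeholder):

// benchWriter is a no-op http.ResponseWriter: it keeps a header map so
// handlers can set headers, but discards everything that gets written.
type benchWriter struct {
	header http.Header
}

func newBenchWriter() *benchWriter {
	return &benchWriter{header: make(http.Header)}
}

func (w *benchWriter) Header() http.Header         { return w.header }
func (w *benchWriter) Write(p []byte) (int, error) { return len(p), nil }
func (w *benchWriter) WriteHeader(statusCode int)  {}

The benchmarks then pass newBenchWriter() instead of httptest.NewRecorder(), and we run them with go test -bench=. -benchtime=5s.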
We’re using the -benchtime=5s flag to run each benchmark for 5 seconds,
making sure we collect enough samples for a reliable estimate of the performance.
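To cover the rest of the API, the remaining handlers get benchmarks of their own, following the same pattern (a sketch; it also needs the bytes package imported, and the album ID and POST payload are arbitrary examples):

func BenchmarkGetAlbumByID(b *testing.B) {
	router := setupBenchmarkRouter()
	w := newBenchWriter()
	req, _ := http.NewRequest(http.MethodGet, "/albums/2", nil)

	for b.Loop() {
		router.ServeHTTP(w, req)
	}
}

func BenchmarkPostAlbums(b *testing.B) {
	router := setupBenchmarkRouter()
	w := newBenchWriter()
	body := []byte(`{"id": "4", "title": "The Modern Sound of Betty Carter", "artist": "Betty Carter", "price": 49.99}`)

	for b.Loop() {
		// The request body is consumed by BindJSON, so rebuild the request each iteration.
		req, _ := http.NewRequest(http.MethodPost, "/albums", bytes.NewReader(body))
		router.ServeHTTP(w, req)
	}
}

Keep in mind that postAlbums appends to the in-memory albums slice, so the slice keeps growing over the benchmark run; that’s acceptable for this example but worth remembering when reading the numbers.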
And now, almost all branches of the API are covered by this set of benchmarks!
Local benchmarks are excellent for development iteration, but running benchmarks in CI provides consistency and automation that local benchmarking can’t match:
Consistent hardware: CI runners eliminate the “works on my machine” problem. Your laptop’s thermal throttling, background processes, and varying load create noise that masks real performance changes.
Automated detection: Catch performance regressions before they reach production. Every PR gets benchmarked automatically, making performance a first-class concern like tests.
Historical tracking: Build a performance timeline across commits. Spot trends, identify when regressions were introduced, and validate that optimizations actually worked.
The CodSpeed GitHub Action will automatically run the benchmarks and upload the results to CodSpeed. It mostly boils down to this:
.github/workflows/codspeed.yml
name: CodSpeed Benchmarks

on:
  push:
    branches:
      - "main" # or "master"
  pull_request:
  # `workflow_dispatch` allows CodSpeed to trigger backtest
  # performance analysis in order to generate initial data.
  workflow_dispatch:

jobs:
  benchmarks:
    name: Run Go benchmarks
    runs-on: codspeed-macro
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      - name: Run the benchmarks
        uses: CodSpeedHQ/action@v4
        with:
          mode: walltime
          run: go test -bench=. -benchtime=5s
          token: ${{ secrets.CODSPEED_TOKEN }} # optional for public repos
Two important things to note:
We’re using the codspeed-macro runners to run the benchmarks on an optimized and
isolated CI machine, removing noise from virtualization and shared resources. Check
out the Macro Runners page for more details.
We’re again using the -benchtime=5s flag to run each benchmark for 5 seconds,
making sure we get enough samples for a reliable estimate of the performance. Feel
free to adjust it to your needs.
This example uses GitHub Actions, but you can use CodSpeed with any other CI
provider. Check out the CI integration docs for more details.
Now each pull request will automatically run the benchmarks, and you’ll be able to see the results in the CodSpeed dashboard.
For example, let’s swap the in-memory storage for SQLite to see the performance impact (check out the code here):
GitHub comment on the pull request with the benchmark results
This also emits a status check on the pull request:
Status check on the pull request which can be used to prevent merging regressions
Now, we can analyze the performance report to see the details of the performance regression:
Performance report with the Differential Flamegraph
Here, we clearly see the dramatic performance regression in getAlbums (in red) introduced by the new database/sql usage (in blue).
However, there is almost no difference in the postAlbums benchmark:
Diving deeper, we can see that the changes to the postAlbums handler are actually significant, but they only account for 0.1% of the total time:
The function is getting 31% slower but only impacts 0.1% of the total time
In the end, using SQLite would primarily impact the read operations, with virtually no impact on the write operations.
Check out the pull request and the CodSpeed performance report for more details.