Latest Results
[claude] feat(bench): emit v3 JSONL records and dual-write to bench server (#7780)
## Summary
Prototype website:
http://ec2-18-219-54-101.us-east-2.compute.amazonaws.com:3000/
This is the first step we should take before we cut over to the new
benchmarks website on https://github.com/vortex-data/vortex/pull/7643
This PR allows the CI actions to additionally post data to a server (on
my EC2 instance for now). We want to check that this actually works
before we start using this for all of our CI.
Note that this does NOT change how the current benchmarks website works,
as this just does a few extra things on top of that.
Also for reviewers, even though this looks like 1k LoC I think the logic
here is not that hard to review, a lot of this is boilerplate you can
skim over.
Below is a bunch of AI-generated description: read at your own
discretion.
<details>
Brings the v3 emitter and CI dual-write plumbing from `ct/benchmarks-v3`
onto `develop` **without** the v3 server/website code. CI continues to
write v2 results to S3 unchanged; v3 ingest is a side channel that
no-ops until the deploy track sets `vars.V3_INGEST_URL`.
This is item 2 ("CI ingestion wiring") of the v3 production-readiness
checklist in
[`benchmarks-website/planning/README.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/README.md).
The v3 website itself ships in a separate PR off `ct/benchmarks-v3` once
dual-write is verified healthy in production.
### What's included
**Rust emitter (`vortex-bench`)**
- New `vortex-bench/src/v3.rs`: one record per `kind`
(`query_measurement`, `compression_time`, `compression_size`,
`random_access_time`, `vector_search_run`) plus a serde-tagged
`V3Record` enum, JSONL writer, and `insta` snapshot tests. Field shapes
match
[`02-contracts.md`](https://github.com/vortex-data/vortex/blob/ct/benchmarks-v3/benchmarks-website/planning/02-contracts.md).
- `Dataset::v3_dataset_dims()` (default `(name(), None)`) lets Public-BI
map to `(public-bi, <subset>)`.
- `compress` and `runner` capture per-iteration timings and provide
`SqlBenchmarkRunner::v3_records()`.
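As a rough illustration of the "bare records, one per line, tagged by `kind`" shape described above, here is a minimal Python sketch. The real emitter is Rust (`vortex-bench/src/v3.rs`); every field name other than `kind` below (`dataset`, `dataset_variant`, `value_ns`, `value_bytes`) is a hypothetical placeholder, not the contract in `02-contracts.md`.

```python
import io
import json


def write_jsonl(records, out):
    """Write one bare JSON object per line; `kind` tags the record variant."""
    for record in records:
        if "kind" not in record:
            raise ValueError("every v3 record needs a 'kind' tag")
        out.write(json.dumps(record, sort_keys=True) + "\n")


# Hypothetical field names; only the `kind` values come from the PR text.
records = [
    {"kind": "compression_time", "dataset": "public-bi",
     "dataset_variant": "example-subset", "value_ns": 1200},
    {"kind": "compression_size", "dataset": "clickbench",
     "dataset_variant": None, "value_bytes": 4096},
]

buf = io.StringIO()
write_jsonl(records, buf)
lines = buf.getvalue().splitlines()
```

A reader can dispatch on `kind` per line without needing any surrounding envelope, which is what lets the ingest script add the envelope later.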
**Benchmark binaries**
- `compress-bench`, `datafusion-bench`, `duckdb-bench`, `lance-bench`,
`random-access-bench`, `vector-search-bench` all gain `--gh-json-v3
<path>`. Bare records, no envelope. The legacy `-d gh-json -o ...` flow
is untouched.
**`bench-orchestrator`**
- `vx-bench run --gh-json-v3 <path>` plumbs the flag through to the
underlying benchmark binary.
**`scripts/post-ingest.py`** (Python 3, stdlib only)
- Reads JSONL, fills the `commit` envelope from `git show`, wraps in
`{run_meta, commit, records}`, POSTs to `/api/ingest` with
`Authorization: Bearer ${INGEST_BEARER_TOKEN}`. Exits non-zero on
4xx/5xx. No retry/spool; deferred.
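The wrap-and-POST flow can be sketched with the standard library alone. This is not the contents of `scripts/post-ingest.py`. It is an illustration that assumes the helper names `parse_jsonl`, `build_envelope`, and `build_request`, a made-up host, and made-up `commit` / `run_meta` fields. It builds the request without sending it.

```python
import json
import urllib.request


def parse_jsonl(text):
    """Parse JSONL text into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def build_envelope(records, commit, run_meta):
    """Wrap bare records in the {run_meta, commit, records} envelope."""
    return {"run_meta": run_meta, "commit": commit, "records": records}


def build_request(base_url, token, envelope):
    """Build (but do not send) an authenticated POST to /api/ingest."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/ingest",
        data=json.dumps(envelope).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical inputs: the host, token, and envelope fields are placeholders.
records = parse_jsonl('{"kind": "compression_size"}\n\n{"kind": "query_measurement"}\n')
envelope = build_envelope(records, commit={"sha": "0" * 40}, run_meta={"source": "example"})
request = build_request("https://ingest.example.invalid", "dummy-token", envelope)
```

Sending would be `urllib.request.urlopen(request)`, which raises `urllib.error.HTTPError` on 4xx/5xx responses; a script can turn that into a non-zero exit code.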
**Workflows**
- `.github/workflows/bench.yml` and `sql-benchmarks.yml` add
`--gh-json-v3 results.v3.jsonl` to the bench runs and a follow-up
"Ingest results to v3 server" step.
- New `.github/workflows/v3-commit-metadata.yml` POSTs an empty envelope
on every push to `develop` so the v3 `commits` dim stays populated even
when no benchmark ran.
### What's NOT included (intentionally)
- Anything under `benchmarks-website/`: the v2 React/Node app stays in
production unchanged.
- Workspace member additions for `benchmarks-website/server` and
`benchmarks-website/migrate`: those crates don't exist on `develop`
yet.
- `.github/workflows/ci.yml` and `publish-bench-server.yml` changes:
they reference `vortex-bench-server`, which is also v3-server-only.
## Risk
**Zero.** The v3 ingest step is gated on `vars.V3_INGEST_URL != ''` and
runs with `continue-on-error: true`. If the variable is unset, the step
is skipped; if the v3 server is down or the bearer secret is missing,
any failure in the step is tolerated. Either way the v2 path keeps
writing to S3 unchanged. The Rust emitter writes JSONL to a local file
only; there is no network egress from the binaries themselves.
## Verify
A CI run on this branch should show the new "Ingest results to v3
server" step running and POSTing successfully to the EC2 host at
`vars.V3_INGEST_URL`.
## Follow-up
The v3 website itself (server, migrator, web UI) ships in a separate PR
off `ct/benchmarks-v3` once dual-write is verified healthy in
production. Outbox-style retry on failed POSTs is also a follow-up: not
built until we observe a failure in the wild.
## Test plan
- [x] `cargo build -p vortex-bench`: clean.
- [x] `cargo nextest run -p vortex-bench`: 49/49 pass, including 7 new
v3 snapshot tests.
- [x] `cargo build -p compress-bench -p datafusion-bench -p duckdb-bench
-p lance-bench -p random-access-bench -p vector-search-bench`: clean.
- [x] All six benchmark binaries print `--gh-json-v3 <GH_JSON_V3>` in
`--help`.
- [x] `python3 scripts/post-ingest.py --help`: clean.
- [x] `pytest bench-orchestrator/tests/test_executor.py`: 5/5 pass,
including 2 new `gh_json_v3` tests.
- [x] `cargo +nightly fmt --all`: no diff.
- [x] `cargo clippy --all-targets --all-features -p vortex-bench`:
clean.
- [x] `cargo clippy --all-targets -p compress-bench -p datafusion-bench
-p lance-bench -p random-access-bench -p vector-search-bench`: clean.
`duckdb-bench` skipped (transitively triggers a pre-existing
`cognitive_complexity` lint in `vortex-duckdb/src/convert/expr.rs:47`,
present on `develop` and unrelated to these changes).
- [x] `yamllint --strict -c .yamllint.yaml` on the three changed/new
workflow files: clean.
- [x] `./scripts/public-api.sh`: N/A. All touched Rust crates have
`publish = false`.
- [ ] Real round-trip against the EC2 host: verifies once this branch
triggers a CI bench run with `V3_INGEST_URL` set.
---
_Generated by [Claude
Code](https://claude.ai/code/session_0154XbxhgQztmbrQfJ4ZSxVo)_
</details>
---------
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
[claude] benchmarks-website v3: per-group descriptions + partial-coverage commits (#7784)
Two independent fixes for the v3 server, on the same branch.
## Task A: per-group hover descriptions
Port v2's `BENCHMARK_DESCRIPTIONS` + `getBenchmarkDescription` strings
into the v3 server.
- New `description: Option<String>` field on `Group` and
`GroupChartsResponse`, populated from a small hand-maintained table in
`api/descriptions.rs`. Strings match v2 verbatim where v2 had one
(Random Access, Compression, Compression Size, Clickbench, StatPopGen,
PolarSignals).
- TPC-H / TPC-DS descriptions are derived from the parsed group name so
there's no need for one entry per `(storage, sf)` pair. TPC-H carries
the `~XGB of data` annotation (`SF=1` → 1GB, `SF=10` → 10GB, etc.);
TPC-DS does not, matching v2 verbatim.
- Vector-search groups have no canonical description in v2, so the icon
is omitted for those rather than fabricated.
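The static-versus-derived dispatch described in this list might look roughly like the Python sketch below. The real code is Rust in `api/descriptions.rs`. The group-name format (`TPC-H (nvme) (SF=10)`), the description wording, and the function name `describe` are all assumptions made for illustration; only the `~XGB of data` annotation for TPC-H and its absence for TPC-DS come from the text above.

```python
import re

# Placeholder text; the real table carries the v2 strings verbatim.
STATIC_DESCRIPTIONS = {
    "Random Access": "<v2 description for Random Access>",
    "Compression": "<v2 description for Compression>",
}

_SCALE_FACTOR = re.compile(r"SF=(\d+)")


def describe(group_name):
    """Return a hover description, or None when no canonical one exists."""
    if group_name in STATIC_DESCRIPTIONS:
        return STATIC_DESCRIPTIONS[group_name]
    if group_name.startswith("TPC-H"):
        match = _SCALE_FACTOR.search(group_name)
        if match:
            sf = match.group(1)
            # Only TPC-H carries the data-size annotation.
            return f"TPC-H queries at SF={sf} (~{sf}GB of data)"
        return "TPC-H queries"
    if group_name.startswith("TPC-DS"):
        match = _SCALE_FACTOR.search(group_name)
        return f"TPC-DS queries at SF={match.group(1)}" if match else "TPC-DS queries"
    # E.g. vector-search groups: no description, so no icon is rendered.
    return None
```

Returning `None` rather than a generic fallback string is what lets the renderer omit the icon entirely for groups without a canonical description.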
Render path:
- Landing page: a small info icon (`.group-info-icon`, cursor-help) next to
every group title, surfacing the description via a CSS-only
`[data-tooltip]::after` pseudo-element. Visible on hover **and** on
focus; the icon is `tabindex="0" role="note"` with `aria-label` set so
it's reachable by keyboard and exposed to screen readers.
- `/group/{slug}` permalink page: same icon next to the chart-meta
header.
## Task B: render commits with partial / missing series data
**Symptom:** charts silently omitted commits, with no visible gap where they should have been.
**Diagnosis:**
1. `SeriesAccumulator::ensure_commit` only registered a commit *after* a
series produced a row, so commits with zero rows in the chart's fact
table were silently dropped from `commits[]`.
2. The client renderer set `spanGaps: true` on every dataset, so
surviving nulls were drawn over as continuous lines.
**Server fix.** Each `collect_*_chart` now seeds the accumulator with a
canonical `commits`-dim pre-pass scoped to *every commit in the
requested `CommitWindow` whose timestamp is at or after the earliest
commit that has a row in this chart's fact table.* That bounded form
keeps two things right:
- Commits with zero fact-table rows still appear in `commits[]`; their
per-series slot stays `null` and (with the client fix) renders as a
visible gap.
- Commits *older* than the chart's first fact-table row are excluded;
we never render pre-history before the benchmark even existed.
The accumulator now exposes `seed_commits` + `commit_idx(sha)`; the
per-fact-table SQL is unchanged shape-wise but no longer carries
`c.timestamp/message/url` (the seeded set already has those).
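The seed-then-fill pattern can be illustrated with a small Python model. The real accumulator is Rust and is driven by SQL; the function name `build_chart` and the tuple shapes here are assumptions, and timestamps are plain integers for brevity.

```python
def build_chart(window, fact_rows):
    """Seed commits from the window, then fill per-series values.

    window:    list of (sha, timestamp), oldest first (the commit window).
    fact_rows: list of (sha, series_name, value) from one fact table.
    """
    timestamps = dict(window)
    shas_with_rows = {sha for sha, _, _ in fact_rows if sha in timestamps}
    if not shas_with_rows:
        return [], {}

    # Seed: every commit at or after the earliest commit that has a row.
    earliest = min(timestamps[sha] for sha in shas_with_rows)
    commits = [sha for sha, ts in window if ts >= earliest]
    commit_idx = {sha: i for i, sha in enumerate(commits)}

    # Fill: slots without a measurement stay None (null on the wire).
    series = {}
    for sha, name, value in fact_rows:
        if sha in commit_idx:
            series.setdefault(name, [None] * len(commits))[commit_idx[sha]] = value
    return commits, series


window = [("old", 1), ("a", 2), ("b", 3), ("empty", 4), ("c", 5)]
rows = [("a", "X", 1.0), ("a", "Y", 2.0), ("b", "Y", 2.1),
        ("c", "X", 1.2), ("c", "Y", 2.2)]
commits, series = build_chart(window, rows)
```

In this toy run, `old` is dropped because it predates the first fact row, `empty` is kept with `None` in every series, and `b` keeps its `Y` value with `None` for `X`.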
**Client fix.** `spanGaps: false` on every dataset in `chart-init.js`,
so missing measurements show up as a real break in the line. The
external tooltip already cleanly skipped null rows via
`.filter(Boolean)`; that is preserved. LTTB still operates on the union of
non-null x-positions and skips downsampling when the union ≤
`MAX_VISIBLE_POINTS`, so a sparse series with a handful of non-null
values across hundreds of commits still renders all of them as connected
points.
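To make the two client-side rules concrete, here is a hedged Python model of them. The real code is JavaScript in `chart-init.js`. The function names and the `MAX_VISIBLE_POINTS` value are assumptions, and `line_segments` only models how a null breaks a line when gaps are not spanned.

```python
MAX_VISIBLE_POINTS = 4  # hypothetical value, for illustration only


def non_null_union(series):
    """Sorted x-positions where at least one series has a value."""
    positions = set()
    for values in series.values():
        positions.update(i for i, v in enumerate(values) if v is not None)
    return sorted(positions)


def should_downsample(series, max_points=MAX_VISIBLE_POINTS):
    """Downsample only when the non-null union exceeds the point budget."""
    return len(non_null_union(series)) > max_points


def line_segments(values):
    """Split a series into runs of consecutive non-null points."""
    segments, current = [], []
    for i, v in enumerate(values):
        if v is None:
            if current:
                segments.append(current)
                current = []
        else:
            current.append((i, v))
    if current:
        segments.append(current)
    return segments


series = {"X": [1.0, None, 1.2], "Y": [None, None, 2.2]}
```

With these inputs, the union of non-null positions is `[0, 2]`, which is within the budget, so nothing is downsampled, and series `X` is drawn as two separate segments around the null at position 1.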
Bumped `STATIC_ASSET_VERSION` from `bench-v3-ui-17` to `bench-v3-ui-18`
so cached browsers see the new bytes.
## Tests
New `server/tests/web_ui.rs` covers both tasks:
**Task A:**
- `landing_page_renders_group_descriptions`: verbatim v2 strings + the
SF-derived TPC blurb both appear on `/`.
- `group_page_renders_description`: same icon on `/group/{slug}`.
- `vector_search_group_has_no_description_icon`: no canonical
description means no icon.
- `groups_api_carries_description_field`: wire shape: `description`
field present where v2 had one, absent for vector-search groups.
**Task B:**
- `chart_includes_commits_with_partial_series_coverage`: the headline
regression test: commits A (X+Y), B (only Y), C (X+Y); B appears in
`commits[]` with `null` for X.
- `chart_includes_commits_with_zero_rows_in_fact_table`: a commit that
emitted only `compression_size` still appears on the random-access
chart's x-axis.
- `chart_excludes_commits_before_first_fact_row`: commits older than
the bench's first row stay off the chart.
Plus unit tests in `api/descriptions.rs` for the static-vs-derived
dispatch and the SF parser edge cases.
## Conventions
- Branched from `ct/benchmarks-v3`, PR'd back to it.
- Every commit has a `Signed-off-by:` trailer.
- Don't auto-merge; leaving for review.
- Local `cargo build` was hitting environment issues (target-dir lock
contention); skipped in favour of GitHub Actions for the canonical Rust
checks. The Rust changes are mechanical (new module, new field on
existing structs, refactor of `collect_*_chart` to a seed-then-fill
pattern) and well-covered by the new tests above.
## Test plan
- [ ] CI green (`cargo test -p vortex-bench-server`, fmt, clippy on
touched code): tracked by GitHub Actions.
- [ ] Open landing page, hover a group title's info icon: description appears
below the icon.
- [ ] Tab to a group title's info icon: description still appears (focus path).
- [ ] Visit `/group/<slug>` for Random Access: description renders next
to chart count.
- [ ] Find a chart with known partial coverage; confirm
previously-missing commits now appear with visible gaps in the affected
series.
---
_Generated by [Claude
Code](https://claude.ai/code/session_01GCbWrKBT1ToBevooJRCJMx)_
---------
Signed-off-by: Claude <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>