vortex-data
vortex
Blog
Docs
Changelog
Performance History
Latest Results
feat: use cardinality estimator for distinct count stats

Replace the exact `HashMap`/`HashSet` previously used to compute distinct-value counts during compression stats generation with Cloudflare's `cardinality-estimator` crate. The estimator gives us a bounded-memory approximation (exact up to ~128 distinct values, then HyperLogLog++) so high-cardinality arrays no longer require an O(n) auxiliary hash table to answer the single question "how many unique values does this have?".

- Integer stats swap the hash map for a `CardinalityEstimator` and track the most frequent value via a Boyer-Moore majority candidate plus a second-pass exact count. Sparse/dict schemes only care about the heavy hitter (>= 90% threshold) or a rough distinct ratio, so this is behaviourally equivalent for the decisions they make.
- Float and string stats likewise drop their hash sets in favor of the estimator.
- The integer and float dictionary encoders now rebuild the exact set of distinct values from the source array at compress time, since they need the values themselves and the stats layer no longer retains them.
- `SequenceScheme`'s fast-path check for "all values are distinct" now tolerates the estimator's small approximation error; the deferred callback still validates sequences exactly.

Signed-off-by: Robert Kruszewski <github@robertk.io>
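The "Boyer-Moore majority candidate plus a second-pass exact count" step mentioned in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the function names `top_value_with_count` and `is_heavy_hitter` and the `i64` element type are assumptions made for the example. The first pass only yields a candidate; it is guaranteed to be the true majority value when one occupies more than half of the array, which covers any value that could meet a 90% threshold, and the second pass confirms the count exactly.

```rust
/// Hypothetical helper: returns a Boyer-Moore majority candidate together
/// with its exact occurrence count, or `None` for an empty slice.
fn top_value_with_count(values: &[i64]) -> Option<(i64, usize)> {
    let mut iter = values.iter().copied();
    let mut candidate = iter.next()?;
    let mut weight = 1usize;

    // Pass 1: majority vote. Matching values add weight, others cancel it.
    for v in iter {
        if weight == 0 {
            candidate = v;
            weight = 1;
        } else if v == candidate {
            weight += 1;
        } else {
            weight -= 1;
        }
    }

    // Pass 2: the candidate is only a guess, so count it exactly.
    let count = values.iter().filter(|&&v| v == candidate).count();
    Some((candidate, count))
}

/// Hypothetical helper: true when `count` is at least 90% of `len`,
/// using integer arithmetic to avoid floating-point rounding.
fn is_heavy_hitter(count: usize, len: usize) -> bool {
    len > 0 && count * 10 >= len * 9
}
```

If no value exceeds half of the array, the candidate is arbitrary, but the exact count from the second pass then falls short of the threshold, so a threshold check still gives the right answer.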
rk/cardinality-estimator
1 minute ago
onpair: reframe dict padding comment around the token read width

Tie the dict_bytes over-padding to the decoder's fixed token read width (MAX_TOKEN_SIZE) rather than a hardcoded byte count, and note it is the same over-read the decoder already accounts for on the final few codes.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
claude/dreamy-keller-uw3LB
2 minutes ago
test: expand utf8view and binaryview cuda export coverage

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
ad/expand-view-tests
5 minutes ago
test(vortex-turboquant): block-decomposition coverage and docs

Migrate the suite to the block layout and add round-trip, fidelity (per-block and whole-vector MSE), seed-distinctness, multi-block null/zero-norm, malformed-metadata/code/norm, and NaN/Inf coverage. Refresh the crate docs and add a multi-block file round-trip.

Signed-off-by: Connor Tsui <connor@spiraldb.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ct/tq-block
7 minutes ago
test(fsst): speed up i32 offset-overflow regression test

The fsst_compress_offsets_overflow_i32 regression compressed ~2.5 GiB of random, incompressible data with a trained compressor to push cumulative FSST output past i32::MAX. Profiling the debug build (samply) showed ~90% of runtime in fsst::Compressor::compress_into and ~18 s in per-byte RNG pool generation.

Use an empty FSST compressor instead: with no symbols every byte is emitted as a two-byte escape, so output is deterministically 2x the input. That crosses the i32::MAX boundary with only ~1.1 GiB of input (no random data needed), which is the cheapest possible way to reach the boundary since escapes are FSST's worst-case expansion. The test now also asserts the actual compressed byte size exceeds i32::MAX so it cannot silently stop covering the regression.

Measured (debug, single run): 307 s -> 186 s (~1.65x), peak memory ~5 GiB -> ~3.4 GiB, RNG generation eliminated.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
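The arithmetic behind the "2x the input" claim in the commit above can be checked with a short sketch. It models only the size relationship the commit describes (every input byte becomes a two-byte escape when the compressor has no symbols) and does not call the fsst crate; the names `ESCAPE_EXPANSION`, `escaped_output_len`, and `min_input_to_overflow_i32` are hypothetical.

```rust
/// Assumed expansion factor: with an empty symbol table each input byte
/// is emitted as a two-byte escape, as stated in the commit message.
const ESCAPE_EXPANSION: u64 = 2;

/// Hypothetical helper: output size in bytes for an all-escape encoding.
fn escaped_output_len(input_len: u64) -> u64 {
    input_len * ESCAPE_EXPANSION
}

/// Hypothetical helper: smallest input length whose all-escape output
/// no longer fits in an i32 offset.
fn min_input_to_overflow_i32() -> u64 {
    i32::MAX as u64 / ESCAPE_EXPANSION + 1
}
```

Under this model the boundary sits at exactly 1 GiB (1,073,741,824 bytes) of input, so the ~1.1 GiB used by the test leaves a small margin above it.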
claude/great-feynman-EqrcJ
11 minutes ago
binary compression

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
adamg/json-to-variant
12 minutes ago
less

Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/aggregatearcswap
24 minutes ago
Add UUID literal expression to vortex-jni (#8154)

To push down UUID pruning expressions, we need to be able to construct a UUID literal.
develop
28 minutes ago
Latest Branches
CodSpeed Performance Gauge: +3%
TurboQuant: Block Decomposition
#8139
1 hour ago
512ae34
ct/tq-block
CodSpeed Performance Gauge: +3%
Simplify fsst_compress_offsets_overflow_i32 test with empty compressor
#8158
2 hours ago
ff9bb8c
claude/great-feynman-EqrcJ
CodSpeed Performance Gauge: +3%
WIP: JSON/Variant experiments
#8156
13 minutes ago
70c91be
adamg/json-to-variant
© 2026 CodSpeed Technology