vortex-data
vortex

Performance History

Latest Results

feat: use cardinality estimator for distinct count stats

Replace the exact `HashMap`/`HashSet` previously used to compute distinct-value counts during compression stats generation with Cloudflare's `cardinality-estimator` crate. The estimator gives us a bounded-memory approximation (exact up to ~128 distinct values, then HyperLogLog++) so high-cardinality arrays no longer require an O(n) auxiliary hash table to answer the single question "how many unique values does this have?".

- Integer stats swap the hash map for a `CardinalityEstimator` and track the most frequent value via a Boyer-Moore majority candidate plus a second-pass exact count. Sparse/dict schemes only care about the heavy hitter (>= 90% threshold) or a rough distinct ratio, so this is behaviourally equivalent for the decisions they make.
- Float and string stats likewise drop their hash sets in favor of the estimator.
- The integer and float dictionary encoders now rebuild the exact set of distinct values from the source array at compress time, since they need the values themselves and the stats layer no longer retains them.
- `SequenceScheme`'s fast-path check for "all values are distinct" now tolerates the estimator's small approximation error; the deferred callback still validates sequences exactly.

Signed-off-by: Robert Kruszewski <github@robertk.io>
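
The heavy-hitter tracking described in the first bullet can be sketched roughly as follows. This is an illustration using only the Rust standard library, not the code from the commit: the function names `majority_candidate` and `is_heavy_hitter` are hypothetical, and the 90% check is written with integer arithmetic as an assumption about how the threshold might be applied.

```rust
/// Boyer-Moore majority vote followed by an exact second-pass count.
/// Returns the surviving candidate and how often it actually occurs.
/// The candidate is guaranteed to be the most frequent value only when
/// some value occupies a strict majority of the slice.
/// (Hypothetical helper, not part of the Vortex API.)
fn majority_candidate<T: PartialEq + Copy>(values: &[T]) -> Option<(T, usize)> {
    let mut candidate: Option<T> = None;
    let mut balance = 0usize;

    // First pass: O(1) extra memory, no hash table.
    for &v in values {
        if balance == 0 {
            candidate = Some(v);
            balance = 1;
        } else if candidate == Some(v) {
            balance += 1;
        } else {
            balance -= 1;
        }
    }

    // Second pass: count the candidate exactly.
    let c = candidate?;
    let count = values.iter().filter(|&&v| v == c).count();
    Some((c, count))
}

/// True when `count` covers at least 90% of `len` values.
/// (Hypothetical helper; the threshold form is an assumption.)
fn is_heavy_hitter(count: usize, len: usize) -> bool {
    len > 0 && count * 10 >= len * 9
}
```

Because any value at or above the 90% threshold is necessarily a strict majority, the vote cannot miss it, which is why the exact hash map is unnecessary for this particular decision.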
rk/cardinality-estimator
1 minute ago

Latest Branches

+3%
TurboQuant: Block Decomposition #8139
1 hour ago
512ae34
ct/tq-block
+3%
2 hours ago
ff9bb8c
claude/great-feynman-EqrcJ
+3%
13 minutes ago
70c91be
adamg/json-to-variant