vortex-data
vortex
Performance History
Latest Results
feat: use cardinality estimator for distinct count stats

Replace the exact `HashMap`/`HashSet` previously used to compute distinct-value counts during compression stats generation with Cloudflare's `cardinality-estimator` crate. The estimator gives us a bounded-memory approximation (exact up to ~128 distinct values, then HyperLogLog++) so high-cardinality arrays no longer require an O(n) auxiliary hash table to answer the single question "how many unique values does this have?".

- Integer stats swap the hash map for a `CardinalityEstimator` and track the most frequent value via a Boyer-Moore majority candidate plus a second-pass exact count. Sparse/dict schemes only care about the heavy hitter (>= 90% threshold) or a rough distinct ratio, so this is behaviourally equivalent for the decisions they make.
- Float and string stats likewise drop their hash sets in favor of the estimator.
- The integer and float dictionary encoders now rebuild the exact set of distinct values from the source array at compress time, since they need the values themselves and the stats layer no longer retains them.
- `SequenceScheme`'s fast-path check for "all values are distinct" now tolerates the estimator's small approximation error; the deferred callback still validates sequences exactly.

Signed-off-by: Robert Kruszewski <github@robertk.io>
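The heavy-hitter tracking described in this commit message can be illustrated with a minimal, standard-library-only sketch. This is not the code from the branch: the function names `majority_candidate`, `top_value`, and `is_heavy_hitter` are hypothetical, and the sketch assumes the stats layer only needs a value that could pass the >= 90% threshold. Boyer-Moore voting is guaranteed to find a value only when it occupies more than half of the input, so the second pass is what makes the returned count exact.

```rust
/// Hypothetical helper: Boyer-Moore majority vote in O(n) time and O(1) space.
/// If some value occupies more than half of `values`, it is returned.
/// Otherwise the returned candidate is arbitrary and must be verified.
fn majority_candidate<T: PartialEq + Copy>(values: &[T]) -> Option<T> {
    let mut candidate: Option<T> = None;
    let mut balance: usize = 0;
    for &v in values {
        match candidate {
            // Same value as the current candidate: strengthen it.
            Some(c) if balance > 0 && c == v => balance += 1,
            // Different value: cancel one vote for the candidate.
            Some(_) if balance > 0 => balance -= 1,
            // No candidate yet, or its votes were all cancelled: adopt `v`.
            _ => {
                candidate = Some(v);
                balance = 1;
            }
        }
    }
    candidate
}

/// Hypothetical helper: second pass that counts the candidate exactly.
/// Returns `None` only for empty input.
fn top_value<T: PartialEq + Copy>(values: &[T]) -> Option<(T, usize)> {
    let candidate = majority_candidate(values)?;
    let count = values.iter().filter(|&&v| v == candidate).count();
    Some((candidate, count))
}

/// Assumed threshold check: true when `count / len >= 0.9`,
/// written with integer arithmetic to avoid floating-point rounding.
fn is_heavy_hitter(count: usize, len: usize) -> bool {
    len > 0 && count * 10 >= len * 9
}
```

When no value exceeds 50% of the input, the candidate may not be the true mode. Its exact count still falls below the 90% threshold, so a decision based only on that threshold is unaffected.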
rk/cardinality-estimator
15 hours ago
trustedlen Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
15 hours ago
fixes Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
15 hours ago
fixes Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
15 hours ago
more Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
17 hours ago
less Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
17 hours ago
simpler Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
17 hours ago
more Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
19 hours ago
Latest Branches
CodSpeed Performance Gauge
-25%
Use approximate cardinality to decide whether to use dict compression
#7759
10 days ago
7fa9309
rk/cardinality-estimator
CodSpeed Performance Gauge
-61%
[claude] add benchmarks website v3 design overview and plan
#7756
19 hours ago
f965259
claude/benchmarks-v3-refactor
CodSpeed Performance Gauge
-25%
Fix weird signature of with_slots functions
#7758
1 day ago
10a45d4
rk/signature