vortex-data
vortex
Latest Results
feat: use cardinality estimator for distinct count stats

Replace the exact `HashMap`/`HashSet` previously used to compute distinct-value counts during compression stats generation with Cloudflare's `cardinality-estimator` crate. The estimator gives us a bounded-memory approximation (exact up to ~128 distinct values, then HyperLogLog++) so high-cardinality arrays no longer require an O(n) auxiliary hash table to answer the single question "how many unique values does this have?".

- Integer stats swap the hash map for a `CardinalityEstimator` and track the most frequent value via a Boyer-Moore majority candidate plus a second-pass exact count. Sparse/dict schemes only care about the heavy hitter (>= 90% threshold) or a rough distinct ratio, so this is behaviourally equivalent for the decisions they make.
- Float and string stats likewise drop their hash sets in favor of the estimator.
- The integer and float dictionary encoders now rebuild the exact set of distinct values from the source array at compress time, since they need the values themselves and the stats layer no longer retains them.
- `SequenceScheme`'s fast-path check for "all values are distinct" now tolerates the estimator's small approximation error; the deferred callback still validates sequences exactly.

Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
6 hours ago
trustedlen Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
6 hours ago
fixes Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
7 hours ago
fixes Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
7 hours ago
more Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
8 hours ago
less Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
9 hours ago
simpler Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
9 hours ago
more Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/cardinality-estimator
10 hours ago
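The first commit message in the list above mentions tracking the most frequent integer value with a Boyer-Moore majority candidate followed by a second-pass exact count. The sketch below illustrates that general technique using only the standard library. It is not the Vortex implementation: the function names, the `i64` element type, and the `threshold` parameter are all hypothetical choices made for illustration. Note that the majority vote only guarantees a correct candidate when some value occupies more than half of the input, which is sufficient for a heavy-hitter test at a 90% threshold.

```rust
/// Boyer-Moore majority vote (first pass).
/// Returns the only value that *could* occupy more than half of `values`,
/// or `None` for an empty slice. The candidate is not verified here.
/// Hypothetical helper, not part of Vortex.
fn majority_candidate(values: &[i64]) -> Option<i64> {
    let mut candidate = None;
    let mut balance = 0usize;
    for &v in values {
        if balance == 0 {
            // No surviving candidate: adopt the current value.
            candidate = Some(v);
            balance = 1;
        } else if candidate == Some(v) {
            balance += 1;
        } else {
            // A differing value cancels one vote for the candidate.
            balance -= 1;
        }
    }
    candidate
}

/// Second pass: count exactly how often the candidate occurs.
/// Hypothetical helper, not part of Vortex.
fn top_value_with_count(values: &[i64]) -> Option<(i64, usize)> {
    let candidate = majority_candidate(values)?;
    let count = values.iter().filter(|&&v| v == candidate).count();
    Some((candidate, count))
}

/// Does a single value cover at least `threshold` (e.g. 0.9) of the array?
/// Assumed decision rule for illustration only.
fn has_heavy_hitter(values: &[i64], threshold: f64) -> bool {
    match top_value_with_count(values) {
        Some((_, count)) => count as f64 >= threshold * values.len() as f64,
        None => false,
    }
}
```

Both passes run in O(n) time and use O(1) extra memory, which is the property that lets this approach replace a full frequency table when only the dominant value matters. If no value exceeds half of the input, the candidate is arbitrary, but its exact count from the second pass will fall below any threshold above 50%, so the check still returns the right answer.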
Latest Branches
CodSpeed Performance Gauge: -25%
Use approximate cardinality to decide whether to use dict compression
#7759
9 days ago
7fa9309
rk/cardinality-estimator
CodSpeed Performance Gauge: -61%
[claude] add benchmarks website v3 design overview and plan
#7756
11 hours ago
f965259
claude/benchmarks-v3-refactor
CodSpeed Performance Gauge: -25%
Fix weird signature of with_slots functions
#7758
18 hours ago
10a45d4
rk/signature
© 2026 CodSpeed Technology