vortex-data
vortex

Performance History

Latest Results

lock Signed-off-by: Robert Kruszewski <github@robertk.io>
refactor/parent-ref-stack-allocated
1 hour ago
simpler Signed-off-by: Robert Kruszewski <github@robertk.io>
refactor/parent-ref-stack-allocated
3 hours ago
locks Signed-off-by: Robert Kruszewski <github@robertk.io>
refactor/parent-ref-stack-allocated
3 hours ago
u Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
claude/bitpack-compare-speedup-KGPS3
5 hours ago
fastlanes: bit-packed compare-constant fast path + bitpack_constant kernel

Speeds up the `bitpack_compare` bench from the parent commit with two independent optimizations driven by the same observation: a bit-packed lane holds values in `[0, 2^bit_width - 1]`, so a constant outside that range can be answered analytically without touching the packed buffer.

**Compare-constant fast path (`compute/compare.rs`)**

Register a `CompareKernel` for `BitPacked` that short-circuits when the RHS constant `c` is outside `[0, 2^bit_width - 1]`. For each operator the answer is a constant boolean modulo patches and validity:

* Eq/NotEq - false / true everywhere
* Lt/Lte/Gt/Gte - constant once `c` is on either side of the range

Detecting the range is an `O(1)` `i128` check via the new `BitPackedData::value_fits_bit_width` helper. With no patches and no nulls the kernel returns a `ConstantArray<bool>` (also `O(1)`); otherwise it allocates a `BitBuffer`, fills it with the constant result, and overlays the per-position outcome at each patch index. In-range constants fall through to the canonical decompress + Arrow compare path; tests exercise both fall-throughs.

**`bitpack_constant` analytical encoder (`array/bitpack_compress.rs`)**

Add a constant-only pack kernel that builds the FastLanes bit pattern for a `[constant; len]` input without calling `BitPacking::pack`. For constant input every lane produces the same `bit_width` output words; we compute those words analytically - each output word's `j`-th bit is bit `(k * T_bits + j) mod bit_width` of `c` - then `memset` each word `LANES` times into a stack chunk template and `memcpy` the template into every full chunk. The standard packer is only invoked for the partial tail (zero-padded past `len`). `bitpack_encode_constant` wraps the buffer up as a `BitPackedArray`. A bitwise-equivalence rstest covers byte-identity with `BitPacking::pack` across lengths, widths, and constants.

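The word formula in the `bitpack_constant` description can be checked with a small standalone sketch. It assumes `T = u32` (so `T_bits = 32`) and that a lane packs its values back-to-back, which is what the formula implies. `constant_lane_words` and `naive_lane_words` are hypothetical names used for illustration only; they are not functions from the vortex or fastlanes crates, and the sketch does not model the `LANES` replication or the chunk template.

```rust
/// Analytically compute the `bit_width` packed words that a single lane
/// would hold when every input value equals `c`.
/// Assumption: T = u32, so T_BITS = 32 (hypothetical helper, not crate API).
fn constant_lane_words(c: u32, bit_width: usize) -> Vec<u32> {
    const T_BITS: usize = 32;
    assert!((1..=T_BITS).contains(&bit_width));
    // The constant must be representable in `bit_width` bits.
    assert!(bit_width == T_BITS || c >> bit_width == 0);

    (0..bit_width)
        .map(|k| {
            let mut word = 0u32;
            for j in 0..T_BITS {
                // Bit j of output word k is bit ((k * T_bits + j) mod bit_width) of c.
                let src_bit = (k * T_BITS + j) % bit_width;
                word |= ((c >> src_bit) & 1) << j;
            }
            word
        })
        .collect()
}

/// Reference implementation: pack 32 copies of `c` back-to-back, bit by bit.
/// Used only to cross-check the analytical formula.
fn naive_lane_words(c: u32, bit_width: usize) -> Vec<u32> {
    let mut words = vec![0u32; bit_width];
    for i in 0..32 {
        for b in 0..bit_width {
            let pos = i * bit_width + b; // position in the lane's bit stream
            words[pos / 32] |= ((c >> b) & 1) << (pos % 32);
        }
    }
    words
}
```

For example, `constant_lane_words(1, 2)` yields two words of `0x5555_5555`: when `bit_width` divides 32, every output word is the same repeating pattern.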
**Benches**

* `bitpack_compare` (added in the parent commit) on this branch now exercises the fast path; at `bit_width ∈ {4, 16}`, `len ∈ {1024, 65536}` it runs in ~1.4-1.5 µs vs 8-125 µs for the decompress + Arrow baseline.
* New `bitpack_constant` bench compares the analytical kernel against the full `bitpack_encode` pipeline on uniform-constant input; at 64 K u32 elements the analytical kernel is roughly 23-62x faster.

**Plan doc (`docs/inrange_compare_plan.md`)**

Document the follow-up plan to accelerate *in-range* ordering comparisons: compare the packed array against the packed constant via SWAR less-than per supported bit width (Routes A/B/C, including Knuth broadword with rotation tables for widths that straddle word boundaries), derive the four ordering operators from one `Lt` primitive, and benchmark against the canonical SIMD baseline before landing.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
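The plan doc's idea of deriving the four ordering operators from one `Lt` primitive can be illustrated on scalars. This is only a sketch: the plain `lt` function below stands in for the planned SWAR less-than, the `Operator` enum and `compare` function are hypothetical names, and the identities used are the standard ones for a total order rather than anything confirmed by the plan doc. Nulls and patches are ignored.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Operator {
    Lt,
    Lte,
    Gt,
    Gte,
}

/// Stand-in for the single less-than primitive (scalar here, SWAR in the plan).
fn lt(a: u32, b: u32) -> bool {
    a < b
}

/// Evaluate `value <op> constant` using only `lt`, argument swaps, and negation.
fn compare(op: Operator, value: u32, constant: u32) -> bool {
    match op {
        Operator::Lt => lt(value, constant),
        // value > constant  <=>  constant < value
        Operator::Gt => lt(constant, value),
        // value >= constant  <=>  !(value < constant)
        Operator::Gte => !lt(value, constant),
        // value <= constant  <=>  !(constant < value)
        Operator::Lte => !lt(constant, value),
    }
}
```

Only one comparison kernel then needs to be written and tuned per bit width; the other three operators cost an operand swap or a bitwise NOT of the result.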
claude/bitpack-compare-speedup-KGPS3
5 hours ago
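The out-of-range rule from the compare-constant commit above can be sketched in a few lines. This is a minimal illustration, not the kernel itself: the free function `value_fits_bit_width` is a hypothetical stand-in for `BitPackedData::value_fits_bit_width` (whose real signature is not shown in the commit message), `out_of_range_result` and the `Operator` enum are invented for the example, and patches and validity are ignored.

```rust
#[derive(Clone, Copy, Debug)]
enum Operator {
    Eq,
    NotEq,
    Lt,
    Lte,
    Gt,
    Gte,
}

/// O(1) range check in i128: does `c` lie in [0, 2^bit_width - 1]?
/// Hypothetical stand-in; assumes bit_width <= 64.
fn value_fits_bit_width(c: i128, bit_width: u32) -> bool {
    assert!(bit_width <= 64);
    c >= 0 && c <= (1i128 << bit_width) - 1
}

/// Result of `value <op> c` for every unpatched, non-null value when `c` is
/// out of range, or `None` when `c` is in range and the slow path is needed.
fn out_of_range_result(op: Operator, c: i128, bit_width: u32) -> Option<bool> {
    if value_fits_bit_width(c, bit_width) {
        return None; // fall through to decompress + compare
    }
    let below = c < 0; // otherwise c > 2^bit_width - 1
    Some(match op {
        Operator::Eq => false,
        Operator::NotEq => true,
        // Every value is greater than a constant below the range,
        // and less than a constant above it.
        Operator::Lt | Operator::Lte => !below,
        Operator::Gt | Operator::Gte => below,
    })
}
```

With `bit_width = 4`, values lie in `[0, 15]`, so `out_of_range_result(Operator::Lt, 16, 4)` is `Some(true)` while a constant of `15` returns `None` and takes the regular path.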

Latest Branches

CodSpeed Performance Gauge
+7%
Allow running reduce_parent operations on stack allocated parents #7751
1 hour ago
ebfc11e
refactor/parent-ref-stack-allocated
CodSpeed Performance Gauge
+3%
14 hours ago
172e0c7
claude/flatbuffers-memory-safety-XKbWQ
CodSpeed Performance Gauge
×2.6
5 hours ago
3b1b8cf
claude/bitpack-compare-speedup-KGPS3
© 2026 CodSpeed Technology