vortex-data
vortex

Performance History

Latest Results

fix(decimal): fall back to Arrow comparison when bound storage type differs from array storage type

When `between_unpack` could not cast a bound scalar's backing integer value to the array's storage type `T` (e.g. `DecimalValue::I32(82246)` → `i16`), it previously returned `Err(vortex_bail!(...))`. The fuzzer found a case where a `DecimalArray` with `values_type = I16` received a bound stored as `DecimalValue::I32`, causing the error to propagate and the fuzz harness to panic via `vortex_expect`.

The fix: return `Ok(None)` instead of `Err(...)` in that case, signalling that this kernel cannot handle the input. `between_canonical` then falls through to the generic Arrow-based comparison path, which compares decimal values semantically by their shared precision and scale and correctly handles mixed storage widths.

A regression test exercises the exact scenario from the crash report: an `i16`-backed decimal array compared against bounds backed by `i32`, where the upper bound value (82246) exceeds `i16::MAX`.

Fixes #7955

Co-authored-by: Nicholas Gates <gatesn@users.noreply.github.com>
Signed-off-by: "Claude" <claude@anthropic.com>
claude/issue-7955-20260515-2235
8 hours ago
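
The fallback behaviour described in the decimal fix above can be illustrated with a small standalone sketch. This is not the Vortex implementation: `DecimalValue` here is a reduced hypothetical enum, `between_fast` and `between` are made-up names, and widening to `i64` merely stands in for the Arrow-based comparison path. It assumes, as the commit message does, that the array and both bounds share precision and scale, so comparing the backing integers is meaningful.

```rust
/// Hypothetical, reduced stand-in for a decimal scalar's backing integer.
#[derive(Debug, Clone, Copy)]
enum DecimalValue {
    I16(i16),
    I32(i32),
}

/// Try to narrow a bound to the array's `i16` storage type.
/// Returns `None` when the value does not fit.
fn narrow_to_i16(v: DecimalValue) -> Option<i16> {
    match v {
        DecimalValue::I16(x) => Some(x),
        DecimalValue::I32(x) => i16::try_from(x).ok(),
    }
}

/// Widen a bound to `i64` for the generic comparison.
fn widen(v: DecimalValue) -> i64 {
    match v {
        DecimalValue::I16(x) => i64::from(x),
        DecimalValue::I32(x) => i64::from(x),
    }
}

/// Fast path over the native storage type. `Ok(None)` means
/// "this kernel cannot handle the input", which is not an error.
fn between_fast(
    values: &[i16],
    lower: DecimalValue,
    upper: DecimalValue,
) -> Result<Option<Vec<bool>>, String> {
    let (Some(lo), Some(hi)) = (narrow_to_i16(lower), narrow_to_i16(upper)) else {
        return Ok(None);
    };
    Ok(Some(values.iter().map(|v| lo <= *v && *v <= hi).collect()))
}

/// Dispatcher: try the fast path, then fall through to a generic
/// comparison on widened integers (stand-in for the Arrow path).
fn between(
    values: &[i16],
    lower: DecimalValue,
    upper: DecimalValue,
) -> Result<Vec<bool>, String> {
    if let Some(mask) = between_fast(values, lower, upper)? {
        return Ok(mask);
    }
    let (lo, hi) = (widen(lower), widen(upper));
    Ok(values
        .iter()
        .map(|v| {
            let v = i64::from(*v);
            lo <= v && v <= hi
        })
        .collect())
}
```

With an upper bound of `DecimalValue::I32(82246)`, `between_fast` declines with `Ok(None)` because 82246 exceeds `i16::MAX`, and `between` still produces a mask through the widened comparison.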
pluggable registry for input/export arrow kernels (#7824)

## Summary

Adds a pluggable `ArrowSession` registry on `VortexSession` for round-tripping Vortex extension types in and out of Arrow extension types. Unblocks Arrow round-trip for `arrow.uuid` today, with `arrow.parquet.variant`, GeoArrow, and tensor types as the next consumers. Part of #7686.

## API changes

The session exposes two trait-driven plugin slots:

- `ArrowExportVTable`: dispatched by **target Arrow extension name** (`ARROW:extension:name`). Implementations turn a Vortex `ArrayRef` into an Arrow `ArrayRef` shaped to the requested `Field`. Also provides `to_arrow_field` for schema inference when only a Vortex `DType` is in hand.
- `ArrowImportVTable`: dispatched by **source Arrow extension name** carried on the incoming `Field`. Implementations turn an Arrow `ArrayRef` back into a Vortex `ArrayRef`, including any storage re-encoding (e.g. `FixedSizeBinary[16]` → `FixedSizeList<u8; 16>` for UUID).

Both traits return `Unsupported(input)` to defer to the next plugin or to the canonical fallback, so multiple plugins can register against the same key and probe in order.

New session entry points (`vortex-array/src/arrow/session.rs`):

- `ArrowSession::to_arrow_field` / `to_arrow_schema`: Vortex `DType` → Arrow `Field`/`Schema`, recursing into containers so nested extension fields go through the registered plugin.
- `ArrowSession::from_arrow_field` / `from_arrow_schema`: inverse direction, plugin-aware.
- `ArrowSession::from_arrow_record_batch` / `execute_record_batch`: `RecordBatch` round-trip.
- `ArrowSessionExt` extension trait so any `SessionExt` can call `session.arrow().…`.

The default session pre-registers the builtin UUID plugin (`vortex-array/src/extension/uuid/arrow.rs`).

## What's *not* in the plugin layer

`Date`, `Time`, and `Timestamp` are Vortex builtin extensions that map directly to native Arrow temporal types, so they continue to go through the canonical executor (`vortex-array/src/arrow/executor/temporal.rs`) rather than the plugin registry. The plugin layer is reserved for **Arrow extension types** that the canonical path can't express.

## DataFusion wiring

`vortex-datafusion` now goes through the session for schema/array conversion:

- `convert/schema.rs::calculate_physical_schema` uses `ArrowSession::to_arrow_field` so extension metadata survives projection.
- `persistent/format.rs` and `persistent/opener.rs` route schema inference through the session.
- `persistent/sink.rs` uses `from_arrow_record_batch`, passing the original schema separately from `RecordBatch::schema()` to preserve `ARROW:extension:name` metadata that DataFusion strips at runtime.

## Tests

Two new end-to-end tests in `vortex-datafusion/src/persistent/tests.rs`:

- `arrow_uuid_extension_roundtrip`: write Arrow UUID column to a Vortex file via the session, `SELECT *` it back, assert the field still carries the `Uuid` extension type and the values match.
- `arrow_uuid_extension_roundtrip_nested_struct`: same flow with the UUID nested in a top-level `Struct`, exercising recursive session-aware schema inference.

---------

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Co-authored-by: Baris Palaska <barispalaska@gmail.com>
develop
10 hours ago
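
The "probe in order, then fall back" dispatch described in the registry PR above can be sketched in plain Rust. Everything here is hypothetical and simplified for illustration: `Outcome`, `Plugin`, `Registry`, and `demo_registry` are invented names, byte vectors stand in for arrays, and strings stand in for converted output. The real `ArrowExportVTable` and `ArrowImportVTable` traits are not reproduced. The point is only the shape of the control flow: a plugin that declines returns the input it was given, so the next plugin (or the canonical path) can try it.

```rust
use std::collections::HashMap;

/// Hypothetical result of one plugin attempt: either a converted value,
/// or the untouched input handed back so the next candidate can try.
enum Outcome<I, O> {
    Done(O),
    Unsupported(I),
}

/// A plugin takes ownership of the input and may hand it back.
type Plugin = Box<dyn Fn(Vec<u8>) -> Outcome<Vec<u8>, String>>;

/// Registry keyed by extension name; several plugins may share one key.
#[derive(Default)]
struct Registry {
    plugins: HashMap<String, Vec<Plugin>>,
}

impl Registry {
    fn register(&mut self, ext_name: &str, plugin: Plugin) {
        self.plugins
            .entry(ext_name.to_string())
            .or_default()
            .push(plugin);
    }

    /// Probe plugins for `ext_name` in registration order, then fall
    /// back to a canonical conversion if none accepts the input.
    fn convert(&self, ext_name: &str, mut input: Vec<u8>) -> String {
        if let Some(chain) = self.plugins.get(ext_name) {
            for plugin in chain {
                match plugin(input) {
                    Outcome::Done(out) => return out,
                    // The plugin declined: recover the input and keep probing.
                    Outcome::Unsupported(back) => input = back,
                }
            }
        }
        format!("canonical({} bytes)", input.len())
    }
}

/// Example registry with one plugin that only accepts 16-byte values.
fn demo_registry() -> Registry {
    let mut registry = Registry::default();
    registry.register(
        "arrow.uuid",
        Box::new(|bytes: Vec<u8>| {
            if bytes.len() == 16 {
                Outcome::Done(format!("uuid({} bytes)", bytes.len()))
            } else {
                Outcome::Unsupported(bytes)
            }
        }),
    );
    registry
}
```

Returning the input inside `Unsupported` avoids cloning it before each attempt, which is one plausible reason for a by-value design like the one the PR describes.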

Latest Branches

CodSpeed Performance Gauge: -3%
Fuzzing Crash: VortexError in file_io #7956
8 hours ago · 9ca2250 · claude/issue-7955-20260515-2235

CodSpeed Performance Gauge: -3%
9 hours ago · eb72ffd · ngates/file-metadata-segments

CodSpeed Performance Gauge: +16%
pluggable registry for input/export arrow kernels #7824
10 hours ago · 7949f04 · aduffy/arrow-vtable

© 2026 CodSpeed Technology