pluggable registry for input/export arrow kernels (#7824)
## Summary
Adds a pluggable `ArrowSession` registry on `VortexSession` for
round-tripping Vortex extension types in and out of Arrow extension
types. Unblocks Arrow round-trip for `arrow.uuid` today, with
`arrow.parquet.variant`, GeoArrow, and tensor types as the next
consumers.
Part of #7686.
## API changes
The session exposes two trait-driven plugin slots:
- `ArrowExportVTable` — dispatched by **target Arrow extension name**
(`ARROW:extension:name`). Implementations turn a Vortex `ArrayRef` into
an Arrow `ArrayRef` shaped to the requested `Field`. Also provides
`to_arrow_field` for schema inference when only a Vortex `DType` is in
hand.
- `ArrowImportVTable` — dispatched by **source Arrow extension name**
carried on the incoming `Field`. Implementations turn an Arrow
`ArrayRef` back into a Vortex `ArrayRef`, including any storage
re-encoding (e.g. `FixedSizeBinary[16]` → `FixedSizeList<u8; 16>` for
UUID).
An implementation of either trait can return `Unsupported(input)` to
defer to the next plugin or to the canonical fallback, so multiple
plugins can register against the same key and are probed in order.
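The probe-in-order behaviour can be illustrated with a minimal, self-contained sketch. This is not the Vortex implementation: `Outcome`, `Plugin`, `Registry`, and `canonical` are hypothetical stand-ins, and plain byte vectors and strings replace the real array types. It only shows how handing the input back in `Unsupported(input)` lets the next candidate try without cloning.

```rust
use std::collections::HashMap;

/// Hypothetical outcome type: a plugin either converts the input or hands it
/// back untouched so the next candidate can try.
enum Outcome<I, O> {
    Converted(O),
    Unsupported(I),
}

/// Hypothetical plugin signature for this sketch (not the Vortex trait).
type Plugin = Box<dyn Fn(Vec<u8>) -> Outcome<Vec<u8>, String>>;

#[derive(Default)]
struct Registry {
    // Extension name -> plugins in registration order.
    plugins: HashMap<String, Vec<Plugin>>,
}

impl Registry {
    fn register(&mut self, name: &str, plugin: Plugin) {
        self.plugins.entry(name.to_string()).or_default().push(plugin);
    }

    /// Probe the plugins registered under `name` in order and fall back to
    /// the canonical conversion when every plugin defers.
    fn convert(&self, name: &str, mut input: Vec<u8>) -> String {
        if let Some(candidates) = self.plugins.get(name) {
            for plugin in candidates {
                match plugin(input) {
                    Outcome::Converted(out) => return out,
                    // Ownership of the input comes back, so nothing is cloned.
                    Outcome::Unsupported(returned) => input = returned,
                }
            }
        }
        canonical(input)
    }
}

/// Stand-in for the canonical (non-plugin) conversion path.
fn canonical(input: Vec<u8>) -> String {
    format!("canonical({} bytes)", input.len())
}

fn main() {
    let mut registry = Registry::default();
    // This plugin only accepts 16-byte values and defers everything else.
    registry.register(
        "arrow.uuid",
        Box::new(|bytes: Vec<u8>| {
            if bytes.len() == 16 {
                Outcome::Converted("uuid".to_string())
            } else {
                Outcome::Unsupported(bytes)
            }
        }),
    );
    println!("{}", registry.convert("arrow.uuid", vec![0; 16]));
    println!("{}", registry.convert("arrow.uuid", vec![0; 4]));
}
```

Returning the input by value in the deferral case keeps the contract explicit: a plugin that declines must not have consumed or altered what it was given.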
New session entry points (`vortex-array/src/arrow/session.rs`):
- `ArrowSession::to_arrow_field` / `to_arrow_schema` — Vortex `DType` →
Arrow `Field`/`Schema`, recursing into containers so nested extension
fields go through the registered plugin.
- `ArrowSession::from_arrow_field` / `from_arrow_schema` — inverse
direction, plugin-aware.
- `ArrowSession::from_arrow_record_batch` / `execute_record_batch` —
`RecordBatch` round-trip.
- `ArrowSessionExt` extension trait so any `SessionExt` can call
`session.arrow().…`.
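To illustrate why recursion into containers matters for schema inference, here is a toy sketch with hypothetical types (`DType`, `Field`, `Plugins`, and `to_field` are illustrative and do not mirror the real Vortex or Arrow definitions). The extension id `vortex.uuid` and the fallback to the storage type when no plugin is registered are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Toy stand-in for a Vortex dtype (hypothetical, heavily simplified).
#[derive(Debug, Clone, PartialEq)]
enum DType {
    Primitive,
    Extension { id: &'static str, storage: Box<DType> },
    Struct(Vec<(&'static str, DType)>),
}

/// Toy stand-in for an Arrow field (hypothetical, heavily simplified).
#[derive(Debug, PartialEq)]
struct Field {
    name: String,
    extension_name: Option<String>,
    children: Vec<Field>,
}

/// Maps an extension id to the Arrow extension name a plugin would emit.
type Plugins = HashMap<&'static str, &'static str>;

fn to_field(name: &str, dtype: &DType, plugins: &Plugins) -> Field {
    match dtype {
        DType::Primitive => Field {
            name: name.to_string(),
            extension_name: None,
            children: vec![],
        },
        DType::Extension { id, storage } => match plugins.get(id) {
            // A registered plugin decides the Arrow extension name.
            Some(arrow_name) => Field {
                name: name.to_string(),
                extension_name: Some(arrow_name.to_string()),
                children: vec![],
            },
            // Assumption for this sketch: without a plugin, use the storage type.
            None => to_field(name, storage, plugins),
        },
        // Recurse so that nested extension fields also reach the plugins.
        DType::Struct(fields) => Field {
            name: name.to_string(),
            extension_name: None,
            children: fields.iter().map(|(n, t)| to_field(n, t, plugins)).collect(),
        },
    }
}

fn main() {
    let mut plugins = Plugins::new();
    plugins.insert("vortex.uuid", "arrow.uuid");

    let dtype = DType::Struct(vec![(
        "id",
        DType::Extension { id: "vortex.uuid", storage: Box::new(DType::Primitive) },
    )]);
    println!("{:#?}", to_field("row", &dtype, &plugins));
}
```

Without the recursive `Struct` arm, only top-level extension columns would pick up their Arrow extension name, which is the case the nested-struct round-trip guards against.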
The default session pre-registers the builtin UUID plugin
(`vortex-array/src/extension/uuid/arrow.rs`).
## What's *not* in the plugin layer
`Date`, `Time`, and `Timestamp` are Vortex builtin extensions that map
directly to native Arrow temporal types, so they continue to go through
the canonical executor (`vortex-array/src/arrow/executor/temporal.rs`)
rather than the plugin registry. The plugin layer is reserved for
**Arrow extension types** that the canonical path can't express.
## DataFusion wiring
`vortex-datafusion` now goes through the session for schema/array
conversion:
- `convert/schema.rs::calculate_physical_schema` uses
`ArrowSession::to_arrow_field` so extension metadata survives
projection.
- `persistent/format.rs` and `persistent/opener.rs` route schema
inference through the session.
- `persistent/sink.rs` uses `from_arrow_record_batch`, passing the
original schema separately from `RecordBatch::schema()` to preserve
`ARROW:extension:name` metadata that DataFusion strips at runtime.
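The reason for passing the schema separately in the sink can be shown with a small sketch. The `Field` and `Batch` types and the `extension_names` helper are hypothetical stand-ins, not the `arrow` or `vortex-datafusion` types; the only detail taken from the text above is the `ARROW:extension:name` metadata key.

```rust
use std::collections::HashMap;

const EXT_NAME_KEY: &str = "ARROW:extension:name";

/// Toy field: a name plus string metadata (hypothetical stand-in).
#[derive(Debug, Clone)]
struct Field {
    name: String,
    metadata: HashMap<String, String>,
}

/// Toy batch whose own schema may have lost its field metadata at runtime.
struct Batch {
    schema: Vec<Field>,
    columns: Vec<Vec<u8>>,
}

/// Resolve each column's extension name from the caller-supplied schema
/// instead of trusting the schema attached to the batch.
fn extension_names(schema: &[Field], batch: &Batch) -> Vec<(String, Option<String>)> {
    assert_eq!(schema.len(), batch.columns.len(), "schema/column count mismatch");
    schema
        .iter()
        .map(|f| (f.name.clone(), f.metadata.get(EXT_NAME_KEY).cloned()))
        .collect()
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert(EXT_NAME_KEY.to_string(), "arrow.uuid".to_string());
    let original = vec![Field { name: "id".to_string(), metadata }];

    // The batch carries the same column, but its schema has no metadata left.
    let batch = Batch {
        schema: vec![Field { name: "id".to_string(), metadata: HashMap::new() }],
        columns: vec![vec![0u8; 16]],
    };

    println!("from original schema: {:?}", extension_names(&original, &batch));
    println!("from batch schema:    {:?}", extension_names(&batch.schema, &batch));
}
```

With the original schema the `id` column still resolves to `arrow.uuid`; with the batch's own schema the extension name is gone and an import plugin would never be selected.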
## Tests
Two new end-to-end tests in `vortex-datafusion/src/persistent/tests.rs`:
- `arrow_uuid_extension_roundtrip` — write Arrow UUID column to a Vortex
file via the session, `SELECT *` it back, assert the field still carries
the `Uuid` extension type and the values match.
- `arrow_uuid_extension_roundtrip_nested_struct` — same flow with the
UUID nested in a top-level `Struct`, exercising recursive session-aware
schema inference.
---------
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Baris Palaska <barispalaska@gmail.com>
Co-authored-by: Baris Palaska <barispalaska@gmail.com>