# feat: Arrow PyCapsule Interface (#6745)
## Changes Made
Implements the [Arrow PyCapsule
Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html)
for zero-copy Arrow data exchange between Daft and other
Arrow-compatible libraries (pyarrow, pandas 2.2+, nanoarrow, polars,
etc.).
**Export dunders:**
- `PySeries.__arrow_c_schema__` / `__arrow_c_array__`
- `PyRecordBatch.__arrow_c_schema__` / `__arrow_c_array__`
- `PyMicroPartition.__arrow_c_schema__` / `__arrow_c_stream__`
- `DataFrame.__arrow_c_stream__` (materializes, then delegates to
MicroPartition's native capsule; no PyArrow round-trip)
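The contract behind these dunders is small: each one returns a `PyCapsule` whose name identifies the Arrow C struct it wraps (`"arrow_schema"`, `"arrow_array"`, or `"arrow_array_stream"`). A minimal stdlib-only sketch of that naming contract, using `ctypes` to reach the CPython capsule API (the `FakeStreamExporter` class and its dummy pointer are illustrative, not Daft code; a real producer would place a populated `ArrowArrayStream` struct behind the pointer):

```python
import ctypes

# Bind the CPython PyCapsule API through ctypes (stdlib only).
capsule_new = ctypes.pythonapi.PyCapsule_New
capsule_new.restype = ctypes.py_object
capsule_new.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

capsule_is_valid = ctypes.pythonapi.PyCapsule_IsValid
capsule_is_valid.restype = ctypes.c_int
capsule_is_valid.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Dummy non-NULL pointer standing in for a real ArrowArrayStream struct.
_dummy = ctypes.c_int64(0)

class FakeStreamExporter:
    """Toy producer: __arrow_c_stream__ must return a capsule named
    b"arrow_array_stream", per the Arrow PyCapsule spec."""

    def __arrow_c_stream__(self, requested_schema=None):
        return capsule_new(ctypes.addressof(_dummy), b"arrow_array_stream", None)

cap = FakeStreamExporter().__arrow_c_stream__()
print(capsule_is_valid(cap, b"arrow_array_stream"))  # 1: capsule carries the expected name
```

Consumers check this name before dereferencing the pointer, which is how libraries reject arbitrary capsules (the "non-capsule rejection" tests below exercise that path in Daft).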
**Import:**
- `daft.from_arrow()` accepts any `ArrowStreamExportable` (a
runtime-checkable Protocol) in addition to the existing `pa.Table` path
- `MicroPartition.from_arrow_stream()` imports natively in Rust via
`__arrow_c_stream__`
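A runtime-checkable Protocol makes the import path structural: anything exposing `__arrow_c_stream__` qualifies, regardless of which library produced it. A sketch of what such a Protocol could look like (names and signature are illustrative; Daft's actual definition may differ):

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class ArrowStreamExportable(Protocol):
    """Structural type: any object exposing __arrow_c_stream__.

    @runtime_checkable lets isinstance() test for the method's presence,
    so pa.Table, polars.DataFrame, pandas 2.2+ DataFrames, etc. all match
    without importing any of those libraries.
    """

    def __arrow_c_stream__(self, requested_schema: Any = None) -> Any: ...

class MyTable:
    def __arrow_c_stream__(self, requested_schema=None):
        raise NotImplementedError  # a real producer returns an ArrowArrayStream capsule

print(isinstance(MyTable(), ArrowStreamExportable))  # True: duck-typed check
print(isinstance(object(), ArrowStreamExportable))   # False
```

Note that `isinstance` against a runtime-checkable Protocol only verifies the method exists, not its signature or behavior; the capsule-name check happens later, at import time.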
**Extension types:**
- `arrow.fixed_shape_tensor` canonical extension →
`DataType::FixedShapeTensor`
- Daft super-extension (embedding, image, tensor) preserved across
PyCapsule round-trip
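Extension types survive the C data interface because Arrow encodes them as field metadata on top of a plain storage type, under the well-known `ARROW:extension:name` / `ARROW:extension:metadata` keys. A sketch with a plain dict standing in for field metadata (the shape value is an arbitrary example; the JSON payload follows the canonical `arrow.fixed_shape_tensor` spec):

```python
# A fixed-shape tensor column is, on the wire, a FixedSizeList storage
# type plus metadata entries like these on the field:
field_metadata = {
    b"ARROW:extension:name": b"arrow.fixed_shape_tensor",
    b"ARROW:extension:metadata": b'{"shape":[3,224,224]}',
}

# An importer (e.g. one mapping this to DataType::FixedShapeTensor)
# dispatches on the extension name found in the metadata:
ext_name = field_metadata.get(b"ARROW:extension:name")
print(ext_name == b"arrow.fixed_shape_tensor")  # True
```

The same mechanism carries library-specific extensions (such as Daft's embedding/image/tensor super-extension) through any capsule exchange, since consumers that don't recognize the name simply see the storage type.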
**Infra:**
- FFI helpers consolidated in `common-arrow-ffi` (capsule export/import,
`requested_schema` parsing, validator)
- Added `record_batch_to_arrow_rs` (pure-Rust, no PyArrow) and
`cast_record_batch_to_schema` in `daft-recordbatch::ffi`
**Tests (28):** Series/RecordBatch/MicroPartition export, stream import
(incl. pa.Table, pa.RecordBatchReader, pandas DataFrame, empty,
multi-type, nested, multi-batch, null-heavy), `requested_schema` cast +
non-capsule rejection, `arrow.fixed_shape_tensor` + Daft embedding
extension roundtrip, Python-level interop.
## Related Issues
Closes #2504