Eventual-Inc
Daft

Performance History

Latest Results

fix style
huleilei:hll/lance_doc
11 hours ago
Merge branch 'main' into hll/lance_doc
huleilei:hll/lance_doc
14 hours ago
feat: update columns for lance
Jay-ju:update-columns
18 hours ago
tmp
Jay-ju:ignore-error
20 hours ago
feat(io): add filename provider support for parquet, csv and upload

Implement a cross-language FilenameProvider mechanism for Daft writes, covering block-based parquet/csv writers and row-based URL upload.

- Python
  - Extend `daft.io.filename_provider.FilenameProvider` and `_DefaultFilenameProvider` as the public strategy interface for filename generation.
  - Add optional `filename_provider` and internal `write_uuid` plumbing to `DataFrame.write_parquet` / `write_csv`, and route them via `LogicalPlanBuilder.write_tabular`.
  - Update `daft.logical.builder.LogicalPlanBuilder.write_tabular` to accept and forward `filename_provider` / `write_uuid` into the Rust logical plan.
  - Extend `daft.functions.url.upload` to accept `filename_provider` and generate a `write_uuid` for each logical upload.
  - Add `Expression.upload(..., filename_provider=...)` wrapper that passes through to `daft.functions.upload`.
  - Wire `daft.io.__init__` to expose `FilenameProvider` in the public API.
- Rust logical plan & DSL
  - Extend `daft-logical-plan::OutputFileInfo` with optional `filename_provider` and `write_uuid` fields.
  - Represent the Python provider as `common_py_serde::PyObjectWrapper` for safe serde round-tripping, and adjust `OutputFileInfo::new` accordingly.
  - Thread `filename_provider` / `write_uuid` through `LogicalPlanBuilder::table_write` (and its pyo3 binding) into `SinkInfo::OutputFileInfo`.
  - Add `RuntimePyObject` support for `Literal::Python` in the DSL runtime to allow passing Python objects (such as providers) as UDF args.
- Rust writers
  - Introduce `build_filename_with_provider` helper in `daft-writers::utils` that prefers calling a Python `FilenameProvider` hook when present, falling back to the previous UUID-based filename scheme.
  - Extend native parquet and csv writers (`create_native_parquet_writer` / `create_native_csv_writer`) to accept `filename_provider` and `write_uuid`, and use `build_filename_with_provider` with appropriate extensions ("parquet" / "csv").
  - Teach `PhysicalWriterFactory::create_writer` to unwrap the `PyObjectWrapper` stored in `OutputFileInfo` and pass the underlying `Arc<Py<PyAny>>` plus `write_uuid` into native writer creation.
- URL upload
  - Extend the `UrlUpload` UDF args to include `filename_provider: Option<RuntimePyObject>` and `write_uuid: Option<String>`.
  - When uploading into a single folder (`is_single_folder=True`), use the Python `FilenameProvider.get_filename_for_row(...)` hook (with ext="") to derive the basename; for row-specific full paths, keep the user-specified path untouched and do not call the provider.

This change brings Daft in line with Ray Data's `FilenameProvider` concept, giving users deterministic and customizable control over output filenames across different sinks, while preserving the existing default naming scheme when no provider is supplied.
Jay-ju:filename-provider
1 day ago
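
The naming behaviour described in the commit message above can be sketched in plain Python. This is an illustrative model only, not Daft's implementation: it does not import `daft`, and the method signatures, the `PrefixedProvider` class, the `build_filename` and `upload_path` helpers, and the fallback name format are all assumptions made for the example. Only the two rules come from the commit message: prefer the provider hook when one is supplied and otherwise fall back to a UUID-based name, and for uploads call the row hook with `ext=""` only when writing into a single folder.

```python
import uuid
from typing import Any, Optional


class FilenameProvider:
    """Hypothetical strategy interface; these signatures are assumptions."""

    def get_filename_for_block(self, write_uuid: str, file_idx: int, ext: str) -> str:
        raise NotImplementedError

    def get_filename_for_row(self, row: Any, write_uuid: str, row_idx: int, ext: str) -> str:
        raise NotImplementedError


class PrefixedProvider(FilenameProvider):
    """Example provider: deterministic names of the form <prefix>-<write_uuid>-<index>[.<ext>]."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def _name(self, write_uuid: str, idx: int, ext: str) -> str:
        base = f"{self.prefix}-{write_uuid}-{idx:05d}"
        return f"{base}.{ext}" if ext else base

    def get_filename_for_block(self, write_uuid: str, file_idx: int, ext: str) -> str:
        return self._name(write_uuid, file_idx, ext)

    def get_filename_for_row(self, row: Any, write_uuid: str, row_idx: int, ext: str) -> str:
        return self._name(write_uuid, row_idx, ext)


def build_filename(provider: Optional[FilenameProvider], write_uuid: Optional[str],
                   file_idx: int, ext: str) -> str:
    """Prefer the provider hook; otherwise use a UUID-based name (format is illustrative)."""
    if provider is not None and write_uuid is not None:
        return provider.get_filename_for_block(write_uuid, file_idx, ext)
    return f"{uuid.uuid4()}-{file_idx}.{ext}"


def upload_path(location: str, is_single_folder: bool, provider: Optional[FilenameProvider],
                write_uuid: str, row: Any, row_idx: int) -> str:
    """Derive the upload target for one row."""
    if not is_single_folder:
        # Row-specific full path: keep it untouched and never call the provider.
        return location
    if provider is not None:
        name = provider.get_filename_for_row(row, write_uuid, row_idx, "")
    else:
        name = str(uuid.uuid4())
    return f"{location.rstrip('/')}/{name}"
```

With `PrefixedProvider("sales")` and a write UUID of `"abc"`, the third parquet block would be named `sales-abc-00003.parquet`, while passing no provider yields a random UUID-based name, so existing behaviour is unchanged for callers that do not opt in.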

Active Branches

feat(lance): add btree index in create_scalar_index and update Lance docs
last run
11 hours ago
#5817
CodSpeed Performance Gauge
0%
last run
18 hours ago
#5910
CodSpeed Performance Gauge
0%
#5418
CodSpeed Performance Gauge
0%
© 2026 CodSpeed Technology