Eventual-Inc
Daft

Branches performance

Pull requests

feat(ai): add Google text embedder #5900
last run
10 hours ago
feat(ai): add Google text embedder support
10 hours ago
f5901b7
google-text-embeder
CodSpeed Performance Gauge
0%
fix(openai): respect model dtype when overriding embedding dimensions
14 hours ago
0f85579
fenfeng9:fix-openai-dtype
CodSpeed Performance Gauge
0%
perf: Support listing glob paths in parallel

Signed-off-by: plotor <zhenchao.wang@hotmail.com>
21 hours ago
fac9b3f
plotor:zhenchao-from-glob-path
CodSpeed Performance Gauge
0%
feat(io): add filename provider support for parquet, csv and upload

Implement a cross-language FilenameProvider mechanism for Daft writes, covering block-based parquet/csv writers and row-based URL upload.

- Python
  - Extend `daft.io.filename_provider.FilenameProvider` and `_DefaultFilenameProvider` as the public strategy interface for filename generation.
  - Add optional `filename_provider` and internal `write_uuid` plumbing to `DataFrame.write_parquet` / `write_csv`, and route them via `LogicalPlanBuilder.write_tabular`.
  - Update `daft.logical.builder.LogicalPlanBuilder.write_tabular` to accept and forward `filename_provider` / `write_uuid` into the Rust logical plan.
  - Extend `daft.functions.url.upload` to accept `filename_provider` and generate a `write_uuid` for each logical upload.
  - Add `Expression.upload(..., filename_provider=...)` wrapper that passes through to `daft.functions.upload`.
  - Wire `daft.io.__init__` to expose `FilenameProvider` in the public API.
- Rust logical plan & DSL
  - Extend `daft-logical-plan::OutputFileInfo` with optional `filename_provider` and `write_uuid` fields.
  - Represent the Python provider as `common_py_serde::PyObjectWrapper` for safe serde round-tripping, and adjust `OutputFileInfo::new` accordingly.
  - Thread `filename_provider` / `write_uuid` through `LogicalPlanBuilder::table_write` (and its pyo3 binding) into `SinkInfo::OutputFileInfo`.
  - Add `RuntimePyObject` support for `Literal::Python` in the DSL runtime to allow passing Python objects (such as providers) as UDF args.
- Rust writers
  - Introduce `build_filename_with_provider` helper in `daft-writers::utils` that prefers calling a Python `FilenameProvider` hook when present, falling back to the previous UUID-based filename scheme.
  - Extend native parquet and csv writers (`create_native_parquet_writer` / `create_native_csv_writer`) to accept `filename_provider` and `write_uuid`, and use `build_filename_with_provider` with appropriate extensions ("parquet" / "csv").
  - Teach `PhysicalWriterFactory::create_writer` to unwrap the `PyObjectWrapper` stored in `OutputFileInfo` and pass the underlying `Arc<Py<PyAny>>` plus `write_uuid` into native writer creation.
- URL upload
  - Extend the `UrlUpload` UDF args to include `filename_provider: Option<RuntimePyObject>` and `write_uuid: Option<String>`.
  - When uploading into a single folder (`is_single_folder=True`), use the Python `FilenameProvider.get_filename_for_row(...)` hook (with ext="") to derive the basename; for row-specific full paths, keep the user-specified path untouched and do not call the provider.

This change brings Daft in line with Ray Data's `FilenameProvider` concept, giving users deterministic and customizable control over output filenames across different sinks, while preserving the existing default naming scheme when no provider is supplied.
4 days ago
88378b3
Jay-ju:filename-provider
CodSpeed Performance Gauge
0%
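
The provider-with-fallback behaviour described in the filename-provider commit message above can be illustrated with a small standalone sketch. It does not import Daft. The class `PrefixedProvider`, the function `build_filename`, the argument order of `get_filename_for_row`, and the shape of the fallback name are all assumptions made for illustration, not the actual signatures from the PR.

```python
import uuid
from typing import Optional, Protocol


class ProviderLike(Protocol):
    """Hypothetical stand-in for a filename provider; not Daft's real interface."""

    def get_filename_for_row(self, row_index: int, write_uuid: str, ext: str) -> str:
        ...


class PrefixedProvider:
    """Example provider (hypothetical) that produces deterministic, prefixed names."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def get_filename_for_row(self, row_index: int, write_uuid: str, ext: str) -> str:
        # An empty extension (as in the single-folder upload case) adds no suffix.
        suffix = f".{ext}" if ext else ""
        return f"{self.prefix}-{write_uuid}-{row_index:06d}{suffix}"


def build_filename(
    provider: Optional[ProviderLike], write_uuid: str, row_index: int, ext: str
) -> str:
    """Prefer the provider hook when one is supplied, else use a UUID-based name."""
    if provider is not None:
        return provider.get_filename_for_row(row_index, write_uuid, ext)
    # Assumed fallback shape: a fresh random UUID plus the index and extension.
    suffix = f".{ext}" if ext else ""
    return f"{uuid.uuid4()}-{row_index}{suffix}"


if __name__ == "__main__":
    print(build_filename(PrefixedProvider("run"), "abc", 3, "parquet"))
    print(build_filename(None, "abc", 0, "csv"))
```

With a provider, the name depends only on the inputs, which is what makes output filenames reproducible across runs. Without one, each call yields a fresh random name.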