Eventual-Inc
Daft

Performance History

Latest Results

genericize RandomShuffle shuffle backend
chenghuichen:random-shuffle
15 minutes ago
fix(udf): handle UDF expressions with no column references (#6805) (#6814)

Fixes #6805. Two commits: failing tests (red), then the fix (green).

Two related bugs surface when a UDF expression has no column references after constant folding (e.g., `with_column("msg", lit("hello"))` followed by a UDF whose only input is `msg`: the optimizer inlines the literal and the UDF expression collapses to a literal-only form).

## Part 1: `remap_used_cols` returns empty Vec instead of `vec![0]`

`daft_dsl::utils::remap_used_cols` previously returned `vec![0]` as a "borrow column 0 to keep the row count alive" fallback. The downstream UDF op (streaming sink for async UDFs, intermediate op for sync UDFs) then indexed into a batch that projection pushdown had narrowed to zero columns, panicking with `index out of bounds: the len is 0 but the index is 0`.

`RecordBatch::get_columns(&[])` already preserves `num_rows`, so the fallback isn't needed.

## Part 2: broadcast length-1 inputs in Python UDF eval branches

Even with the panic fixed, evaluating a literal-only input yields a length-1 Series, so the UDF was invoked once and the result broadcast to N rows. This is wrong for non-pure UDFs (random sampling, external API calls, anything stateful): the user wrote `with_column` expecting per-row execution.

Fix: in both `Expr::ScalarFn(ScalarFn::Python)` branches (sync and async) of `daft-recordbatch/src/lib.rs`, broadcast any length-1 input Series to the upstream row count before invoking the UDF. Mirrors the post-result broadcast already in `async_udf.rs` and `intermediate_ops/udf.rs`.

## Tests

Six failing repros land in the first commit, all green after the fix:

1. Verbatim repro from #6805 (panic on async batch UDF + select)
2. Async batch UDF property test (UDF must see N rows, not run-once-broadcast)
3. Sync batch UDF (different eval branch)
4. Row-wise UDF (`@daft.func`)
5. Empty input batch (`row_count == 0`)
6. Ray integration: multi-partition input on the Ray runner exercises the actor UDF path

## Scope

- **Native runner**: panic + broadcast bugs both fixed.
- **Ray runner**: panic fixed via Part 1 (flows through `actor_udf.rs:156`'s call to `remap_used_cols`); the multi-partition Ray integration test confirms.
- **Legacy `@daft.udf` decorator**: NOT addressed; that decorator routes through `Expr::Function` / `LegacyPythonUDF` and is being removed in 0.8.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
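
The two fixes described in this commit message can be sketched in plain Python. This is a toy model only, not Daft's implementation: `Batch`, `broadcast`, and `eval_udf` are hypothetical names invented for illustration, and ordinary lists stand in for Series. It shows, under those assumptions, why a zero-column selection can still carry the row count, and why widening a length-1 input before the call makes a non-pure UDF run once per row.

```python
import random
from dataclasses import dataclass


@dataclass
class Batch:
    """Toy stand-in for a record batch: named columns plus an explicit row count."""
    columns: dict
    num_rows: int

    def get_columns(self, names):
        # Selecting zero columns still keeps num_rows, so there is no need
        # to "borrow" column 0 just to remember how many rows exist.
        return Batch({n: self.columns[n] for n in names}, self.num_rows)


def broadcast(values, num_rows):
    """Repeat a length-1 input so it has one entry per upstream row."""
    if len(values) == 1 and num_rows != 1:
        return values * num_rows
    return values


def eval_udf(udf, inputs, num_rows):
    """Broadcast the inputs first, then invoke the UDF once per row."""
    widened = [broadcast(v, num_rows) for v in inputs]
    return [udf(*row) for row in zip(*widened)]


# Projection pushdown (hypothetically) leaves no columns, but 3 rows remain.
batch = Batch({"id": [1, 2, 3]}, num_rows=3)
narrowed = batch.get_columns([])

calls = []


def tag(msg):
    # Non-pure UDF: records each call and appends a random suffix.
    calls.append(msg)
    return f"{msg}-{random.random()}"


# The literal-only input is a length-1 list, as in the scenario above.
out = eval_udf(tag, [["hello"]], narrowed.num_rows)
empty = eval_udf(tag, [["hello"]], 0)
```

In this sketch `tag` is called three times rather than once, and an empty batch produces an empty result without invoking the UDF, which corresponds to the `row_count == 0` case in the test list.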
main
26 minutes ago
use correct schema
rchowell/icepush
34 minutes ago
fix progress bar
chris/taskevent-metadata
1 hour ago
add name() to backend; fix comments in tests
oh/into_batches_without_emitting_flight_refs
4 hours ago
udaf at python level
chenghuichen:udaf-python
5 hours ago

Latest Branches

CodSpeed Performance Gauge
0%
feat: genericize RandomShuffle to support Flight shuffle backend #6808
40 minutes ago
3f2d061
chenghuichen:random-shuffle
CodSpeed Performance Gauge
0%
1 hour ago
3ad4219
rchowell/icepush
CodSpeed Performance Gauge
+13%
2 hours ago
33d1f74
chris/taskevent-metadata
© 2026 CodSpeed Technology