Eventual-Inc
Daft

Performance History

Latest Results

feat(temporal): add Spark-style unix extractors and weekday

Adds `unix_seconds`, `unix_millis`, `unix_micros`, `unix_timestamp`, `to_unix_timestamp`, and `weekday` for Spark parity (#3798). The unix extractors are thin Python and SQL wrappers over the existing `to_unix_epoch` with an explicit `time_unit`. `weekday` wraps `day_of_week` directly, since Daft's `day_of_week` already uses Spark's Monday=0, Sunday=6 numbering.
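The Spark-parity semantics described above can be sketched in plain Python. This is an illustration of the intended behavior, not the Daft expression API; the function names mirror the new extractors:

```python
from datetime import datetime, timezone

# Plain-Python sketch of the extractor semantics (illustrative, not Daft's API).
def unix_seconds(dt: datetime) -> int:
    return int(dt.timestamp())              # whole seconds since the Unix epoch

def unix_millis(dt: datetime) -> int:
    return int(dt.timestamp() * 1_000)      # milliseconds since the epoch

def unix_micros(dt: datetime) -> int:
    return int(dt.timestamp() * 1_000_000)  # microseconds since the epoch

def weekday(dt: datetime) -> int:
    # Python's weekday() already matches Spark's Monday=0 .. Sunday=6 numbering,
    # the same convention Daft's day_of_week uses.
    return dt.weekday()

ts = datetime(2024, 1, 1, tzinfo=timezone.utc)  # 2024-01-01 was a Monday
print(unix_seconds(ts))  # 1704067200
print(weekday(ts))       # 0
```

Each extractor is the same epoch value at a different time unit, which is why thin wrappers over a single `to_unix_epoch` with an explicit `time_unit` suffice.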
BABTUNA:feat/temporal-unix-extractors
4 hours ago
style: apply ruff format and cargo fmt
BABTUNA:feat/temporal-add-months
11 hours ago
fix: Handle dtype mismatch error in join_asof join keys (#6904)

Currently as_of joins with mismatched `by` keys would fail with `mismatch dtype error`. The fix is to:

1. Normalize and cast the keys to a shared supertype (e.g. int64 and float64 are normalized to float64), which is the same methodology used for the on key, as well as for the join keys of equality joins.
2. Remove the computation of `right_cols_to_drop` in the local executor, because it does not drop the cast expressions computed during normalization, e.g. `Cast(Column("left_on_key"), Utf8)`, which led to duplicate columns in the output (the `"left_on_key"` column was duplicated in the result). Since we already compute the desired output schema in the logical plan, we can simply use it as the basis to prune columns during execution.

```
left = {"ts": [1, 3, 5], "v": [...]}    # ts is Int64
right = {"ts": [2.0, 4.0], "w": [...]}  # ts is Float64

# correct output
{"ts": [1, 3, 5], "v": [10, 30, 50], "w": [None, 20, 40]}

# without the fix: the second "ts" silently overwrites the first:
{"ts": [None, 2.0, 4.0], "v": [10, 30, 50], "w": [None, 20, 40]}
# ^^^ left ts [1, 3, 5] is gone, and no error is raised
```

**A more in-depth explanation of the second bug:**

The bug requires three conditions to trigger:
- a join key (meaning it should have been a candidate for `right_cols_to_drop`)
- that shares a name with a column on the other side (explained below)
- mismatched types on that key (causing normalization to wrap it in a `Cast` expression, which prevents it from being caught in `right_cols_to_drop`)

Here's how the bug occurs:
1. At the logical plan layer, `right_cols_to_drop` is computed from bare unresolved column expressions, before any normalization has occurred. It is then passed to `deduplicate_asof_join_columns`, which uses it to determine which right-side columns need to be renamed with a `right.` prefix. Since the join keys are in the dropped-cols set, the deduplication step skips them: they are already being dropped.
2. After translation, at the physical plan layer, `AsofJoinOperator::new` ignored the `output_schema` and recomputed `right_cols_to_drop` from scratch. By this point, translation had already wrapped bare `Column("g")` expressions in `Cast(Column("g"), Utf8)`. The `extract_name` closure used in the recomputation only handled bare column expressions, so it returned `None` for any cast-wrapped expression, silently omitting the right key column from `right_cols_to_drop`.
3. Without it in the drop set, `prune_right_batch` kept the column, producing a record batch with duplicate column names. When `to_pydict()` built a Python dict, the duplicate key caused the right-side values to silently overwrite the left-side values, corrupting the output.
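A minimal Python sketch of the two mechanisms above: the supertype normalization in step 1 of the fix, and the dict-key overwrite in step 3 of the bug walkthrough. The `supertype` function and its widening order are illustrative placeholders, not Daft's actual type-promotion rules:

```python
# Hypothetical widening order; Daft's real promotion lattice is richer.
NUMERIC_ORDER = ["int32", "int64", "float32", "float64"]

def supertype(a: str, b: str) -> str:
    # Promote both keys to whichever type appears later in the widening order,
    # e.g. int64 and float64 normalize to float64.
    return max(a, b, key=NUMERIC_ORDER.index)

# The duplicate-column corruption: building a Python dict from a batch that
# still carries two "ts" columns silently keeps only the last one.
columns = [("ts", [1, 3, 5]), ("v", [10, 30, 50]), ("ts", [None, 2.0, 4.0])]
out = dict(columns)
print(out["ts"])  # [None, 2.0, 4.0] -- the left ts values are silently gone
```

This is why the fix prunes columns from the output schema computed in the logical plan rather than recomputing the drop set: a dict build cannot error on duplicate keys, so the corruption is otherwise invisible.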
main
12 hours ago
fix(distributed): emit empty downstream task for limit(0) (#6916)

## Changes Made

`LimitNode` in the Flotilla pipeline computes `max_concurrent_tasks = total_remaining().div_ceil(estimated_num_rows)`. For `limit=0` against any input source carrying row-count metadata (e.g. `InMemoryScan`), this collapses to 0, the submission loop never runs, and **no downstream task is ever emitted**. Blocking sinks downstream, notably ungrouped aggregates, then have 0 input tasks, never finalize, and produce 0 output rows.

The native runner doesn't have a Limit pipeline node, and its `AggregateSink::finalize` always runs once even with zero input states, which is why the native path was unaffected.

### Fix

Short-circuit `is_take_done() && is_skip_done()` at the top of `limit_execution_loop` by emitting one empty scan task and returning, so downstream blocking sinks still finalize and produce their conventional one-row null result. Pulled the empty-scan construction into a private `build_empty_scan_task` helper to dedupe with the existing "all rows skipped" code path.

### Repro (before fix)

```python
import daft
from daft import col

df = daft.from_pydict({"values": [1.0, 2.0]}).limit(0)
df.agg(col("values").sum().alias("s")).collect().to_pydict()
# Ray:    {"s": []}      <- bug
# Native: {"s": [None]}
```

### Tests

- New unit test `test_limit_zero_emits_empty_task` in `src/daft-distributed/src/pipeline_node/limit.rs` covering the limit=0 path.
- Verified locally: all 14 tests in `tests/dataframe/test_percentile.py` pass on Ray, and all 199 tests in `tests/dataframe/test_limit_offset.py` pass on Ray.

## Related Issues

Unblocks the v0.7.11 release: `test_percentile_empty_input` was failing on the Ray test job: https://github.com/Eventual-Inc/Daft/actions/runs/25575457827/job/75087986460

The test was added in #6153 (percentile/median ops). It exposed this latent `limit=0` bug in the distributed limit node; the bug also affects `sum`, `count`, `mean`, and any other ungrouped aggregation over `limit(0)`-empty input on the Ray runner.

## Test plan

- [ ] CI green
- [ ] `test_percentile_empty_input` passes on Ray
- [ ] `test_limit_offset.py` suite stays green on Ray

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Varun Madan <varun@Varuns-MacBook-Pro.local>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
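The failure mode and the short-circuit fix can be sketched in Python. The function and task names are hypothetical stand-ins for the Rust code in the Flotilla pipeline, not the actual implementation:

```python
def div_ceil(a: int, b: int) -> int:
    # Ceiling division, matching Rust's usize::div_ceil for non-negative inputs.
    return -(-a // b)

def limit_execution_loop(total_remaining: int, estimated_num_rows: int) -> list[str]:
    # The bug: with limit=0, total_remaining is 0, so max_concurrent_tasks is 0
    # and the submission loop body never runs -> no downstream task emitted.
    max_concurrent_tasks = div_ceil(total_remaining, estimated_num_rows)
    tasks = [f"scan-task-{i}" for i in range(max_concurrent_tasks)]
    if not tasks:
        # Sketch of the fix: emit one empty scan task so downstream blocking
        # sinks (e.g. ungrouped aggregates) still receive an input and finalize.
        tasks = ["empty-scan-task"]
    return tasks

print(limit_execution_loop(0, 100))    # ['empty-scan-task']
print(limit_execution_loop(250, 100))  # three tasks
```

With the empty task in place, a downstream ungrouped aggregate finalizes once over zero rows and produces its conventional one-row null result, matching the native runner's `{"s": [None]}`.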
main
12 hours ago

Latest Branches

- BABTUNA:feat/temporal-unix-extractors (9c6bea1, 5 hours ago): performance 0% (feat(temporal): add Spark-style unix extractors and weekday #6920)
- BABTUNA:feat/temporal-tz-conversions (9339879, 5 hours ago): performance -1%
- BABTUNA:feat/temporal-add-months (fc81d4b, 10 hours ago): performance 0%
© 2026 CodSpeed Technology