Eventual-Inc
Daft

Performance History

Latest Results

refactor(shuffle): drop env-var override for chunk target bytes

Removes DAFT_SHUFFLE_CHUNK_BYTES and the helper functions that read it; the 4 MiB constant (renamed DEFAULT_ -> CHUNK_TARGET_BYTES) is now the only knob. The read-side else branch that disabled concat when the env var was 0 is dropped along with the now-unused READ_PREFETCH.

Also moves arrow-select to the workspace dep table and simplifies an unnecessary exists() check before create_dir_all in the oneshot writer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/flight-shuffle-perf
43 minutes ago
feat(flotilla): Distributed Limit Counter (#6942)

## Changes Made

Replaces the materializing two-phase local+global limit with a streaming limit driven by a Ray-actor-backed global counter.

- **`LimitCounterActor`** (`daft/execution/ray_distributed_limit.py`) holds `(remaining_skip, remaining_take)`. `claim(input_id, num_rows) -> (skip, take, done)` is atomic; `start_task(input_id)` refunds a prior attempt's claims so retries see consistent state. `await_limit_completion()` resolves once the limit is fully claimed and returns the input_ids that actually consumed budget. The actor is pinned to the head node.
- **`DistributedLimitSink`** (`src/daft-local-execution/src/streaming_sink/distributed_limit.rs`) calls `claim(input_id, num_rows)` on the counter actor once per morsel and slices the morsel based on the returned `(skip, take, done)`.
- **`LimitNode`** (`src/daft-distributed/src/pipeline_node/limit.rs`) creates the counter actor and appends distributed_limit tasks to the input tasks. It awaits the limit completion from the actor and, once done, cancels the scheduling of any subsequent limit tasks. It knows which input ids contributed to the limit and cancels only tasks whose ids are not among them.
- **Scheduler** now filters cancelled tasks at `schedule_tasks` and emits `TaskEvent::Cancelled` to avoid scheduling new limit tasks.

### Ordering note

Across-partition order is no longer preserved because workers race to claim. `tests/integration/iceberg/test_partition_pruning.py` sorts before limiting (matching the `_on_number` precedent already in that file); a related dataframe test was loosened similarly.

## Related Issues

<!-- none -->

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
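
The claim protocol described in the Distributed Limit Counter change can be illustrated with a small single-process sketch. This is not the code from `daft/execution/ray_distributed_limit.py`: the class name `LimitCounter`, the `claims` bookkeeping, the `contributing_inputs` helper, and `apply_claim` are hypothetical names chosen for illustration, and the real actor runs under Ray and exposes `await_limit_completion()`, which is not modelled here. The sketch also assumes "consumed budget" means any nonzero skip or take.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class LimitCounter:
    """Hypothetical in-process stand-in for the global limit counter."""

    remaining_skip: int
    remaining_take: int
    # Per-input totals of (skipped, taken), kept so a retry can be refunded.
    claims: Dict[int, Tuple[int, int]] = field(default_factory=dict)

    def start_task(self, input_id: int) -> None:
        # Refund whatever a prior attempt of this input claimed.
        skipped, taken = self.claims.pop(input_id, (0, 0))
        self.remaining_skip += skipped
        self.remaining_take += taken

    def claim(self, input_id: int, num_rows: int) -> Tuple[int, int, bool]:
        # Spend skip budget first, then take budget, on this morsel's rows.
        skip = min(self.remaining_skip, num_rows)
        take = min(self.remaining_take, num_rows - skip)
        self.remaining_skip -= skip
        self.remaining_take -= take
        prev_skip, prev_take = self.claims.get(input_id, (0, 0))
        self.claims[input_id] = (prev_skip + skip, prev_take + take)
        return skip, take, self.remaining_take == 0

    def contributing_inputs(self) -> List[int]:
        # Assumption: an input "consumed budget" if it skipped or took rows.
        return sorted(i for i, (s, t) in self.claims.items() if s > 0 or t > 0)


def apply_claim(rows: Sequence, skip: int, take: int) -> Sequence:
    # What a sink would do with the returned (skip, take): slice the morsel.
    return rows[skip : skip + take]


counter = LimitCounter(remaining_skip=2, remaining_take=3)
first = counter.claim(input_id=0, num_rows=4)
second = counter.claim(input_id=1, num_rows=5)
kept = apply_claim(list(range(4)), first[0], first[1])
```

In the single-process sketch each `claim` call is trivially atomic; in a distributed setting that property would have to come from the actor processing one call at a time.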
main
45 minutes ago
chore: migrate daft.io.lance to daft_lance
rchowell/lance-update
2 hours ago
fix test?
colin/distributed-limit-actor
2 hours ago

Latest Branches

CodSpeed Performance Gauge
-12%
perf(shuffle): Write one shuffle file per task instead of N partition files (#6948)
1 hour ago
7c45b6a
colin/flight-shuffle-perf
CodSpeed Performance Gauge
0%
CodSpeed Performance Gauge
0%
2 hours ago
e038d92
rchowell/lance-update