Eventual-Inc
Daft
Blog
Docs
Changelog
Latest Results
chore: Simplify local Repartition operator (#6739)

## Changes Made

I'm going through and rewriting the repartition operator to handle some of the weird state management, but I was doing some other cleanup as well.
main
53 seconds ago
chore: update paimon ImportError hints to use new extra

Points users at `pip install daft[paimon]` instead of bare `pip install pypaimon` now that the extra exists.
aaron-ang:paimon-extra
19 minutes ago
ci: instrument integration-test-ray for codecov

The integration-test-ray job currently consumes a pre-built release wheel and runs tests uninstrumented, so nothing it exercises reaches codecov. daft-distributed's driver-side code paths (e.g. plan/runner.rs, the RaySwordfishWorker pyclass, pipeline_node/*) show artificially low coverage as a result.

Switch the job to build maturin develop in place with -C instrument-coverage, emit an lcov report, and upload it under coverage-reports-integration-ray so publish-coverage-reports merges it into the existing codecov upload. Separate rust-cache key keeps the coverage target dir from thrashing the unit-test dev-build cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
desmond/coverage-integration-ray
21 minutes ago
refactor: simplify cfg gating from not(feature=python) to just cfg(test)

execute_local_plan and LocalPlanOutput compile fine under all feature sets (internal cfg branches handle the python vs non-python differences), so they don't need feature gating. This means everything in daft-distributed can use plain `#[cfg(test)]` instead of `#[cfg(all(test, not(feature = "python")))]`, which works better with editors configured for features=all.

https://claude.ai/code/session_01FSdbkeidSkPZNWT3supLK9
claude/review-distributed-tests-L6JMj
24 minutes ago
feat: percentile and median ops (#6153)

## Changes Made

- Add exact percentile aggregation with O(n) selection (select_nth_unstable_by) + linear interpolation
- Median is lowered to percentile(0.5) and shares the same kernel; AggExpr::Median is kept only so SQL MEDIAN() can register via the SQLFunction trait
- Two-stage distributed aggregation for both percentile and median (collect into list, then compute on merged list)
- SQL support for `PERCENTILE(col, pct)` and `MEDIAN(col)`

## Related Issues

Closes #3491.
main
29 minutes ago
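The percentile commit above describes selection followed by linear interpolation, with a two-stage path that collects values into a list and computes on the merged list. The sketch below illustrates that idea in Python and is not Daft's Rust kernel. All function names are hypothetical, a simple quickselect stands in for `select_nth_unstable_by`, and the interpolation convention (rank = pct * (n - 1), with pct in [0, 1]) is an assumption that the commit message does not spell out.

```python
import math
import random


def select_nth(values, n):
    """Return the n-th smallest value (0-indexed) in expected linear time.

    Quickselect stand-in for Rust's select_nth_unstable_by (assumption).
    """
    items = list(values)
    while True:
        pivot = random.choice(items)
        less = [v for v in items if v < pivot]
        num_equal = sum(1 for v in items if v == pivot)
        if n < len(less):
            items = less
        elif n < len(less) + num_equal:
            return pivot
        else:
            # Skip everything <= pivot and keep searching the larger values.
            n -= len(less) + num_equal
            items = [v for v in items if v > pivot]


def percentile(values, pct):
    """Exact percentile with linear interpolation between neighbouring ranks.

    Assumed convention: pct is in [0, 1] and rank = pct * (n - 1).
    """
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    if not 0.0 <= pct <= 1.0:
        raise ValueError("pct must be in [0, 1]")
    rank = pct * (len(values) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    lo_val = select_nth(values, lo)
    if hi == lo:
        return float(lo_val)
    hi_val = select_nth(values, hi)
    return lo_val + (hi_val - lo_val) * (rank - lo)


def median(values):
    """Median expressed as percentile(0.5), mirroring the lowering in the commit."""
    return percentile(values, 0.5)


def two_stage_percentile(partitions, pct):
    """Stage 1: collect each partition's values. Stage 2: compute on the merged list."""
    merged = [v for part in partitions for v in part]
    return percentile(merged, pct)
```

For example, `median([1, 2, 3, 4])` interpolates halfway between 2 and 3, and `two_stage_percentile([[3, 1], [4, 2]], 0.5)` gives the same answer because the partitions are merged before the percentile is computed.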
fix(scheduler): include dispatched tasks in autoscaling ratio

After the Ray autoscaler ramp-up rework in #6653, one bug from the original autoscaler underscaling investigation is still live in `DefaultScheduler::needs_autoscaling()`. The ratio is computed as `pending_tasks / total_capacity`. However, `schedule_tasks()` runs immediately before this check and drains pending tasks onto available workers, so the ratio only reflects residual demand. When the cluster is saturated and most demand has just been dispatched, the ratio collapses below the threshold and `try_autoscale()` is never called.

To fix this, track `last_scheduled_count` in the scheduler and include it in the numerator so the ratio reflects total demand (pending + just-dispatched). Reset the counter inside `get_autoscaling_request()` to prevent double-counting when it is called multiple times between `schedule_tasks()` rounds.

Also strips zero-valued GPU/memory keys from Ray resource bundles in `try_autoscale()` so the autoscaler doesn't interpret them as demand for zero-resource bundles on specialized nodes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
desmond/fix-autoscaler-underscaling
34 minutes ago
just some quick cleanup
slade/quick-cleanup-repartition
59 minutes ago
fix(io): Write metrics in close for the last batch (#6606)

## Changes Made

- The issue arises from `close` in [`batch.rs`](https://github.com/Eventual-Inc/Daft/blob/9a58f564225de4d385b98bf2780aeb179111259d/src/daft-writers/src/batch.rs#L206) ignoring the metrics/stats for the last batch.
- Added final bytes to `total_physical_bytes_written`
- In `write.rs`: Calculate the delta in final metrics and call `add_write_result`

Used Claude to plan. Verified the implementation using the same snippet in the issue and logging metric using `result._metadata.to_recordbatch().to_pylist()`

## Related Issues

Closes #6518
main
1 hour ago
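The write-metrics fix above concerns bytes flushed during `close` that were not being counted. The sketch below shows that class of bug and its fix with a toy Python writer. It is an assumption-laden illustration, not the code in `batch.rs` or `write.rs`: the class, its batching rule, and the list `reported` standing in for `add_write_result` are all hypothetical, and only the counter name `total_physical_bytes_written` comes from the commit message.

```python
class ToyBatchWriter:
    """Buffers rows and flushes them in batches, tracking bytes written."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = []
        self.total_physical_bytes_written = 0

    def _flush(self):
        # Pretend to write the buffered rows and return how many bytes went out.
        written = sum(len(row) for row in self.buffer)
        self.buffer.clear()
        return written

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.total_physical_bytes_written += self._flush()

    def close(self):
        # The fix: count the final partial batch instead of discarding its size.
        self.total_physical_bytes_written += self._flush()
        return self.total_physical_bytes_written


reported = []  # hypothetical stand-in for the add_write_result call

writer = ToyBatchWriter(batch_size=2)
for row in (b"abcd", b"efgh", b"ijkl"):
    writer.write(row)

bytes_before_close = writer.total_physical_bytes_written
bytes_after_close = writer.close()

# Report only the delta produced by close, so earlier batches are not counted twice.
reported.append(bytes_after_close - bytes_before_close)
```

Here two rows are flushed by `write` and the third is only flushed by `close`, so the delta reported after `close` covers exactly that last row.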
Latest Branches
CodSpeed Performance Gauge
-1%
feat: add paimon extra for pypaimon dep
#6743
37 minutes ago
828dec2
aaron-ang:paimon-extra
CodSpeed Performance Gauge
0%
ci: instrument integration-test-ray for codecov
#6744
42 minutes ago
8e6f144
desmond/coverage-integration-ray
CodSpeed Performance Gauge
0%
test: Add local worker execution and statistics testing infrastructure
#6697
48 minutes ago
5eb6c7c
claude/review-distributed-tests-L6JMj
© 2026 CodSpeed Technology