Eventual-Inc
Daft
Latest Results
fix(flotilla): track post-aggregation request as autoscaler high-water mark

The high-water mark recorded the fractional cpu_sum, but the request actually sent to Ray is the integer-aggregated bundle total. With min_cpu_per_task=0.1 the mark grew ~0.1 per cycle while ceil() only bumped the real CPU request every ~10 cycles, so scale-up for many pending tasks stalled for ~1/min_cpu_per_task cycles (≈50s at the default 5s interval) per extra CPU.

Record the aggregated integer bundle totals (what Ray actually receives) as the mark instead. Because each cycle selects bundles until the fractional cpu_sum exceeds the integer mark, ceil() now bumps by at least one CPU every cycle, restoring the intended one-unit-per-cycle ramp while still never requesting less than before. Convergence is unchanged: once pending demand can no longer exceed the mark, the cycle is skipped.

Verified: cargo test -p daft-distributed --lib (8 task tests pass), cargo check/clippy -p daft-distributed --features python clean.
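The stall described in this commit can be reproduced with a small simulation. This is a minimal sketch in Python, not the Rust code in daft-distributed: the function `simulate_requests` and its parameters are hypothetical, and the selection loop is modelled only from the commit text (select tasks until the fractional sum exceeds the mark, request ceil of that sum). Exact fractions are used so float rounding does not obscure the effect.

```python
from fractions import Fraction
from math import ceil


def simulate_requests(pending_cpus, cycles, integer_mark):
    """Return the CPU total requested in each cycle (None when the cycle is skipped)."""
    mark = Fraction(0)  # high-water mark carried across cycles
    requests = []
    for _ in range(cycles):
        cpu_sum = Fraction(0)
        # Select pending tasks until their fractional CPU sum exceeds the mark.
        for cpu in pending_cpus:
            cpu_sum += cpu
            if cpu_sum > mark:
                break
        if cpu_sum <= mark:
            # Pending demand cannot exceed the mark: skip this cycle.
            requests.append(None)
            continue
        request = ceil(cpu_sum)  # integer total that would be sent to Ray
        requests.append(request)
        # Old behaviour records the fractional sum; the fix records the integer request.
        mark = Fraction(request) if integer_mark else cpu_sum
    return requests


pending = [Fraction(1, 10)] * 1000  # many tasks at min_cpu_per_task = 0.1
old_ramp = simulate_requests(pending, 12, integer_mark=False)
new_ramp = simulate_requests(pending, 12, integer_mark=True)
```

Under these assumptions `old_ramp` stays at 1 CPU for ten cycles before reaching 2, while `new_ramp` grows by one CPU every cycle.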
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
3 hours ago
fix(flotilla): keep multi-CPU tasks as individual autoscaler bundles

aggregate_ray_bundles packed every CPU-only task into unit {"CPU": 1} bundles. That is wrong for a task requesting num_cpus >= 1: a 4-CPU task runs on one worker, so splitting it into 4 spread bundles lets the autoscaler provision 4 single-CPU nodes and leaves the task unschedulable. It also turned CPU magnitude into the loop count, so a huge or non-finite explicit num_cpus (inf as i64 == i64::MAX) could hang/OOM, and a NaN poisoned the running sum and zeroed the batch's CPU request.

Only pack sub-1.0 CPU-only tasks now; tasks with GPU, memory, or num_cpus >= 1 keep an individual bundle (CPU rounded up to at least 1). Non-finite / non-positive CPU contributes nothing. The packed sum is now bounded by task count, so the loop can no longer blow up.
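The packing rule above can be sketched in Python as follows. This is an illustration of the described behaviour, not the actual Rust `aggregate_ray_bundles`: the function name, the task dictionary keys (`num_cpus`, `num_gpus`, `memory_bytes`), and the handling of an invalid CPU value on a GPU or memory task (clamped to 1 here) are all assumptions.

```python
import math


def aggregate_bundles_sketch(tasks):
    """Build integer-valued bundles from hypothetical task resource dicts."""
    bundles = []
    packed_cpu = 0.0  # running sum of sub-1.0 CPU-only requests
    for task in tasks:
        cpu = task.get("num_cpus", 0.0)
        gpu = task.get("num_gpus", 0)
        memory = task.get("memory_bytes", 0)
        cpu_ok = math.isfinite(cpu) and cpu > 0
        if gpu or memory or (cpu_ok and cpu >= 1.0):
            # Individual bundle: CPU rounded up to at least 1.
            # Assumption: an invalid CPU on a GPU/memory task is clamped to 1.
            bundle = {"CPU": max(1, math.ceil(cpu)) if cpu_ok else 1}
            if gpu:
                bundle["GPU"] = math.ceil(gpu)
            if memory:
                bundle["memory"] = int(memory)
            bundles.append(bundle)
        elif cpu_ok:
            # Sub-1.0 CPU-only task: pack into the shared sum.
            packed_cpu += cpu
        # Non-finite or non-positive CPU on a CPU-only task contributes nothing.
    # Each packed task adds less than 1.0, so this count is bounded by the task count.
    bundles.extend({"CPU": 1} for _ in range(math.ceil(packed_cpu)))
    return bundles


example_tasks = (
    [{"num_cpus": 4.0}, {"num_cpus": 0.5, "num_gpus": 1}]
    + [{"num_cpus": float("inf")}, {"num_cpus": float("nan")}]
    + [{"num_cpus": 0.1}] * 25
)
example_bundles = aggregate_bundles_sketch(example_tasks)
```

In this sketch the 4-CPU task stays a single {"CPU": 4} bundle, the inf and NaN requests are ignored, and the 25 tasks at 0.1 CPU are packed into three unit bundles.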
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
6 hours ago
fix(flotilla): aggregate fractional CPU into integer Ray autoscaler bundles

Ray's request_resources (<= 2.55; Daft pins ray==2.55.1) rejects non-integer bundle values: `isinstance(bundle[key], int)` raises `TypeError: each bundle key should be str and value as int.`. Sending a float CPU therefore crashes at runtime, even for a whole 1.0, since pyo3 emits a Python float.

Replace the per-task float bundle with `aggregate_ray_bundles`: CPU-only tasks have their fractional CPU summed and emitted as ceil(sum) unit {"CPU": 1} bundles, so N tasks at 0.1 CPU request ceil(0.1*N) CPUs instead of N (issue #7123), while never sending a non-integer value. Tasks carrying GPU or memory keep an individual bundle (CPU rounded up) since those resources pin placement to a node.

Revert the try_autoscale type hint to dict[str, int]. Replace the fractional bundle unit tests with aggregation tests.
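To illustrate why even a whole 1.0 fails, here is a minimal Python sketch. `check_bundles` is a hypothetical stand-in that mimics only the check quoted in the commit message; it is not Ray's code. `unit_cpu_bundles` is likewise a hypothetical helper showing the ceil(sum) aggregation for CPU-only tasks.

```python
import math


def check_bundles(bundles):
    """Mimic the quoted validation: keys must be str, values must be int."""
    for bundle in bundles:
        for key, value in bundle.items():
            if not (isinstance(key, str) and isinstance(value, int)):
                raise TypeError("each bundle key should be str and value as int.")


def unit_cpu_bundles(n_tasks, cpu_per_task):
    """Sum fractional CPU over CPU-only tasks and emit ceil(sum) unit bundles."""
    total = math.fsum([cpu_per_task] * n_tasks)
    return [{"CPU": 1} for _ in range(math.ceil(total))]


bundles = unit_cpu_bundles(25, 0.1)  # 25 tasks at 0.1 CPU each
check_bundles(bundles)               # passes: every value is an int

try:
    check_bundles([{"CPU": 1.0}])    # a float, even a whole one, is rejected
    float_rejected = False
except TypeError:
    float_rejected = True
```

Here 25 tasks at 0.1 CPU produce three unit bundles rather than 25, and every value passes the integer check.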
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
7 hours ago
fix(autoscale): use TaskResourceRequest wrappers so min_cpu_per_task fallback applies
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
14 hours ago
fix(goosefs): forward retry/timeout/concurrency to OpenDAL and sparse multiline display

Address review feedback on GoosefsConfig:
- P1: to_opendal_config now forwards max_retries, retry_timeout_ms, connect_timeout_ms, read_timeout_ms, max_concurrent_requests and max_connections_per_io_thread into the returned config map (only when non-default), so user-provided values are no longer silently dropped at the Daft layer.
- P2: multiline_display now gates every numeric/boolean field with an if value != default guard, mirroring how auth_username is handled. A default-constructed GoosefsConfig produces an empty multiline view, and IOConfig::multiline_display omits the GooseFS config = { ... } line entirely in that case.

Adds regression tests covering both behaviours.
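The "only when non-default" pattern from P1 and P2 can be sketched in Python as below. This is not the Rust GoosefsConfig: the class name, the default values, the subset of fields, and the use of field names as config-map keys are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class GoosefsConfigSketch:
    # Default values here are placeholders, not the real defaults.
    max_retries: int = 3
    retry_timeout_ms: int = 30_000
    connect_timeout_ms: int = 10_000
    read_timeout_ms: int = 30_000
    max_concurrent_requests: int = 8

    def _non_default_items(self):
        # Keep only fields whose current value differs from the declared default.
        return [
            (f.name, getattr(self, f.name))
            for f in fields(self)
            if getattr(self, f.name) != f.default
        ]

    def to_opendal_config(self):
        # Forward user-provided values; keys mirror field names (an assumption).
        return {name: str(value) for name, value in self._non_default_items()}

    def multiline_display(self):
        # A default-constructed config yields an empty view.
        return [f"{name} = {value}" for name, value in self._non_default_items()]


default_cfg = GoosefsConfigSketch()
tuned_cfg = GoosefsConfigSketch(max_retries=7, read_timeout_ms=5_000)
```

Driving both methods from one shared non-default filter keeps the forwarded config map and the display output consistent with each other.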
XuQianJin-Stars:feat/goosefs-support
18 hours ago
Refactor columns_min/columns_max to delegate to least/greatest

Address review feedback: replace the list-based implementation (to_list().list_min()/list_max()) with direct delegation to the existing greatest/least scalar functions.

Benefits:
- Row-wise NULL skipping: result is NULL only when all inputs in a row are NULL, matching Spark Greatest/Least semantics.
- Works on any comparable dtype (numeric, boolean, string, temporal), not just types supported by list aggregation.
- Avoids the overhead of constructing an intermediate list column per row.

Public surface is preserved: aliases (columns_min/columns_max) and empty-arg error messages remain unchanged. All 80 existing tests/dataframe/test_horizontal.py cases pass.
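The row-wise NULL-skipping semantics described above can be illustrated in plain Python, with None standing in for NULL. This is a sketch of the semantics only; `row_least` and `row_greatest` are hypothetical helpers, not Daft's API.

```python
def row_least(*values):
    """Smallest non-NULL value in a row; NULL (None) only if every input is NULL."""
    present = [v for v in values if v is not None]
    return min(present) if present else None


def row_greatest(*values):
    """Largest non-NULL value in a row; NULL (None) only if every input is NULL."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# Each tuple is one row across three columns.
rows = [(3, None, 1), (None, None, None), ("b", "a", None)]
least_per_row = [row_least(*row) for row in rows]
greatest_per_row = [row_greatest(*row) for row in rows]
```

Because the comparison only needs an ordering, the same helpers work for strings as well as numbers, which mirrors the "any comparable dtype" point in the commit.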
XuQianJin-Stars:feat/spark-math-functions
18 hours ago
fix(ci): pin maturin <1.14 to keep --timings=html working
XiaoHongbo-Hope:fix/ci-maturin-timings
19 hours ago
list_namespaces mirrors list_tables resolution
YuangGao:fix/list-namespaces-rule3-dispatch
22 hours ago
Latest Branches
CodSpeed Performance Gauge
0%
fix(flotilla): wire min_cpu_per_task into TaskResourceRequest
#7125
4 hours ago
5303372
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
CodSpeed Performance Gauge
0%
feat(io): add GooseFS support via OpenDAL services-goosefs
#7109
10 days ago
434897f
XuQianJin-Stars:feat/goosefs-support
CodSpeed Performance Gauge
0%
feat: add Spark-compatible math functions (bround, greatest, least, hex, unhex)
#7122
19 hours ago
d719636
XuQianJin-Stars:feat/spark-math-functions
© 2026 CodSpeed Technology