Eventual-Inc
Daft
Blog
Docs
Changelog
Latest Results
perf(grouped-agg): bump NUM_SHARDS_PER_MORSEL to 8 (empirically tuned)
BABTUNA:perf/sharded-grouped-agg
10 hours ago
fix(flotilla): honor explicit num_cpus=0 in autoscaler bundles

aggregate_ray_bundles forced CPU >= 1 (via .max(1) on individual bundles and a fixed CPU:1 on every GPU bundle), so a task that explicitly sets num_cpus=0 still requested a CPU. That broke "explicit num_cpus passes through unchanged" and over-requested CPU for GPU-only / memory-only workloads.

Drop the .max(1); give GPU bundles CPU only when the packed tasks actually need it (gpu_cpu_sum > 0); and omit the CPU key from the Ray bundle dict when it is zero. Add a test for num_cpus=0 GPU-only and memory-only tasks.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
12 hours ago
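The "omit the CPU key when it is zero" rule from the num_cpus=0 commit above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not Daft's Rust code; the function name `to_ray_bundle` and its signature are hypothetical.

```python
def to_ray_bundle(cpu: int, gpu: int) -> dict:
    """Build a Ray-style bundle dict, leaving out resources that are zero.

    Hypothetical helper: it only mirrors the rule stated in the commit
    message (no CPU key for a task that explicitly asks for num_cpus=0).
    """
    bundle = {}
    if cpu > 0:
        bundle["CPU"] = cpu
    if gpu > 0:
        bundle["GPU"] = gpu
    return bundle


gpu_only = to_ray_bundle(cpu=0, gpu=1)   # no "CPU" key at all
cpu_and_gpu = to_ray_bundle(cpu=1, gpu=1)
```

Under this sketch a GPU-only task produces a bundle with no CPU entry, so nothing forces a CPU into the request.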
fix(flotilla): emit unit GPU autoscaler bundles, not oversized shapes

Carrying GPU tasks' CPU as ceil(gpu_cpu_sum / gpu_bundles) produced bundles like {CPU:2, GPU:1}. As a single Ray request_resources shape, that fits no standard 1-CPU/1-GPU node, so the autoscaler can't scale up, and the value is recorded as the high-water mark, stalling further attempts.

Emit unit {CPU:1, GPU:1} bundles instead (a sub-GPU task's cpu and gpu are each <= 1, so one always fits a standard GPU node), with the count covering both dimensions: ceil(max(gpu_sum, gpu_cpu_sum)). Two 1-CPU/0.5-GPU tasks now request two {CPU:1, GPU:1} shapes (2 CPU / 2 GPU) rather than one unschedulable {CPU:2, GPU:1}. Assert the schedulable shape in the regression test.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
14 hours ago
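The bundle-count arithmetic in the unit-GPU-bundle commit above can be checked with a small Python sketch. It is a simplified model of the two formulas quoted in the message, not the Daft implementation; representing each task as a `(cpu, gpu)` tuple and both function names are assumptions made for illustration.

```python
import math


def oversized_gpu_bundles(tasks):
    """Earlier shape: spread the summed CPU across ceil(gpu_sum) bundles."""
    gpu_sum = sum(gpu for _, gpu in tasks)
    gpu_cpu_sum = sum(cpu for cpu, _ in tasks)
    gpu_bundles = math.ceil(gpu_sum)
    cpu_each = math.ceil(gpu_cpu_sum / gpu_bundles)
    return [{"CPU": cpu_each, "GPU": 1} for _ in range(gpu_bundles)]


def unit_gpu_bundles(tasks):
    """Fixed shape: unit bundles, with the count covering both dimensions."""
    gpu_sum = sum(gpu for _, gpu in tasks)
    gpu_cpu_sum = sum(cpu for cpu, _ in tasks)
    count = math.ceil(max(gpu_sum, gpu_cpu_sum))
    return [{"CPU": 1, "GPU": 1} for _ in range(count)]


# Two tasks, each asking for 1 CPU and 0.5 GPU (the example from the message).
tasks = [(1.0, 0.5), (1.0, 0.5)]
before = oversized_gpu_bundles(tasks)
after = unit_gpu_bundles(tasks)
```

For the two-task example, the earlier formula yields a single {CPU:2, GPU:1} shape, while the unit formula yields two {CPU:1, GPU:1} shapes.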
fix(flotilla): track post-aggregation request as autoscaler high-water mark

The high-water mark recorded the fractional cpu_sum, but the request actually sent to Ray is the integer-aggregated bundle total. With min_cpu_per_task=0.1 the mark grew ~0.1 per cycle while ceil() only bumped the real CPU request every ~10 cycles, so scale-up for many pending tasks stalled for ~1/min_cpu_per_task cycles (≈50s at the default 5s interval) per extra CPU.

Record the aggregated integer bundle totals (what Ray actually receives) as the mark instead. Because each cycle selects bundles until the fractional cpu_sum exceeds the integer mark, ceil() now bumps by at least one CPU every cycle, restoring the intended one-unit-per-cycle ramp while still never requesting less than before. Convergence is unchanged: once pending demand can no longer exceed the mark, the cycle is skipped.

Verified: cargo test -p daft-distributed --lib (8 task tests pass), cargo check/clippy -p daft-distributed --features python clean.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
1 day ago
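The stall described in the high-water-mark commit above is easy to reproduce in a toy simulation. The Python sketch below is a deliberately simplified model, assuming every pending task requests the same CPU fraction and that a cycle is skipped once demand cannot exceed the mark; `simulate` and its parameters are hypothetical names, and `Fraction` is used only to keep the arithmetic exact.

```python
import math
from fractions import Fraction


def simulate(cycles, pending, cpu_per_task, integer_mark):
    """Return the CPU count requested in each autoscaler cycle.

    integer_mark=False records the fractional cpu_sum as the mark (old),
    integer_mark=True records the ceil()-ed request as the mark (new).
    """
    mark = Fraction(0)
    requests = []
    for _ in range(cycles):
        cpu_sum = Fraction(0)
        # Select tasks until the running sum exceeds the current mark.
        for _ in range(pending):
            if cpu_sum > mark:
                break
            cpu_sum += cpu_per_task
        if cpu_sum <= mark:
            break  # demand cannot exceed the mark: cycle is skipped
        request = math.ceil(cpu_sum)
        requests.append(request)
        mark = Fraction(request) if integer_mark else cpu_sum
    return requests


tenth = Fraction(1, 10)
old_ramp = simulate(5, 1000, tenth, integer_mark=False)
new_ramp = simulate(5, 1000, tenth, integer_mark=True)
```

In this model the fractional mark keeps requesting a single CPU for the first five cycles, while the integer mark ramps by one CPU per cycle.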
fix(flotilla): keep multi-CPU tasks as individual autoscaler bundles

aggregate_ray_bundles packed every CPU-only task into unit {"CPU": 1} bundles. That is wrong for a task requesting num_cpus >= 1: a 4-CPU task runs on one worker, so splitting it into 4 spread bundles lets the autoscaler provision 4 single-CPU nodes and leaves the task unschedulable. It also turned CPU magnitude into the loop count, so a huge or non-finite explicit num_cpus (inf as i64 == i64::MAX) could hang/OOM, and a NaN poisoned the running sum and zeroed the batch's CPU request.

Only pack sub-1.0 CPU-only tasks now; tasks with GPU, memory, or num_cpus >= 1 keep an individual bundle (CPU rounded up to at least 1). Non-finite / non-positive CPU contributes nothing. The packed sum is now bounded by task count, so the loop can no longer blow up.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
1 day ago
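The packing rule from the multi-CPU commit above can be illustrated for CPU-only tasks with the following Python sketch. It is not the Rust `aggregate_ray_bundles` code: the function name `cpu_only_bundles` is hypothetical, GPU and memory handling is left out, and skipping non-finite or non-positive values entirely is an assumption about what "contributes nothing" means.

```python
import math


def cpu_only_bundles(cpu_requests):
    """Pack sub-1.0 CPU requests; keep requests >= 1 as individual bundles."""
    bundles = []
    packed_sum = 0.0
    for cpus in cpu_requests:
        # Assumption: NaN, inf, zero and negative values are ignored.
        if not math.isfinite(cpus) or cpus <= 0:
            continue
        if cpus >= 1.0:
            # A multi-CPU task must fit on one worker, so it stays whole.
            bundles.append({"CPU": math.ceil(cpus)})
        else:
            packed_sum += cpus
    # packed_sum is below the number of packed tasks, so this loop is bounded.
    bundles.extend({"CPU": 1} for _ in range(math.ceil(packed_sum)))
    return bundles


result = cpu_only_bundles([4.0, 0.5, 0.5, float("inf"), float("nan"), 0.0])
```

Here the 4-CPU task keeps its own {"CPU": 4} bundle, the two half-CPU tasks share one unit bundle, and the non-finite and zero entries add nothing.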
fix(flotilla): aggregate fractional CPU into integer Ray autoscaler bundles

Ray's request_resources (<= 2.55; Daft pins ray==2.55.1) rejects non-integer bundle values: `isinstance(bundle[key], int)` raises `TypeError: each bundle key should be str and value as int.`. Sending a float CPU therefore crashes at runtime, even for a whole 1.0, since pyo3 emits a Python float.

Replace the per-task float bundle with `aggregate_ray_bundles`: CPU-only tasks have their fractional CPU summed and emitted as ceil(sum) unit {"CPU": 1} bundles, so N tasks at 0.1 CPU request ceil(0.1*N) CPUs instead of N (issue #7123), while never sending a non-integer value. Tasks carrying GPU or memory keep an individual bundle (CPU rounded up) since those resources pin placement to a node.

Revert the try_autoscale type hint to dict[str, int]. Replace the fractional bundle unit tests with aggregation tests.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
1 day ago
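The core idea of the fractional-CPU aggregation commit above, summing fractional requests and emitting ceil(sum) integer unit bundles, can be sketched in Python as follows. Both function names are hypothetical; `looks_valid_for_ray` only mirrors the `isinstance(..., int)` check quoted in the message and is not Ray's actual code.

```python
import math


def pack_fractional_cpus(cpu_requests):
    """Sum fractional CPU requests and emit ceil(sum) unit bundles."""
    total = sum(cpu_requests)
    return [{"CPU": 1} for _ in range(math.ceil(total))]


def looks_valid_for_ray(bundle):
    """Mirror of the quoted check: str keys and int values only."""
    return all(
        isinstance(key, str) and isinstance(value, int)
        for key, value in bundle.items()
    )


# Five tasks at 0.1 CPU each need one CPU in total, not five.
packed = pack_fractional_cpus([0.1] * 5)
```

A float value such as {"CPU": 1.0} fails the integer check in this sketch, while every bundle produced by the packing function passes it.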
fix(autoscale): use TaskResourceRequest wrappers so min_cpu_per_task fallback applies
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
1 day ago
fix(goosefs): forward retry/timeout/concurrency to OpenDAL and sparse multiline display

Address review feedback on GoosefsConfig:
- P1: to_opendal_config now forwards max_retries, retry_timeout_ms, connect_timeout_ms, read_timeout_ms, max_concurrent_requests and max_connections_per_io_thread into the returned config map (only when non-default), so user-provided values are no longer silently dropped at the Daft layer.
- P2: multiline_display now gates every numeric/boolean field with an if value != default guard, mirroring how auth_username is handled. A default-constructed GoosefsConfig produces an empty multiline view, and IOConfig::multiline_display omits the GooseFS config = { ... } line entirely in that case.

Adds regression tests covering both behaviours.
XuQianJin-Stars:feat/goosefs-support
2 days ago
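The "only when non-default" guard described in the GooseFS commit above can be sketched generically in Python. This is not the Rust GoosefsConfig: the class `SketchConfig`, its default values, and the use of field names as map keys are all hypothetical, chosen only to show the pattern of forwarding or displaying a field when it differs from its default.

```python
from dataclasses import dataclass, fields


@dataclass
class SketchConfig:
    # Hypothetical defaults; the real defaults are not given in the message.
    max_retries: int = 3
    connect_timeout_ms: int = 30_000
    read_timeout_ms: int = 30_000

    def non_default_items(self):
        """Return only the fields whose value differs from the default."""
        return {
            f.name: str(getattr(self, f.name))
            for f in fields(self)
            if getattr(self, f.name) != f.default
        }


default_view = SketchConfig().non_default_items()
custom_view = SketchConfig(max_retries=7).non_default_items()
```

A default-constructed config yields an empty map in this sketch, which matches the empty multiline view described in the message, and a user-set value is the only entry that gets through.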
Latest Branches
CodSpeed Performance Gauge
0%
feat(grouped-agg): shard AggThenPartition execution per morsel
#7060
11 hours ago
e26889d
BABTUNA:perf/sharded-grouped-agg
CodSpeed Performance Gauge
0%
fix(flotilla): wire min_cpu_per_task into TaskResourceRequest
#7125
13 hours ago
27f15c2
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
CodSpeed Performance Gauge
0%
feat(io): add GooseFS support via OpenDAL services-goosefs
#7109
11 days ago
434897f
XuQianJin-Stars:feat/goosefs-support
© 2026 CodSpeed Technology