Eventual-Inc
Daft
Latest Results
Merge remote-tracking branch 'upstream/main' into codex-sql-read-parquet-ignore-corrupt-files
jackylee-ch:codex-sql-read-parquet-ignore-corrupt-files
4 hours ago
fix type hint and style.
refactor--embed_text-public-api-to-delegate-expression-building-to-providers
10 hours ago
perf(grouped-agg): bump NUM_SHARDS_PER_MORSEL to 8 (empirically tuned)
BABTUNA:perf/sharded-grouped-agg
23 hours ago
fix(flotilla): honor explicit num_cpus=0 in autoscaler bundles

aggregate_ray_bundles forced CPU >= 1 (via .max(1) on individual bundles and a fixed CPU:1 on every GPU bundle), so a task that explicitly sets num_cpus=0 still requested a CPU, breaking "explicit num_cpus passes through unchanged" and over-requesting CPU for GPU-only / memory-only workloads.

Drop the .max(1); give GPU bundles CPU only when the packed tasks actually need it (gpu_cpu_sum > 0); and omit the CPU key from the Ray bundle dict when it is zero. Add a test for num_cpus=0 GPU-only and memory-only tasks.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
1 day ago
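The pass-through behavior described in this commit can be sketched in Python. This is only an illustration, not Daft's Rust code: `ray_bundle` is a hypothetical helper, GPU counts are assumed to arrive as integers, and memory is left out because the commit message does not name the key used for it.

```python
import math


def ray_bundle(num_cpus, num_gpus=0):
    """Hypothetical sketch: build one Ray bundle dict without forcing CPU >= 1."""
    # Round fractional CPU up to an integer, but do NOT apply max(1):
    # an explicit num_cpus=0 must stay 0.
    cpu = math.ceil(num_cpus)
    bundle = {}
    if cpu > 0:
        # The CPU key is only present when the task actually needs CPU.
        bundle["CPU"] = cpu
    if num_gpus > 0:
        # Assumption: GPU counts are already integers in this sketch.
        bundle["GPU"] = num_gpus
    return bundle


gpu_only = ray_bundle(0, 1)       # explicit num_cpus=0 on a GPU task
mixed = ray_bundle(0.5, 1)        # fractional CPU still rounds up to 1
```

Here `gpu_only` contains no "CPU" key at all, which is the shape the commit describes for GPU-only workloads.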
fix(flotilla): emit unit GPU autoscaler bundles, not oversized shapes

Carrying GPU tasks' CPU as ceil(gpu_cpu_sum / gpu_bundles) produced bundles like {CPU:2, GPU:1}. As a single Ray request_resources shape, that fits no standard 1-CPU/1-GPU node, so the autoscaler can't scale up, and the value is recorded as the high-water mark, stalling further attempts.

Emit unit {CPU:1, GPU:1} bundles instead (a sub-GPU task's cpu and gpu are each <= 1, so one always fits a standard GPU node), with the count covering both dimensions: ceil(max(gpu_sum, gpu_cpu_sum)). Two 1-CPU/0.5-GPU tasks now request two {CPU:1, GPU:1} shapes (2 CPU / 2 GPU) rather than one unschedulable {CPU:2, GPU:1}. Assert the schedulable shape in the regression test.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
1 day ago
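The two bundle shapes contrasted in this commit can be reproduced with a short Python sketch. The function names and the `(cpu, gpu)` tuple representation are hypothetical, chosen for illustration; only the two formulas come from the commit message.

```python
import math


def oversized_gpu_bundles(tasks):
    """Old shape: spread the summed CPU across ceil(gpu_sum) GPU bundles."""
    cpu_sum = sum(cpu for cpu, _ in tasks)
    gpu_sum = sum(gpu for _, gpu in tasks)
    gpu_bundles = math.ceil(gpu_sum)
    if gpu_bundles == 0:
        return []
    cpu_each = math.ceil(cpu_sum / gpu_bundles)
    return [{"CPU": cpu_each, "GPU": 1} for _ in range(gpu_bundles)]


def unit_gpu_bundles(tasks):
    """New shape: unit bundles, with the count covering both dimensions."""
    cpu_sum = sum(cpu for cpu, _ in tasks)
    gpu_sum = sum(gpu for _, gpu in tasks)
    count = math.ceil(max(gpu_sum, cpu_sum))
    return [{"CPU": 1, "GPU": 1} for _ in range(count)]


# Two tasks, each asking for 1 CPU and 0.5 GPU (the example from the commit).
tasks = [(1.0, 0.5), (1.0, 0.5)]
old_shape = oversized_gpu_bundles(tasks)
new_shape = unit_gpu_bundles(tasks)
```

For these two tasks the old formula yields a single {CPU: 2, GPU: 1} bundle, while the unit formula yields two {CPU: 1, GPU: 1} bundles.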
fix(flotilla): track post-aggregation request as autoscaler high-water mark

The high-water mark recorded the fractional cpu_sum, but the request actually sent to Ray is the integer-aggregated bundle total. With min_cpu_per_task=0.1 the mark grew ~0.1 per cycle while ceil() only bumped the real CPU request every ~10 cycles, so scale-up for many pending tasks stalled for ~1/min_cpu_per_task cycles (≈50s at the default 5s interval) per extra CPU.

Record the aggregated integer bundle totals (what Ray actually receives) as the mark instead. Because each cycle selects bundles until the fractional cpu_sum exceeds the integer mark, ceil() now bumps by at least one CPU every cycle, restoring the intended one-unit-per-cycle ramp while still never requesting less than before. Convergence is unchanged: once pending demand can no longer exceed the mark, the cycle is skipped.

Verified: cargo test -p daft-distributed --lib (8 task tests pass), cargo check/clippy -p daft-distributed --features python clean.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
2 days ago
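The stall described in this commit can be seen in a small simulation. This is a simplified model under stated assumptions, not the scheduler itself: `simulate` and its parameters are hypothetical, every pending task asks for the same CPU fraction, and `Fraction` is used so that 0.1 is represented exactly.

```python
import math
from fractions import Fraction


def simulate(num_pending, cpu_per_task, cycles, record_integer_total):
    """Return the integer CPU request sent on each autoscaler cycle."""
    mark = Fraction(0)  # high-water mark
    requests = []
    for _ in range(cycles):
        # Select pending tasks until the fractional sum exceeds the mark.
        cpu_sum = Fraction(0)
        selected = 0
        while selected < num_pending and cpu_sum <= mark:
            cpu_sum += cpu_per_task
            selected += 1
        if cpu_sum <= mark:
            # Pending demand can no longer exceed the mark: nothing to request.
            break
        requested = math.ceil(cpu_sum)  # what is actually sent, as an integer
        requests.append(requested)
        # Old behavior records the fractional sum; new records the integer total.
        mark = Fraction(requested) if record_integer_total else cpu_sum
    return requests


tenth = Fraction(1, 10)
old_ramp = simulate(100, tenth, 12, record_integer_total=False)
new_ramp = simulate(100, tenth, 12, record_integer_total=True)
```

In this model the fractional mark keeps the request at 1 CPU for ten cycles before it reaches 2, while the integer mark raises the request by one CPU per cycle until the 10 CPUs of pending demand are covered.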
fix(flotilla): keep multi-CPU tasks as individual autoscaler bundles

aggregate_ray_bundles packed every CPU-only task into unit {"CPU": 1} bundles. That is wrong for a task requesting num_cpus >= 1: a 4-CPU task runs on one worker, so splitting it into 4 spread bundles lets the autoscaler provision 4 single-CPU nodes and leaves the task unschedulable. It also turned CPU magnitude into the loop count, so a huge or non-finite explicit num_cpus (inf as i64 == i64::MAX) could hang/OOM, and a NaN poisoned the running sum and zeroed the batch's CPU request.

Only pack sub-1.0 CPU-only tasks now; tasks with GPU, memory, or num_cpus >= 1 keep an individual bundle (CPU rounded up to at least 1). Non-finite / non-positive CPU contributes nothing. The packed sum is now bounded by task count, so the loop can no longer blow up.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
2 days ago
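The packing rule from this commit can be sketched as follows. The `Task` class, its `pinned` flag (standing in for "carries GPU or memory"), and the helper names are hypothetical; the rounding to 9 decimal places is an added assumption to keep float noise out of the ceiling. The sketch reflects the behavior as of this commit, including the "at least 1" rounding on individual bundles.

```python
import math
from dataclasses import dataclass


@dataclass
class Task:
    num_cpus: float
    pinned: bool = False  # hypothetical: task also carries GPU or memory


def usable_cpu(value):
    # Non-finite (inf, NaN) or non-positive CPU contributes nothing.
    return value if math.isfinite(value) and value > 0 else 0.0


def aggregate(tasks):
    bundles = []
    packed_sum = 0.0
    for task in tasks:
        cpu = usable_cpu(task.num_cpus)
        if not task.pinned and cpu < 1.0:
            # Only sub-1.0 CPU-only tasks are packed together.
            packed_sum += cpu
        else:
            # Individual bundle, CPU rounded up to at least 1.
            # GPU and memory keys are omitted here for brevity.
            bundles.append({"CPU": max(1, math.ceil(cpu))})
    # Each packed task adds less than 1.0, so this count is bounded by len(tasks).
    packed_count = math.ceil(round(packed_sum, 9))
    bundles.extend({"CPU": 1} for _ in range(packed_count))
    return bundles


four_cpu = aggregate([Task(4.0)])
bad_values = aggregate([Task(float("inf")), Task(float("nan")), Task(-1.0), Task(0.5)])
```

A 4-CPU task stays a single {"CPU": 4} bundle instead of being split into four, and the inf, NaN, and negative requests add nothing to the packed sum.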
fix(flotilla): aggregate fractional CPU into integer Ray autoscaler bundles

Ray's request_resources (<= 2.55; Daft pins ray==2.55.1) rejects non-integer bundle values: `isinstance(bundle[key], int)` raises `TypeError: each bundle key should be str and value as int.` Sending a float CPU therefore crashes at runtime, even a whole 1.0, since pyo3 emits a Python float.

Replace the per-task float bundle with `aggregate_ray_bundles`: CPU-only tasks have their fractional CPU summed and emitted as ceil(sum) unit {"CPU": 1} bundles, so N tasks at 0.1 CPU request ceil(0.1*N) CPUs instead of N (issue #7123), while never sending a non-integer value. Tasks carrying GPU or memory keep an individual bundle (CPU rounded up) since those resources pin placement to a node.

Revert the try_autoscale type hint to dict[str, int]. Replace the fractional bundle unit tests with aggregation tests.
XiaoHongbo-Hope:fix/min-cpu-per-task-wiring
2 days ago
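The aggregation described in this commit can be illustrated with a minimal Python sketch. It is not Daft's Rust implementation: `pack_cpu_only` and `check_bundle` are hypothetical names, the rounding to 9 decimal places is an added assumption to keep float noise from inflating the ceiling, and the check only mirrors the `isinstance` test quoted in the commit message.

```python
import math


def check_bundle(bundle):
    # Mirrors the validation quoted in the commit message:
    # keys must be str and values must be int (a float 1.0 is rejected).
    for key, value in bundle.items():
        if not (isinstance(key, str) and isinstance(value, int)):
            raise TypeError("each bundle key should be str and value as int.")


def pack_cpu_only(cpu_requests):
    """Sum fractional CPU requests and emit ceil(sum) unit bundles."""
    total = sum(cpu_requests)
    # Assumption: round before ceil so 0.1 * 30 does not become 4 bundles.
    count = math.ceil(round(total, 9))
    bundles = [{"CPU": 1} for _ in range(count)]
    for bundle in bundles:
        check_bundle(bundle)
    return bundles


ten_small_tasks = pack_cpu_only([0.1] * 10)
```

With ten tasks at 0.1 CPU each, the sketch requests one {"CPU": 1} bundle rather than ten, and every value it emits is a Python int.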
Latest Branches
CodSpeed Performance Gauge
0%
feat(sql): support read_parquet ignore_corrupt_files
#7133
4 days ago
c19a075
jackylee-ch:codex-sql-read-parquet-ignore-corrupt-files
CodSpeed Performance Gauge
0%
refactor(ai): delegate embed_text expression building to providers
#6026
5 months ago
1749308
refactor--embed_text-public-api-to-delegate-expression-building-to-providers
CodSpeed Performance Gauge
0%
feat(grouped-agg): shard AggThenPartition execution per morsel
#7060
24 hours ago
e26889d
BABTUNA:perf/sharded-grouped-agg
© 2026 CodSpeed Technology