Eventual-Inc
Daft

feat(ray): Implement dynamic scale-in for RaySwordfishActor

#5903
Comparing huleilei:hll/auto (e8b7527) with main (29ffd49)
Overall performance change: 0%
Untouched: 24
Ignored: 4

Benchmarks

Passed

| Benchmark | File | Change | Base | Head |
| --- | --- | --- | --- | --- |
| test_tpch[1-in-memory-5] | tests/benchmarks/test_local_tpch.py | +2% | 132.2 ms | 130.3 ms |
| test_count[100 Small Files] | tests/benchmarks/test_interactive_reads.py | +1% | 57.4 ms | 57.2 ms |
| test_tpch[1-in-memory-1] | tests/benchmarks/test_local_tpch.py | +1% | 403.1 ms | 401 ms |
| test_tpch[1-in-memory-9] | tests/benchmarks/test_local_tpch.py | 0% | 275.9 ms | 274.6 ms |
| test_explain[100 Small Files] | tests/benchmarks/test_interactive_reads.py | 0% | 12.3 ms | 12.2 ms |
| test_tpch[1-in-memory-7] | tests/benchmarks/test_local_tpch.py | 0% | 127.6 ms | 127.4 ms |
| test_tpch[1-in-memory-8] | tests/benchmarks/test_local_tpch.py | 0% | 148.4 ms | 148.3 ms |
| test_tpch[1-in-memory-2] | tests/benchmarks/test_local_tpch.py | 0% | 56.3 ms | 56.3 ms |
| test_tpch[1-in-memory-3] | tests/benchmarks/test_local_tpch.py | 0% | 124 ms | 124 ms |
| test_tpch[1-in-memory-4] | tests/benchmarks/test_local_tpch.py | 0% | 88 ms | 88 ms |
| test_tpch_sql[1-in-memory-6] | tests/benchmarks/test_local_tpch.py | 0% | 29.2 ms | 29.2 ms |
| test_tpch[1-in-memory-6] | tests/benchmarks/test_local_tpch.py | 0% | 28.5 ms | 28.5 ms |
| test_tpch_sql[1-in-memory-4] | tests/benchmarks/test_local_tpch.py | 0% | 89 ms | 89 ms |
| test_tpch_sql[1-in-memory-5] | tests/benchmarks/test_local_tpch.py | 0% | 117.8 ms | 117.9 ms |
| test_iter_rows_first_row[1 Small File] | tests/benchmarks/test_interactive_reads.py | 0% | 35.7 ms | 35.8 ms |
| test_tpch_sql[1-in-memory-1] | tests/benchmarks/test_local_tpch.py | -1% | 399.8 ms | 401.8 ms |
| test_tpch_sql[1-in-memory-9] | tests/benchmarks/test_local_tpch.py | -1% | 264 ms | 265.5 ms |
| test_tpch_sql[1-in-memory-3] | tests/benchmarks/test_local_tpch.py | -1% | 116.3 ms | 116.9 ms |
| test_tpch_sql[1-in-memory-7] | tests/benchmarks/test_local_tpch.py | -1% | 117.7 ms | 118.5 ms |
| test_tpch_sql[1-in-memory-8] | tests/benchmarks/test_local_tpch.py | -1% | 134.2 ms | 135.4 ms |
| test_tpch[1-in-memory-10] | tests/benchmarks/test_local_tpch.py | -1% | 196.1 ms | 198.1 ms |
| test_tpch_sql[1-in-memory-2] | tests/benchmarks/test_local_tpch.py | -1% | 165.5 ms | 167.7 ms |
| test_tpch_sql[1-in-memory-10] | tests/benchmarks/test_local_tpch.py | -1% | 188.9 ms | 191.5 ms |
| test_show[1 Small File] | tests/benchmarks/test_interactive_reads.py | -3% | 11.9 ms | 12.3 ms |

Ignored

| Benchmark | File | Change | Base | Head |
| --- | --- | --- | --- | --- |
| test_explain[1 Small File] | tests/benchmarks/test_interactive_reads.py | +1% | 2.1 ms | 2.1 ms |
| test_iter_rows_first_row[100 Small Files] | tests/benchmarks/test_interactive_reads.py | +3% | 153.7 ms | 148.7 ms |
| test_count[1 Small File] | tests/benchmarks/test_interactive_reads.py | -1% | 3.5 ms | 3.6 ms |
| test_show[100 Small Files] | tests/benchmarks/test_interactive_reads.py | +1% | 23.2 ms | 22.9 ms |

Commits

Base: main (29ffd49)

e8b7527 (-0.27%), 2 days ago, by huleilei

feat(ray): Implement dynamic scale-in for RaySwordfishActor

This commit implements the dynamic scaling down (scale-in) functionality for RaySwordfishActor to release idle resources.

Key changes:
- Implement `retire_idle_ray_workers` in `RayWorkerManager` to identify and release idle workers.
- Add `pending_release_blacklist` to track retiring workers and prevent them from being reused or causing "worker died" errors.
- Move scale-down cooldown logic to `RayWorkerManager` to prevent frequent scale-down operations.
- Optimize `retire_idle_ray_workers` to reduce lock contention by releasing the lock before performing Ray/Python operations.
- Update `try_autoscale` in `flotilla.py` to support empty resource requests, enabling Ray to scale down resources.
- Fix unit tests in `src/daft-distributed/src/scheduling/worker.rs` and ensure compatibility with the scheduler loop.

This addresses the issue where `udfActor` could not dynamically scale down and prevents "worker died" errors during graceful shutdown.
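
The scale-in pattern described in the commit message (idle detection, a pending-release blacklist, a cooldown, and releasing the lock before slow calls) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from this PR: the class `IdleWorkerRetirer`, its methods, the `shutdown_fn` callback, and the choice to drop a worker from the blacklist once shutdown returns are all hypothetical names and assumptions made for the example.

```python
import threading
import time
from dataclasses import dataclass


@dataclass
class WorkerState:
    worker_id: str
    active_tasks: int = 0
    last_active: float = 0.0


class IdleWorkerRetirer:
    """Hypothetical sketch of idle-worker retirement; not Daft's RayWorkerManager."""

    def __init__(self, idle_threshold_s, cooldown_s, clock=time.monotonic):
        self._lock = threading.Lock()
        self._workers = {}
        self._pending_release = set()  # workers being retired; never schedule onto these
        self._idle_threshold_s = idle_threshold_s
        self._cooldown_s = cooldown_s
        self._last_scale_down = None
        self._clock = clock

    def add_worker(self, worker_id):
        with self._lock:
            self._workers[worker_id] = WorkerState(worker_id, 0, self._clock())

    def mark_busy(self, worker_id):
        with self._lock:
            self._workers[worker_id].active_tasks += 1

    def mark_idle(self, worker_id):
        with self._lock:
            state = self._workers[worker_id]
            state.active_tasks = max(0, state.active_tasks - 1)
            state.last_active = self._clock()

    def schedulable_workers(self):
        # Workers in the pending-release set are hidden from the scheduler.
        with self._lock:
            return [w for w in self._workers if w not in self._pending_release]

    def retire_idle_workers(self, shutdown_fn):
        now = self._clock()
        with self._lock:
            # Cooldown: skip if a scale-down happened too recently.
            if (self._last_scale_down is not None
                    and now - self._last_scale_down < self._cooldown_s):
                return []
            idle = [
                w.worker_id
                for w in self._workers.values()
                if w.active_tasks == 0
                and now - w.last_active >= self._idle_threshold_s
                and w.worker_id not in self._pending_release
            ]
            if not idle:
                return []
            self._pending_release.update(idle)
            self._last_scale_down = now
        # The lock is released here, so slow shutdown calls do not block scheduling.
        for worker_id in idle:
            shutdown_fn(worker_id)
        with self._lock:
            for worker_id in idle:
                self._workers.pop(worker_id, None)
                self._pending_release.discard(worker_id)
        return idle
```

In this sketch the candidate list is computed and blacklisted under the lock, while the potentially slow `shutdown_fn` calls run outside it, so concurrent scheduling only ever sees workers that are not being retired.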