paradigmxyz
reth
BlogDocsChangelog

perf(trie): add batching for storage proof results [WIP]

#19792
Comparing
yk/proof-result-batching
(
a5afac0
) with
main
(
2ade18d
)
CodSpeed Performance Gauge
0%
Untouched
81

Benchmarks

Passed

remove_leaf[1000]
crates/trie/sparse/benches/update.rs::benches::remove_leaf
CodSpeed Performance Gauge
+7%
281.1 µs262.9 µs
hash builder[init size 10000 | update size 100 | num updates 3]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
+1%
27.3 ms27 ms
sparse trie[1000]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves::calculate root from leaves
CodSpeed Performance Gauge
+1%
5.7 ms5.7 ms
remove_leaf[5000]
crates/trie/sparse/benches/update.rs::benches::remove_leaf
CodSpeed Performance Gauge
+1%
1.2 ms1.1 ms
prefix set | size: 10 | `BTreeSet` with `BTreeSet:range` lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
+1%
5.1 µs5 µs
size 100000 | updated 0.1% | depth 5
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
0%
1.1 ms1.1 ms
hash builder[init size 10000 | update size 1000 | num updates 10]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
253.7 ms253.1 ms
sparse trie[init size 1000 | update size 100 | num updates 1]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
1.1 ms1.1 ms
prefix set | size: 100 | `BTreeSet` with `BTreeSet:range` lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
0%
40.5 µs40.4 µs
sparse trie[init size 10000 | update size 1000 | num updates 3]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
33.9 ms33.8 ms
sparse trie[5000]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves::calculate root from leaves
CodSpeed Performance Gauge
0%
28.3 ms28.3 ms
hash builder[5000]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves::calculate root from leaves
CodSpeed Performance Gauge
0%
22.3 ms22.3 ms
size 100000 | updated 1% | depth 4
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
0%
11.4 ms11.3 ms
receipts root | size: 10 | HashBuilder
crates/trie/trie/benches/trie_root.rs::benches::trie_root_benchmark::Receipts root calculation
CodSpeed Performance Gauge
0%
112 µs111.9 µs
sparse trie[init size 1000 | update size 100 | num updates 3]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
3.3 ms3.3 ms
sparse trie[init size 10000 | update size 1000 | num updates 1]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
11.7 ms11.6 ms
sparse trie[init size 1000 | update size 1000 | num updates 1]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
5.1 ms5.1 ms
sparse trie[init size 1000 | update size 100 | num updates 5]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
5.4 ms5.4 ms
hash builder[init size 10000 | update size 100 | num updates 1]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
9.6 ms9.6 ms
parallel hashing[100]
crates/trie/trie/benches/hash_post_state.rs::post_state::hash_post_state::Hash Post State
CodSpeed Performance Gauge
0%
259 ms258.7 ms
sequence hashing[100]
crates/trie/trie/benches/hash_post_state.rs::post_state::hash_post_state::Hash Post State
CodSpeed Performance Gauge
0%
258.9 ms258.7 ms
sparse trie[init size 1000 | update size 1000 | num updates 5]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
24.9 ms24.9 ms
validate_blob | num blobs: 1 | ValidateBlob
crates/primitives/benches/validate_blob_tx.rs::validate_blob::blob_validation::Blob Transaction KZG validation
CodSpeed Performance Gauge
0%
149.8 µs149.8 µs
sparse trie[init size 10000 | update size 100 | num updates 5]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
11.1 ms11.1 ms
prefix set | size: 100 | `BTreeSet` with `Iterator:any` lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
0%
149.3 µs149.3 µs
prefix set | size: 1000 | `Vec` with custom cursor lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
0%
146.2 µs146.1 µs
receipts root | size: 1000 | HashBuilder
crates/trie/trie/benches/trie_root.rs::benches::trie_root_benchmark::Receipts root calculation
CodSpeed Performance Gauge
0%
9.1 ms9.1 ms
sparse trie[init size 10000 | update size 1000 | num updates 5]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
56 ms56 ms
validate_blob | num blobs: 3 | ValidateBlob
crates/primitives/benches/validate_blob_tx.rs::validate_blob::blob_validation::Blob Transaction KZG validation
CodSpeed Performance Gauge
0%
150.9 µs150.8 µs
sparse trie[init size 10000 | update size 1000 | num updates 10]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
111.4 ms111.4 ms
prefix set | size: 1000 | `BTreeSet` with `Iterator:any` lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
0%
14.9 ms14.9 ms
sparse trie[init size 10000 | update size 100 | num updates 3]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
6.7 ms6.7 ms
hash builder[init size 10000 | update size 100 | num updates 5]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
44 ms44 ms
recover ECDSA
crates/primitives/benches/recover_ecdsa_crit.rs::benches::criterion_benchmark
CodSpeed Performance Gauge
0%
206.8 µs206.8 µs
prefix set | size: 1000 | `Vec` with binary search lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
0%
205.7 µs205.6 µs
sparse trie[init size 10000 | update size 100 | num updates 1]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
2.3 ms2.3 ms
validate_blob | num blobs: 4 | ValidateBlob
crates/primitives/benches/validate_blob_tx.rs::validate_blob::blob_validation::Blob Transaction KZG validation
CodSpeed Performance Gauge
0%
151.9 µs151.9 µs
hash builder[init size 1000 | update size 1000 | num updates 10]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
57.2 ms57.2 ms
sequence hashing[1000]
crates/trie/trie/benches/hash_post_state.rs::post_state::hash_post_state::Hash Post State
CodSpeed Performance Gauge
0%
2.6 s2.6 s
prefix set | size: 1000 | `BTreeSet` with `BTreeSet:range` lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
0%
473 µs473 µs
prefix set | size: 100 | `Vec` with custom cursor lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
0%
16.7 µs16.7 µs
sparse trie[init size 1000 | update size 100 | num updates 10]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
10.8 ms10.8 ms
validate_blob | num blobs: 5 | ValidateBlob
crates/primitives/benches/validate_blob_tx.rs::validate_blob::blob_validation::Blob Transaction KZG validation
CodSpeed Performance Gauge
0%
153.5 µs153.5 µs
hash builder[init size 10000 | update size 1000 | num updates 1]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
26.3 ms26.3 ms
update_leaf[5000]
crates/trie/sparse/benches/update.rs::benches::update_leaf
CodSpeed Performance Gauge
0%
178.7 µs178.7 µs
ordered_trie_root
crates/trie/trie/benches/trie_root.rs::benches::trie_root_benchmark::Receipts root calculation::receipts root | size: 1000 | triehash
CodSpeed Performance Gauge
0%
11.8 ms11.8 ms
ordered_trie_root
crates/trie/trie/benches/trie_root.rs::benches::trie_root_benchmark::Receipts root calculation::receipts root | size: 10 | triehash
CodSpeed Performance Gauge
0%
131.9 µs131.9 µs
sparse trie[init size 10000 | update size 100 | num updates 10]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
22 ms22.1 ms
validate_blob | num blobs: 6 | ValidateBlob
crates/primitives/benches/validate_blob_tx.rs::validate_blob::blob_validation::Blob Transaction KZG validation
CodSpeed Performance Gauge
0%
154.9 µs155 µs
ordered_trie_root
crates/trie/trie/benches/trie_root.rs::benches::trie_root_benchmark::Receipts root calculation::receipts root | size: 100 | triehash
CodSpeed Performance Gauge
0%
1.2 ms1.2 ms
parallel hashing[1000]
crates/trie/trie/benches/hash_post_state.rs::post_state::hash_post_state::Hash Post State
CodSpeed Performance Gauge
0%
2.6 s2.6 s
sparse trie[init size 1000 | update size 1000 | num updates 3]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
15 ms15 ms
receipts root | size: 100 | HashBuilder
crates/trie/trie/benches/trie_root.rs::benches::trie_root_benchmark::Receipts root calculation
CodSpeed Performance Gauge
0%
937.4 µs937.9 µs
hash builder[1000]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves::calculate root from leaves
CodSpeed Performance Gauge
0%
4.5 ms4.5 ms
hash builder[init size 1000 | update size 1000 | num updates 3]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
17.3 ms17.3 ms
sparse trie[init size 1000 | update size 1000 | num updates 10]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
49.6 ms49.6 ms
hash builder[init size 1000 | update size 1000 | num updates 5]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
28.7 ms28.7 ms
hash builder[init size 10000 | update size 1000 | num updates 5]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
127.3 ms127.5 ms
hash builder[init size 10000 | update size 100 | num updates 10]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
85.9 ms86 ms
hash builder[init size 1000 | update size 100 | num updates 10]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
24.6 ms24.7 ms
prefix set | size: 100 | `Vec` with binary search lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
0%
19.4 µs19.4 µs
validate_blob | num blobs: 2 | ValidateBlob
crates/primitives/benches/validate_blob_tx.rs::validate_blob::blob_validation::Blob Transaction KZG validation
CodSpeed Performance Gauge
0%
149.6 µs149.9 µs
size 100000 | updated 0.1% | depth 4
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
0%
1.4 ms1.4 ms
hash builder[init size 1000 | update size 100 | num updates 5]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
12.4 ms12.4 ms
hash builder[init size 10000 | update size 1000 | num updates 3]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
77.3 ms77.5 ms
hash builder[init size 1000 | update size 1000 | num updates 1]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
0%
5.8 ms5.8 ms
size 100000 | updated 0.1% | depth 3
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
0%
2.5 ms2.5 ms
size 100000 | updated 1% | depth 2
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
-1%
24.5 ms24.7 ms
size 100000 | updated 1% | depth 0
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
-1%
24.7 ms24.8 ms
size 100000 | updated 1% | depth 1
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
-1%
24.7 ms24.9 ms
prefix set | size: 10 | `Vec` with binary search lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
-1%
3.8 µs3.9 µs
size 100000 | updated 1% | depth 3
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
-1%
21.7 ms21.9 ms
prefix set | size: 10 | `Vec` with custom cursor lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
-1%
3.2 µs3.3 µs
size 100000 | updated 0.1% | depth 1
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
-1%
3.6 ms3.6 ms
hash builder[init size 1000 | update size 100 | num updates 3]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
-1%
7.5 ms7.5 ms
hash builder[init size 1000 | update size 100 | num updates 1]
crates/trie/sparse/benches/root.rs::root::calculate_root_from_leaves_repeated::calculate root from leaves repeated
CodSpeed Performance Gauge
-1%
2.5 ms2.5 ms
size 100000 | updated 0.1% | depth 0
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
-1%
3.6 ms3.6 ms
update_leaf[1000]
crates/trie/sparse/benches/update.rs::benches::update_leaf
CodSpeed Performance Gauge
-1%
112.2 µs113.4 µs
size 100000 | updated 0.1% | depth 2
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
-1%
3.4 ms3.4 ms
size 100000 | updated 1% | depth 5
crates/trie/sparse/benches/rlp_node.rs::rlp_node::update_rlp_node_level::update rlp node level
CodSpeed Performance Gauge
-1%
8.2 ms8.3 ms
prefix set | size: 10 | `BTreeSet` with `Iterator:any` lookup
crates/trie/common/benches/prefix_set.rs::prefix_set::prefix_set_lookups::Prefix Set Lookups
CodSpeed Performance Gauge
-2%
3.8 µs3.9 µs

Commits

Click on a commit to change the comparison range
Base
main
2ade18d
+0.13%
perf(trie): add adaptive batching for storage proof results Problem: Storage proof workers send one ProofResultMessage per proof through crossbeam channels. For blocks with many small storage changes (100+ accounts), this creates 100+ individual send/recv syscalls, adding significant overhead. Solution: Implement adaptive batching at the worker level that collects multiple storage proof jobs based on queue pressure and processes them together. Changes: - Add batching constants (MAX_BATCH_SIZE: 32, MIN_QUEUE_FOR_BATCHING: 2) - Add BatchedProofResults type for batched proof containers - Implement try_collect_batch_static() for adaptive batch collection - Modify StorageProofWorker::run() to use batching when beneficial - Add process_storage_proof_batch() for batch processing - Preserve individual channels to maintain existing architecture Batching Strategy: - Queue depth < 2: Process individually (minimize latency) - Queue depth 2-32: Batch = queue depth (balanced) - Queue depth > 32: Batch = 32 (maximize throughput) - Blinded node requests: Never batch (latency-sensitive) Expected Impact: - 70%+ reduction in channel syscalls under high load - No latency regression for low-load scenarios - Better CPU cache utilization through sequential processing Baseline: 100 storage proofs = 200 syscalls (100 sends + 100 recvs) With batching (avg batch size 4-8): ~30-50 syscalls = 75-85% reduction
f6f41c9
23 hours ago
by yongkangc
+0.12%
fix(trie): address code review issues in proof batching Fixed critical bugs and issues identified in code review: 1. **Critical: Fix job loss bug for BlindedStorageNode requests** - Previously, blinded node jobs were silently dropped during batch collection, causing indefinite hangs - Now properly includes all job types in batch and separates them for appropriate processing 2. **Remove unused BatchedProofResults struct** - Struct was defined but never instantiated or used - Removed to eliminate dead code and confusion 3. **Fix misleading documentation** - Clarified that batching optimizes job *processing*, not result sending - Updated docs to accurately reflect performance benefits: * Reduced recv() syscalls on work queue * Better CPU cache locality * Reduced context switching overhead - Removed unsubstantiated claims about channel syscall reduction 4. **Improve job handling** - Worker loop now properly separates storage proofs from blinded nodes - Storage proofs batched for cache benefits when multiple available - Blinded node requests always processed individually (latency-sensitive) 5. **Code quality improvements** - Added accurate inline documentation - Fixed move-after-use error with job type checking - Ensured all jobs are processed, never lost Changes address all issues raised in PR review including: - Job loss bug (P0) - Dead code (P1) - Misleading performance claims (P1) - Proper mixed job type handling (P1) All clippy checks pass with -D warnings.
882bea7
16 hours ago
by yongkangc
-0.05%
refactor: simplify proof result batching implementation Simplifies the batching implementation based on code review feedback: - Remove unnecessary performance comments from documentation - Streamline function documentation to focus on behavior - Fix unused worker_id parameter warning - Keep code clear and concise All fmt and clippy checks pass.
46c4bc2
15 hours ago
by yongkangc
-0.24%
fix(trie): address code review issues in proof batching **Problem:** Code review identified several issues with the batching implementation: - Redundant pattern matching (jobs classified by type twice) - Unnecessary process_storage_proof_batch wrapper function - Unused worker_id parameter in try_collect_batch_static **Solution:** Simplify the implementation by removing redundancy: - Destructure jobs immediately when separating by type - Remove process_storage_proof_batch function (just inline the loop) - Remove unused worker_id parameter **Changes:** - Worker loop now destructures jobs in first match, eliminating redundant pattern checks - Removed 23 lines of dead/redundant code - Cleaner, more direct implementation **Expected Impact:** No functional changes, just cleaner code that's easier to understand and maintain.
a5afac0
12 hours ago
by yongkangc
© 2025 CodSpeed Technology
Home Terms Privacy Docs