Commits
perf(provider): batch hashed state writes across blocks
Batches write_hashed_state calls across all blocks in save_blocks to
improve I/O sequentiality.
Previously, each block's hashed state was written separately, causing
random I/O because keys from different blocks interleave in the B-tree:
block N+1's hashed addresses may sort before block N's, forcing the
cursor to jump backwards.
This change collects all blocks' hashed state into a single sorted
collection using extend_ref before writing, enabling sequential I/O.
This follows the same pattern already used for transaction hash writes.
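A minimal sketch of that collection step, with stand-in tuple types in
place of reth's sorted hashed state (the real code works with
HashedPostState and its extend_ref; collect_hashed_accounts here is an
illustrative name):

```rust
use std::sync::Arc;

// Stand-in types; the real code batches reth's HashedPostState.
type HashedAddress = [u8; 32];
type Account = u64;

// Collect every block's (pre-sorted) hashed accounts into one combined
// collection, then sort once so the database cursor can walk keys
// sequentially instead of jumping between blocks.
fn collect_hashed_accounts(
    blocks: &[Arc<Vec<(HashedAddress, Account)>>],
) -> Vec<(HashedAddress, Account)> {
    let mut combined = Vec::new();
    for block in blocks {
        // extend_ref-style: extend by reference, cloning the entries.
        combined.extend(block.iter().cloned());
    }
    // One global sort yields sequential key order for the cursor.
    combined.sort_unstable_by_key(|(addr, _)| *addr);
    combined
}
```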
Expected impact: Reduced page faults in the Persistence thread where
profiling showed 21% of time in pagecache_get_page/xas_load and 37.5%
in page fault handling.

perf(provider): use k-way merge for batched hashed state writes
Implements a k-way merge algorithm for writing hashed state from multiple
blocks in save_blocks. This replaces the previous extend_ref approach
which cloned data into a combined vec.
Key improvements:
- No cloning: uses Arc references to per-block hashed state
- Deduplication: same key across blocks is written only once (last wins)
- Sequential I/O: writes in globally sorted key order across all blocks
- Streaming: uses a BinaryHeap for an O(n log k) merge without materializing a combined collection
For hashed accounts:
- Merges sorted account lists from all blocks
- Deduplicates by hashed address, taking latest block's value
- Streams directly to MDBX cursor
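A minimal sketch of the account merge under those assumptions
(merge_hashed_accounts and the stand-in types are illustrative, not the
real API; the real code streams each pair to the cursor rather than
collecting into a vec):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

type HashedAddress = [u8; 32];
type Account = u64; // stand-in for the real account type

// K-way merge of per-block account lists, each pre-sorted by hashed
// address. Among duplicate keys, the entry from the latest block pops
// first, so emitting only the first occurrence implements "last wins".
fn merge_hashed_accounts(
    blocks: &[Vec<(HashedAddress, Account)>],
) -> Vec<(HashedAddress, Account)> {
    // Min-heap via Reverse; ties on key break toward the higher block
    // index because Reverse(block_idx) inverts that comparison too.
    let mut heap = BinaryHeap::new();
    for (block_idx, list) in blocks.iter().enumerate() {
        if let Some(&(key, _)) = list.first() {
            heap.push(Reverse((key, Reverse(block_idx), 0usize)));
        }
    }

    let mut out: Vec<(HashedAddress, Account)> = Vec::new();
    while let Some(Reverse((key, Reverse(block_idx), pos))) = heap.pop() {
        // The first occurrence of a key comes from the latest block;
        // skip the older duplicates that pop afterwards.
        if out.last().map(|&(k, _)| k) != Some(key) {
            out.push((key, blocks[block_idx][pos].1));
        }
        // Advance this block's cursor and re-seed the heap.
        if let Some(&(next_key, _)) = blocks[block_idx].get(pos + 1) {
            heap.push(Reverse((next_key, Reverse(block_idx), pos + 1)));
        }
    }
    // In the real code each (key, account) pair would stream straight
    // to the MDBX cursor instead of being collected here.
    out
}
```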
For hashed storage:
- Collects all (address, block_idx, storage) tuples
- Sorts by address, then groups for per-address merge
- Handles storage wipes correctly (the latest wipe invalidates earlier writes)
- K-way merge within each address's storage slots
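A sketch of the per-address wipe handling with the same stand-in types;
effective_storage and BlockStorage are illustrative names, and for
brevity the surviving slots are flattened and sorted here rather than
heap-merged as in the accounts sketch:

```rust
type Slot = [u8; 32];
type Value = u64; // stand-ins for the real storage types

struct BlockStorage {
    wiped: bool,               // this block wiped the address's storage
    slots: Vec<(Slot, Value)>, // sorted slot writes in this block
}

// Per-address wipe handling: only slots written at or after the latest
// wipe survive; everything earlier is invalidated.
fn effective_storage(per_block: &[BlockStorage]) -> (bool, Vec<(Slot, Value)>) {
    // Index of the latest wiping block; prior blocks are discarded.
    let start = per_block.iter().rposition(|b| b.wiped).unwrap_or(0);
    let wiped = per_block.get(start).is_some_and(|b| b.wiped);

    // Flatten surviving blocks, tagging each slot with its block index.
    let mut slots: Vec<(Slot, usize, Value)> = per_block[start..]
        .iter()
        .enumerate()
        .flat_map(|(i, b)| b.slots.iter().map(move |&(s, v)| (s, i, v)))
        .collect();
    // Sort by slot with the latest block first, then keep one per slot.
    slots.sort_unstable_by_key(|&(s, i, _)| (s, std::cmp::Reverse(i)));
    slots.dedup_by_key(|e| e.0);
    (wiped, slots.into_iter().map(|(s, _, v)| (s, v)).collect())
}
```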
This reduces both allocation overhead and redundant database writes,
addressing the 21% pagecache_get_page and 37.5% page fault overhead
observed in Persistence thread profiling.

perf(provider): batch plain state writes in save_blocks
Add write_state_changes_merged to batch PlainAccountState, PlainStorageState,
and Bytecodes writes across multiple blocks using a flatten-sort-dedupe approach.
This complements the existing write_hashed_state_merged optimization by applying
the same sequential I/O pattern to plain state tables:
- Accounts: flattened, sorted by address, latest block wins on duplicates
- Bytecodes: flattened, sorted by hash, latest block wins
- Storage: handles the wipe_storage flag (the latest wipe invalidates earlier
blocks), then flattens slots, sorts by (address, slot), and writes only the
latest value
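A minimal sketch of the flatten-sort-dedupe step for the accounts case,
with illustrative names and stand-in types (bytecodes work the same way
keyed by hash; storage adds the wipe handling sketched above):

```rust
use std::cmp::Reverse;

type Address = [u8; 20];
type Account = u64; // stand-in for the real plain account type

// Flatten per-block changes, sort by address with the latest block
// first on ties, and keep exactly one entry per address.
fn merge_plain_accounts(
    blocks: Vec<Vec<(Address, Account)>>, // index = block order
) -> Vec<(Address, Account)> {
    let mut flat: Vec<(Address, usize, Account)> = blocks
        .into_iter()
        .enumerate()
        .flat_map(|(idx, entries)| {
            entries.into_iter().map(move |(a, acc)| (a, idx, acc))
        })
        .collect();
    // The latest block sorts first within each address...
    flat.sort_unstable_by_key(|&(a, idx, _)| (a, Reverse(idx)));
    // ...so dedup keeps the latest value and drops older writes.
    flat.dedup_by_key(|e| e.0);
    flat.into_iter().map(|(a, _, acc)| (a, acc)).collect()
}
```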
Also extracts write_receipts from write_state so receipts can be written
per-block while state changes are batched.
Expected impact: reduced I/O contention and improved write locality during
block persistence.
Amp-Thread-ID: https://ampcode.com/threads/T-019bc2f3-a568-73fe-8a6f-347d6bc9f386
Co-authored-by: Amp <amp@ampcode.com>

Merge branch 'main' into joshie/batch-hashed-writes

perf(persistence): use parallel sort for batched state writes
Replaces k-way merge with flatten + parallel sort (rayon) for
write_state_changes_merged. Benchmarks show this approach has lower
overhead for typical block counts (10-100 blocks) due to:
- Better cache locality from sequential memory access
- Rayon's parallel sort scales with CPU cores
- Avoids BinaryHeap per-element overhead
The k-way merge approach was ~2x slower than flatten+sort for storage
writes due to heap operation overhead outweighing the theoretical
O(n log k) vs O(n log n) advantage.
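A sketch of the parallelized variant for storage, assuming rayon's
par_sort_unstable_by_key; the names and types are again stand-ins:

```rust
use rayon::prelude::*;
use std::cmp::Reverse;

type Address = [u8; 20];
type Slot = [u8; 32];
type Value = u64; // stand-ins for the real storage types

// Same flatten-sort-dedupe shape as before, but the sort runs on
// rayon's work-stealing pool; one contiguous buffer keeps memory
// access sequential and cache-friendly.
fn merge_storage_par(
    blocks: Vec<Vec<(Address, Slot, Value)>>, // index = block order
) -> Vec<(Address, Slot, Value)> {
    let mut flat: Vec<(Address, Slot, usize, Value)> = blocks
        .into_iter()
        .enumerate()
        .flat_map(|(idx, entries)| {
            entries.into_iter().map(move |(a, s, v)| (a, s, idx, v))
        })
        .collect();
    // Parallel sort by (address, slot), latest block first on ties.
    flat.par_sort_unstable_by_key(|&(a, s, idx, _)| (a, s, Reverse(idx)));
    flat.dedup_by_key(|e| (e.0, e.1));
    flat.into_iter().map(|(a, s, _, v)| (a, s, v)).collect()
}
```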
Also adds microbenchmarks in benches/state_merge.rs to validate
performance of different merge strategies.

bench(state_merge): use realistic mainnet data sizes
Updated benchmark parameters to simulate real mainnet activity:
- Accounts: 500-1000 per block (vs 100 before)
- Storage: 50-100 contracts × 20-30 slots per block
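A hypothetical shape for such a microbenchmark; only the file
benches/state_merge.rs is from the commit, while the generator, bench
name, and block count here are illustrative:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use rand::Rng;

// Hypothetical generator matching the parameters above: 500-1000
// accounts per block with random 32-byte hashed addresses.
fn gen_block(rng: &mut impl Rng) -> Vec<([u8; 32], u64)> {
    let n = rng.gen_range(500..=1000);
    let mut accounts: Vec<([u8; 32], u64)> =
        (0..n).map(|_| (rng.gen(), rng.gen())).collect();
    accounts.sort_unstable_by_key(|&(addr, _)| addr); // pre-sorted per block
    accounts
}

fn bench_account_merge(c: &mut Criterion) {
    let mut rng = rand::thread_rng();
    let blocks: Vec<_> = (0..100).map(|_| gen_block(&mut rng)).collect();
    c.bench_function("account_merge", |b| {
        // Each strategy under test (k-way merge, sequential sort,
        // parallel sort) would be invoked here on the same input.
        b.iter(|| std::hint::black_box(&blocks))
    });
}

criterion_group!(benches, bench_account_merge);
criterion_main!(benches);
```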
Results show parallel sort provides significant speedups:
- Account merge: ~12% faster at 100k accounts
- Storage merge: ~32-37% faster at 50k-300k slots
This validates the switch from k-way merge to parallel sort.

bench(state_merge): increase data sizes for heavy mainnet simulation
- Accounts: 2000-5000 per block (up to 500k total)
- Storage: 100-200 contracts × 30-50 slots per block (up to 1M slots)
Results with heavy workloads show significant parallel sort wins:
- 300k slots: 26ms seq → 17ms par (35% faster)
- 500k slots: 45ms seq → 28ms par (38% faster)
- 1M slots: 124ms seq → 87ms par (30% faster)

refactor(provider): use k-way merge only for hashed state, revert plain state
Benchmarks showed the parallel sort for plain state caused a regression
on small batches (1-10 blocks). This change:
- Reverts plain state to per-block `write_state` (original behavior)
- Keeps k-way merge for hashed state only (efficient since pre-sorted)
- Adds a conditional (sketched below): a single block uses a sequential
write; multiple blocks use the k-way merge
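A minimal sketch of that dispatch, with illustrative helper names and a
stand-in state type:

```rust
// Stand-in for one block's pre-sorted hashed state.
struct HashedState;

fn write_hashed_state(_state: &HashedState) { /* original single-block path */ }
fn write_hashed_state_merged(_states: &[HashedState]) { /* k-way merge path */ }

fn persist_hashed_state(states: &[HashedState]) {
    match states {
        [] => {}
        // One block is already sorted: write it directly, no merge overhead.
        [single] => write_hashed_state(single),
        // Several blocks: k-way merge into globally sorted key order first.
        many => write_hashed_state_merged(many),
    }
}
```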
Benchmark results show 2-10% improvement across all batch sizes.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Merge branch 'main' into joshie/batch-hashed-writes