Eventual-Inc
Daft
Latest Results
Skip short-circuit UDF tests on Ray runner
Lucas61000:issue-4069
17 minutes ago
make Function callable
gavin9402:support_catalog_function
57 minutes ago
Merge branch 'main' into issue-4069
Lucas61000:issue-4069
1 hour ago
make Function callable
gavin9402:support_catalog_function
2 hours ago
perf(parquet): yield individual batches in streaming path to fix morsel cache locality (#6558)

The arrow-rs parquet reader's streaming path (`local_parquet_stream_arrowrs`) called `decode_single_rg` per row group, which set `with_batch_size(128K)` on the reader, correctly producing multiple ~128K batches, but then immediately concatenated them back into a single ~500K batch via `concat_or_empty`. This sent one large batch per row group through the pipeline, defeating the morsel size that downstream operators depend on for cache locality.

On AMD EPYC (512 KB L2 per core), evaluating many aggregation expressions on the same column keeps the 128K-row source buffer (256 KB) hot in L2 across all evaluations. At 500K rows, the cast output (4 MB) evicts the source from cache, forcing each expression to re-fetch from L3/memory.

The fix extracts reader setup into `build_rg_reader` and has the streaming path iterate over individual batches, sending each ~128K batch through the channel. The bulk read path (`decode_single_rg`) is unchanged.

Benchmarked on c6a.4xlarge (16 vCPUs, AMD EPYC) with 100 ClickBench parquet files (~100M rows total, Int16 column, 90 SUM aggregations):

| Workload      | v0.7.4 | v0.7.5 (regressed) | After fix  |
|---------------|--------|--------------------|------------|
| 90 SUMs       | 0.576s | 1.226s             | **0.652s** |
| 90 SUM(col+i) | 1.084s | 4.643s             | **1.714s** |

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
main
2 hours ago
Update
Lucas61000:issue-4069
2 hours ago
perf(parquet): yield individual batches in streaming path to preserve morsel-level cache locality
desmond/fix-parquet-morsel-size
3 hours ago
fix test_logical_unary test case
sankarreddy-atlan:handle_missing_columns
3 hours ago
Latest Branches
+1%
fix: short-circuit evaluation for coalesce
#6525
46 minutes ago
9471826
Lucas61000:issue-4069
+1%
feat: support get function from catalog
#6524
2 hours ago
f7a77a6
gavin9402:support_catalog_function
+1%
perf(parquet): yield individual batches in streaming path to fix morsel cache locality
#6558
11 hours ago
a8e58ae
desmond/fix-parquet-morsel-size
© 2026 CodSpeed Technology