Eventual-Inc
Daft

Performance History

Latest Results

fix(ai): handle device_map and return_full_text in transformers prompter
YuangGao:feat/transformers-prompter
5 minutes ago
refactor(inline-agg): consolidate Sum/Product/Min/Max into one macro
BABTUNA:refactor/inline-agg-generic-accums
47 minutes ago
fix(scan): use parquet metadata for scan task size estimates (#6542)

During schema inference, `GlobScanOperator::try_new` already reads the full parquet footer via `read_parquet_schema_and_metadata` but only preserves `num_rows` in `TableMetadata` - row group size information is discarded. When `estimate_in_memory_size_bytes` later needs a size estimate and no column-level `TableStatistics` are available, it falls back to `approx_num_rows * schema.estimate_row_size_bytes()`, which uses a fixed 20 bytes for Utf8 columns. For data with dictionary encoding or low-cardinality columns, this produces wildly inflated estimates.

To fix this, we extend `TableMetadata` with an optional `size_bytes` field populated from the sum of uncompressed (`total_byte_size`) row group sizes during schema inference, and use it as a middle-tier fallback in `estimate_in_memory_size_bytes` between table statistics (most accurate) and the schema-based guess (least accurate).

For a 7.4 MB parquet file with 40M rows of 7 repeated URLs (from the Rivian repro at `s3://cgrinstead/daft-rivian-repro/data.parquet`), the estimate drops from **1.66 GiB to 44 MiB**.

## Changes

- Expose `total_byte_size()` on `DaftRowGroupMetaData` (wraps arrow-rs `RowGroupMetaData::total_byte_size()`)
- Add `size_bytes: Option<usize>` to `TableMetadata` with `#[serde(default)]` for backwards compatibility
- Populate `size_bytes` from parquet row group metadata during schema inference in `GlobScanOperator::try_new`
- Aggregate `size_bytes` across sources in `ScanTask::new`
- Add `metadata.size_bytes` as a fallback tier in `estimate_in_memory_size_bytes`

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Varun Madan <varun@Varuns-MacBook-Pro.local>
Co-authored-by: Varun Madan <varun.madan@gmail.com>
Co-authored-by: Varun Madan <varun@Mac.localdomain>
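
The three-tier fallback described in this commit can be sketched roughly as follows. This is a minimal, standalone illustration and not Daft's actual code: the struct layouts, the function signatures, the `estimated_size_bytes` field, and the rule that an aggregate is unknown when any source lacks a size are all assumptions made for the example. Only the names `TableMetadata`, `TableStatistics`, `size_bytes`, `num_rows`, and `estimate_in_memory_size_bytes` come from the commit message.

```rust
/// Hypothetical stand-in for column-level statistics (not Daft's real type).
#[derive(Debug, Clone)]
pub struct TableStatistics {
    /// Assumed field: a size estimate derived from per-column statistics.
    pub estimated_size_bytes: usize,
}

/// Simplified stand-in for `TableMetadata` with the new optional field.
#[derive(Debug, Clone)]
pub struct TableMetadata {
    pub num_rows: usize,
    /// Sum of uncompressed row group sizes, when the footer was read.
    pub size_bytes: Option<usize>,
}

/// Sum uncompressed row group sizes. The parquet crate reports
/// `total_byte_size()` as `i64`, so negative values are clamped to zero here.
pub fn sum_row_group_sizes(row_group_total_byte_sizes: &[i64]) -> usize {
    row_group_total_byte_sizes
        .iter()
        .map(|&bytes| bytes.max(0) as usize)
        .sum()
}

/// Assumed aggregation rule for multiple sources: the total is known only
/// if every source has a known size (summing `Option`s yields `None` as
/// soon as one element is `None`).
pub fn aggregate_size_bytes(sources: &[TableMetadata]) -> Option<usize> {
    sources.iter().map(|md| md.size_bytes).sum()
}

/// Tiered estimate, most accurate first:
/// 1. column-level statistics,
/// 2. `size_bytes` taken from parquet metadata,
/// 3. `num_rows * estimated_row_size_bytes` (the schema-based guess).
pub fn estimate_in_memory_size_bytes(
    stats: Option<&TableStatistics>,
    metadata: Option<&TableMetadata>,
    estimated_row_size_bytes: usize,
) -> Option<usize> {
    if let Some(stats) = stats {
        return Some(stats.estimated_size_bytes);
    }
    // Without metadata there is no row count to fall back on.
    let metadata = metadata?;
    if let Some(size_bytes) = metadata.size_bytes {
        return Some(size_bytes);
    }
    Some(metadata.num_rows.saturating_mul(estimated_row_size_bytes))
}
```

With this shape, a source that has no column statistics but does carry `size_bytes` never reaches the schema-based guess, which is the case the commit says was producing inflated estimates for dictionary-encoded or low-cardinality string columns.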
main
4 hours ago
Merge with main
everettVT/hf-storage-buckets
6 hours ago
Merge branch 'main' into feat/width-bucket
YuangGao:feat/width-bucket
6 hours ago
Merge branch 'main' into fix/list-namespaces-rule3-dispatch
YuangGao:fix/list-namespaces-rule3-dispatch
6 hours ago

Latest Branches

CodSpeed Performance Gauge
0%
feat(ai): add transformers provider for prompt() #7152
57 minutes ago
7f39376
YuangGao:feat/transformers-prompter
CodSpeed Performance Gauge
0%
2 hours ago
de6c121
BABTUNA:refactor/inline-agg-generic-accums
CodSpeed Performance Gauge
0%
fix(scan): use parquet metadata for scan task size estimates #6542
5 hours ago
4507ba9
desmond/preserve-parquet-stats-from-schema-inference