Eventual-Inc/Daft
Latest Results
fix(ai): handle device_map and return_full_text in transformers prompter
YuangGao:feat/transformers-prompter
5 minutes ago
refactor(inline-agg): consolidate Sum/Product/Min/Max into one macro
BABTUNA:refactor/inline-agg-generic-accums
47 minutes ago
fix(scan): use parquet metadata for scan task size estimates (#6542)

During schema inference, `GlobScanOperator::try_new` already reads the full parquet footer via `read_parquet_schema_and_metadata`, but it preserves only `num_rows` in `TableMetadata`; row group size information is discarded. When `estimate_in_memory_size_bytes` later needs a size estimate and no column-level `TableStatistics` are available, it falls back to `approx_num_rows * schema.estimate_row_size_bytes()`, which uses a fixed 20 bytes for Utf8 columns. For data with dictionary encoding or low-cardinality columns, this produces wildly inflated estimates.

To fix this, we extend `TableMetadata` with an optional `size_bytes` field, populated during schema inference from the sum of the row groups' uncompressed sizes (`total_byte_size`), and use it in `estimate_in_memory_size_bytes` as a middle-tier fallback between table statistics (most accurate) and the schema-based guess (least accurate).

For a 7.4 MB parquet file with 40M rows of 7 repeated URLs (from the Rivian repro at `s3://cgrinstead/daft-rivian-repro/data.parquet`), the estimate drops from **1.66 GiB to 44 MiB**.

## Changes

- Expose `total_byte_size()` on `DaftRowGroupMetaData` (wraps arrow-rs `RowGroupMetaData::total_byte_size()`)
- Add `size_bytes: Option<usize>` to `TableMetadata` with `#[serde(default)]` for backwards compatibility
- Populate `size_bytes` from parquet row group metadata during schema inference in `GlobScanOperator::try_new`
- Aggregate `size_bytes` across sources in `ScanTask::new`
- Add `metadata.size_bytes` as a fallback tier in `estimate_in_memory_size_bytes`

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Varun Madan <varun@Varuns-MacBook-Pro.local>
Co-authored-by: Varun Madan <varun.madan@gmail.com>
Co-authored-by: Varun Madan <varun@Mac.localdomain>
main
4 hours ago
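The three-tier fallback described in #6542 can be sketched as follows. This is a minimal illustration under stated assumptions, not Daft's code: `TableMetadata` here is a hypothetical stand-in with only two fields, the statistics tier is reduced to a precomputed `Option<usize>`, and `aggregate_size_bytes` assumes a scan task's total is known only when every source reports a size (the real aggregation in `ScanTask::new` may behave differently).

```rust
/// Hypothetical stand-in for Daft's `TableMetadata`, reduced to the two
/// fields this sketch needs.
#[derive(Debug, Clone)]
struct TableMetadata {
    length: usize,             // number of rows
    size_bytes: Option<usize>, // uncompressed row-group bytes, if known
}

/// Combine per-source sizes for one scan task. Assumption: the total is
/// known only if every source reports a size (`None` otherwise).
fn aggregate_size_bytes(sources: &[TableMetadata]) -> Option<usize> {
    // Summing an iterator of `Option<usize>` yields `None` if any item is `None`.
    sources.iter().map(|m| m.size_bytes).sum()
}

/// Pick the most accurate size estimate available (simplified signature).
fn estimate_in_memory_size_bytes(
    stats_estimate: Option<usize>,
    metadata: Option<&TableMetadata>,
    approx_num_rows: usize,
    schema_row_size_bytes: usize,
) -> usize {
    // Tier 1: column-level table statistics (most accurate).
    if let Some(bytes) = stats_estimate {
        return bytes;
    }
    // Tier 2: uncompressed sizes taken from parquet row-group metadata.
    if let Some(bytes) = metadata.and_then(|m| m.size_bytes) {
        return bytes;
    }
    // Tier 3: row count times a fixed per-row schema estimate (least accurate).
    approx_num_rows * schema_row_size_bytes
}

fn main() {
    let sources = vec![
        TableMetadata { length: 10, size_bytes: Some(100) },
        TableMetadata { length: 20, size_bytes: Some(250) },
    ];
    let merged = TableMetadata {
        length: sources.iter().map(|m| m.length).sum(),
        size_bytes: aggregate_size_bytes(&sources),
    };
    // No statistics, so tier 2 (350) wins over tier 3 (30 rows * 20 = 600).
    let estimate = estimate_in_memory_size_bytes(None, Some(&merged), merged.length, 20);
    println!("{estimate}"); // 350
}
```

Returning early at each tier keeps the most accurate available source authoritative; the schema-based multiplication only runs when neither statistics nor parquet metadata are present.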
fix(scan): clamp negative parquet uncompressed_size to zero

Greptile review: `ColumnChunkMetaData::uncompressed_size()` returns an `i64`; a malformed file with a negative value would wrap to a huge `u64` and re-inflate the estimate. Clamp to 0 defensively.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
desmond/preserve-parquet-stats-from-schema-inference
4 hours ago
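The clamping fix can be illustrated in isolation. This is a minimal sketch: the helper names `clamp_uncompressed_size` and `row_group_uncompressed_size` are hypothetical, and column-chunk sizes are passed as a plain `&[i64]` rather than real arrow-rs metadata objects. `u64::try_from` fails for negative inputs, so the clamp is explicit instead of relying on a wrapping `as` cast.

```rust
/// Convert a column chunk's uncompressed size (an `i64` in parquet metadata)
/// to `u64`, clamping negative values from malformed files to 0.
fn clamp_uncompressed_size(raw: i64) -> u64 {
    // `try_from` returns `Err` for negative inputs; treat those as 0.
    u64::try_from(raw).unwrap_or(0)
}

/// Hypothetical helper: total uncompressed size of one row group's column chunks.
fn row_group_uncompressed_size(column_sizes: &[i64]) -> u64 {
    column_sizes.iter().map(|&s| clamp_uncompressed_size(s)).sum()
}

fn main() {
    let sizes = [1_024, -5, 2_048];
    println!("{}", row_group_uncompressed_size(&sizes)); // 3072
    // What an unchecked cast would produce instead of 0:
    println!("{}", (-1i64) as u64); // 18446744073709551615
}
```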
fix(session): list_namespaces mirrors list_tables resolution (#7144)

## Changes Made

`Session.list_namespaces` ignored attached catalogs and raised when no current catalog was set. It now mirrors `Session::list_tables` (#7126):

- Rule 3: `<catalog>.<rest>` patterns dispatch exclusively to that attached catalog
- Otherwise: namespaces from the current catalog, qualified with the session alias so results round-trip through `set_namespace` / `list_tables` / `read_table`
- No current catalog returns `[]` instead of raising `ValueError`

Adds `Session::list_namespaces` in Rust + `PySession` shim and switches the Python wrapper to delegate.

## Related Issues

Closes #7134
main
5 hours ago
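The resolution order from #7144 can be modeled with a toy session. This is a hypothetical sketch, not Daft's `Session`: catalogs are a `HashMap` from session alias to a flat list of namespace names, pattern matching is assumed to be a simple prefix match, and rule-3 results are assumed to be qualified with the catalog alias just like the current-catalog case.

```rust
use std::collections::HashMap;

/// Toy session model (hypothetical): attached catalogs keyed by their session
/// alias, each holding a flat list of namespace names.
struct Session {
    catalogs: HashMap<String, Vec<String>>,
    current_catalog: Option<String>,
}

impl Session {
    /// Namespaces of catalog `alias` starting with `prefix` (assumed prefix
    /// matching), qualified as `alias.namespace` and sorted.
    fn qualified(&self, alias: &str, prefix: &str) -> Vec<String> {
        let mut out = match self.catalogs.get(alias) {
            Some(namespaces) => namespaces
                .iter()
                .filter(|ns| ns.starts_with(prefix))
                .map(|ns| format!("{alias}.{ns}"))
                .collect::<Vec<String>>(),
            None => Vec::new(),
        };
        out.sort();
        out
    }

    fn list_namespaces(&self, pattern: Option<&str>) -> Vec<String> {
        let pattern = pattern.unwrap_or("");
        // Rule 3: `<catalog>.<rest>` naming an attached catalog is dispatched
        // exclusively to that catalog.
        if let Some((catalog, rest)) = pattern.split_once('.') {
            if self.catalogs.contains_key(catalog) {
                return self.qualified(catalog, rest);
            }
        }
        // Otherwise, search the current catalog; with none set, return `[]`.
        match &self.current_catalog {
            Some(alias) => self.qualified(alias, pattern),
            None => Vec::new(),
        }
    }
}

fn main() {
    let mut catalogs = HashMap::new();
    catalogs.insert("prod".to_string(), vec!["sales".to_string(), "ops".to_string()]);
    catalogs.insert("dev".to_string(), vec!["scratch".to_string()]);
    let session = Session { catalogs, current_catalog: Some("prod".to_string()) };
    println!("{:?}", session.list_namespaces(None)); // ["prod.ops", "prod.sales"]
    println!("{:?}", session.list_namespaces(Some("dev.s"))); // ["dev.scratch"]
}
```

Checking rule 3 first means a pattern such as `dev.s` is answered only by `dev` when that catalog is attached, matching the "exclusively" wording above, while the empty-vector branch replaces the old `ValueError`.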
Merge with main
everettVT/hf-storage-buckets
6 hours ago
Merge branch 'main' into feat/width-bucket
YuangGao:feat/width-bucket
6 hours ago
Merge branch 'main' into fix/list-namespaces-rule3-dispatch
YuangGao:fix/list-namespaces-rule3-dispatch
6 hours ago
Latest Branches
Performance gauge: 0%
feat(ai): add transformers provider for prompt()
#7152
57 minutes ago
7f39376
YuangGao:feat/transformers-prompter
Performance gauge: 0%
refactor(inline-agg): consolidate Sum/Product/Min/Max into one macro
#7151
2 hours ago
de6c121
BABTUNA:refactor/inline-agg-generic-accums
Performance gauge: 0%
fix(scan): use parquet metadata for scan task size estimates
#6542
5 hours ago
4507ba9
desmond/preserve-parquet-stats-from-schema-inference
© 2026 CodSpeed Technology