Eventual-Inc
Daft
Blog
Docs
Changelog
Performance History
Latest Results
Merge remote-tracking branch 'upstream/main' into codex-sql-read-parquet-ignore-corrupt-files
jackylee-ch:codex-sql-read-parquet-ignore-corrupt-files
1 hour ago
Merge remote-tracking branch 'upstream/main' into codex-sql-read-parquet-ignore-corrupt-files
jackylee-ch:codex-sql-read-parquet-ignore-corrupt-files
3 hours ago
Merge remote-tracking branch 'upstream/main' into codex-sql-read-parquet-ignore-corrupt-files
jackylee-ch:codex-sql-read-parquet-ignore-corrupt-files
4 hours ago
chore(temporal): address Greptile review on unix extractors

Sort the new `from .datetime` imports alphabetically to match the `__all__` list. Add Expression method forms (unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp, weekday) so the new functions are reachable via method-chaining like every other temporal extractor.
BABTUNA:feat/temporal-unix-extractors
6 hours ago
Merge remote-tracking branch 'upstream/main' into codex-sql-read-parquet-ignore-corrupt-files
jackylee-ch:codex-sql-read-parquet-ignore-corrupt-files
6 hours ago
fix(scan): populate per-column sizes when splitting parquet by row groups

Follow-up to #6542. The size estimator uses per-column uncompressed sizes from the Parquet footer (TableMetadata.column_sizes) when available, but those were only captured for the first file in a glob. split_by_row_groups already reads each file's full footer at planning time, so populate column_sizes there too, giving accurate, projection-aware estimates for every split task at zero extra I/O.

This also fixes a latent over-estimate: previously a split of the first file reused the whole-file column_sizes and only updated `length`, so each split estimated the entire file (N splits => ~Nx the real size). We now recompute per-column sizes for exactly the row groups in each split, and attach an accurate row count for every file (not just the first).

Adds a Ray regression test asserting the total estimated scan bytes tracks the files' uncompressed size, rather than being multiplied by the number of splits or blown up by parquet_inflation_factor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
varun/scan-split-column-sizes
6 hours ago
fix typo in src/daft-distributed/src/pipeline_node/shuffles/backends/mod.rs

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Liusixuuu:fix/into-partitions-flight-shuffle-panic
6 hours ago
Merge remote-tracking branch 'upstream/main' into codex-sql-read-parquet-ignore-corrupt-files
jackylee-ch:codex-sql-read-parquet-ignore-corrupt-files
7 hours ago
Latest Branches
CodSpeed Performance Gauge: 0%
feat(sql): support read_parquet ignore_corrupt_files
#7133
2 hours ago
49207d1
jackylee-ch:codex-sql-read-parquet-ignore-corrupt-files
CodSpeed Performance Gauge: 0%
feat(temporal): add Spark-style unix extractors and weekday
#6920
1 month ago
30716e9
BABTUNA:feat/temporal-unix-extractors
CodSpeed Performance Gauge: 0%
fix(scan): populate per-column sizes when splitting parquet by row groups
#7155
7 hours ago
3e3a3b9
varun/scan-split-column-sizes
© 2026 CodSpeed Technology