Eventual-Inc
Daft
Blog
Docs
Changelog
Latest Results
Merge branch 'main' into patch-1
ARDA7787:patch-1
6 hours ago
docs(paimon): clarify object-store IO config usage
jackylee-ch:codex-doc-paimon-io-config
14 hours ago
fix(filesystem): fix pyarrow fs memory by caching by value, not identity (#7025)

## Changes Made

Fixes a memory leak in pyarrow fs. In long-running `write_iceberg` jobs this drained file descriptors and threads until the process OOM'd. This was triggered with a refresh-credentials S3 setup, but the cache is broken for every IOConfig. This PR keys the cache on `repr(io_config)`.

The audit found that `IOConfig.__hash__` returns equal values for semantically-equal configs, but `__eq__` is identity-based on the PyO3 wrapper. The dict-keyed cache at `daft/filesystem.py:35` therefore missed on **every** call when the Rust side handed a fresh Python wrapper to each writer, rebuilding a new PyArrow `S3FileSystem` (with its own thread pool and connection pool) per output file.

| | FD slope / iter | RSS slope MiB / iter |
|---|---|---|
| Before fix | +63.9 | +2.15 |
| After fix | **−0.05** | +0.39 |

## Related Issues

- N/A
main
1 day ago
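The cache miss described in the fix above can be reproduced without Daft or PyArrow. The following is a minimal sketch, not the actual `daft/filesystem.py` code: `FakeIOConfig`, `FakeFileSystem`, `resolve`, and `count_filesystems` are hypothetical stand-ins, and the 480-writer count is borrowed from the MRE mentioned in the related commit. It assumes only what the message states: the hash is value-based, equality is identity-based, and every writer receives a fresh wrapper.

```python
class FakeIOConfig:
    """Hypothetical stand-in for a PyO3 wrapper: value-based hash, identity-based eq."""

    def __init__(self, region):
        self.region = region

    def __hash__(self):
        # Semantically equal configs hash to the same value.
        return hash(self.region)

    def __eq__(self, other):
        # Equality only holds for the very same Python object.
        return self is other

    def __repr__(self):
        return f"FakeIOConfig(region={self.region!r})"


class FakeFileSystem:
    """Hypothetical stand-in for an expensive filesystem object."""

    instances = 0

    def __init__(self):
        FakeFileSystem.instances += 1


def resolve(cache, key):
    # Build a filesystem only when the key is not already cached.
    if key not in cache:
        cache[key] = FakeFileSystem()
    return cache[key]


def count_filesystems(key_fn, writers=480):
    """Return how many filesystems get built when each writer sees a fresh wrapper."""
    FakeFileSystem.instances = 0
    cache = {}
    for _ in range(writers):
        cfg = FakeIOConfig("us-west-2")  # fresh wrapper per writer
        resolve(cache, key_fn(cfg))
    return FakeFileSystem.instances


built_by_identity = count_filesystems(lambda cfg: cfg)  # key on the object itself
built_by_repr = count_filesystems(repr)                 # key on its string content

print(built_by_identity, built_by_repr)
```

Keying on the wrapper builds one filesystem per writer because a dict lookup needs both a matching hash and a passing equality check; keying on `repr` collapses all equal configs onto one entry.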
feat(checkpoint): add distributed observability counters

Surface checkpoint progress on the dashboard for distributed (Flotilla) runs via worker->driver counter aggregation:

- keys_staged on the StageCheckpointKeys source operator
- files_staged and checkpoints_sealed on the write sink

Each worker's RuntimeStats builds a StatSnapshot; the distributed pipeline node's handle_worker_node_stats sums the new fields into driver-side meter counters and re-exports them. Non-checkpoint operators report zero via default no-op RuntimeStats methods.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rohit/feature/checkpoint-metrics
1 day ago
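The worker-to-driver aggregation described in the checkpoint counters commit above amounts to a field-wise sum of per-worker snapshots. The sketch below is illustrative only: the real implementation lives on the Rust side, and this Python `StatSnapshot` dataclass and the `aggregate` helper are assumptions made for the example. Only the three counter names come from the commit message.

```python
from dataclasses import dataclass, fields


@dataclass
class StatSnapshot:
    """Hypothetical snapshot shape; counters default to zero (the no-op case)."""

    keys_staged: int = 0
    files_staged: int = 0
    checkpoints_sealed: int = 0


def aggregate(snapshots):
    """Sum every counter field across worker snapshots into one driver-side total."""
    total = StatSnapshot()
    for snap in snapshots:
        for f in fields(StatSnapshot):
            setattr(total, f.name, getattr(total, f.name) + getattr(snap, f.name))
    return total


worker_snapshots = [
    StatSnapshot(keys_staged=100),                       # source operator on worker A
    StatSnapshot(keys_staged=50),                        # source operator on worker B
    StatSnapshot(files_staged=4, checkpoints_sealed=1),  # write sink
    StatSnapshot(),                                      # non-checkpoint operator
]

driver_total = aggregate(worker_snapshots)
print(driver_total)
```

Defaulting every field to zero mirrors the stated behavior that non-checkpoint operators contribute nothing to the totals.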
fix(filesystem): key pyarrow fs cache by IOConfig content, not identity

IOConfig.__hash__ returns equal values for semantically-equal configs, but __eq__ is identity-based on the PyO3 wrapper. The dict-keyed cache in _resolve_paths_and_filesystem therefore missed on every call when the Rust side handed a fresh Python wrapper to each writer, rebuilding a new PyArrow S3FileSystem (with its own thread + connection pool) per output file. In long-running write_iceberg jobs this drained file descriptors and threads until the process OOM'd.

Key on repr(io_config) instead; the cached entry's expiry field still drives refresh-credentials invalidation.

In a 30-iteration MRE (16 partitions = 480 writers), FD growth drops from +63.9/iter to -0.05/iter and RSS slope drops 5x.
rchowell/write-leak-fix
1 day ago
docs: add shuffle algorithms tuning guide (#7017)

Adds a user-facing page in the Optimization section covering Daft's four `shuffle_algorithm` options (`auto`, `map_reduce`, `pre_shuffle_merge`, and `flight_shuffle`), when each applies, and how to tune `flight_shuffle_dirs` and `flight_shuffle_compression`. Adds cross-links between the new page and `partitioning.md` so the partition-count → shuffle-cost path is followable in both directions.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
main
1 day ago
feat: first_value / last_value aggs for window functions (#6974)

Implements first_value(ignore_nulls) and last_value(ignore_nulls) as window-only aggregation functions.

Four constraints enforced at the logical plan level:

1. Window-only: rejected in .agg() / .groupby().agg() context (resolve_expr.rs)
2. partition_by required
3. order_by required
4. Frame (rows_between) required

Core algorithm:

- add(start, end): pushes qualifying indices to the back of the deque; skips nulls when ignore_nulls=True
- remove(start, end): pops from the front while front < end_idx; handles bounded frames and backward windows where the left boundary advances
- evaluate(): snapshots front() (first_value) or back() (last_value) into result_idxs; empty deque or null value → null
- build(): materialises via source.take(result_idxs) + with_nulls
main
1 day ago
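The deque-based algorithm outlined in the first_value / last_value commit above can be sketched in plain Python. This is an approximation of the described steps, not Daft's Rust implementation: the `FirstLastState` class, the `run` driver, and the use of a Python list in place of `source.take(...)` are all assumptions made for illustration. The frame used is "N preceding rows through the current row" within a single, already ordered partition.

```python
from collections import deque


class FirstLastState:
    """Hypothetical sliding-window state following the add/remove/evaluate/build outline."""

    def __init__(self, values, last=False, ignore_nulls=False):
        self.values = values
        self.last = last
        self.ignore_nulls = ignore_nulls
        self.window = deque()   # indices currently inside the frame
        self.result_idxs = []   # one entry per evaluated row (None means null)

    def add(self, start, end):
        # Push newly entered indices to the back, skipping nulls if requested.
        for i in range(start, end):
            if self.ignore_nulls and self.values[i] is None:
                continue
            self.window.append(i)

    def remove(self, start, end):
        # Drop indices that fell off the left edge of the frame.
        while self.window and self.window[0] < end:
            self.window.popleft()

    def evaluate(self):
        if not self.window:
            self.result_idxs.append(None)
        else:
            self.result_idxs.append(self.window[-1] if self.last else self.window[0])

    def build(self):
        # Stand-in for a take over the source column plus a null mask.
        return [None if i is None else self.values[i] for i in self.result_idxs]


def run(values, preceding, last, ignore_nulls):
    """Evaluate a frame of `preceding` rows before the current row through the current row."""
    state = FirstLastState(values, last=last, ignore_nulls=ignore_nulls)
    prev_start, prev_end = 0, 0
    for row in range(len(values)):
        start, end = max(0, row - preceding), row + 1
        state.add(prev_end, end)        # rows entering on the right
        state.remove(prev_start, start) # rows leaving on the left
        state.evaluate()
        prev_start, prev_end = start, end
    return state.build()


values = [None, 10, None, 30]
first_skip_nulls = run(values, preceding=1, last=False, ignore_nulls=True)
first_keep_nulls = run(values, preceding=1, last=False, ignore_nulls=False)
last_keep_nulls = run(values, preceding=1, last=True, ignore_nulls=False)
print(first_skip_nulls, first_keep_nulls, last_keep_nulls)
```

Storing indices rather than values keeps each step cheap and defers materialisation to a single pass at the end, which matches the take-based `build()` step in the outline.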
cleaned up while statement
euan/window-firstlastval-agg
1 day ago
Latest Branches
CodSpeed Performance Gauge
0%
fix(local): replace unguarded unwrap() calls with recoverable error handling
#7003
6 hours ago
21dcb68
ARDA7787:patch-1
CodSpeed Performance Gauge
-1%
docs(paimon): clarify object-store IO config usage
#7029
14 hours ago
9b07d39
jackylee-ch:codex-doc-paimon-io-config
CodSpeed Performance Gauge
0%
feat(checkpoint): distributed observability counters
#7026
1 day ago
7a447e2
rohit/feature/checkpoint-metrics
© 2026 CodSpeed Technology