Eventual-Inc
Daft
Performance History
Latest Results
refactor(parquet): single-source RG pruning for both local and remote

Previously RG-level pruning was split across two functions with asymmetric coverage:

- `prepare_remote_chunk_source` did positional pruning (start_offset, num_rows) for remote only.
- `prune_row_groups` did user-list + predicate-stat pruning for both.

Local paths therefore left RGs entirely before `start_offset` in `rg_indices` carrying all-skip `RowSelection`s, which still spawned per-RG decode tasks. `apply_limit_to_selection` and `count_active_rgs` existed only to clean up those leftover all-skip RGs after the fact.

Move all RG-level filtering into `prune_row_groups`: user indices → positional (start_offset, num_rows) → predicate stats, in one pass. Both local and remote now drop fully-skippable RGs upfront.

Follow-on simplifications:

- `prepare_remote_chunk_source` no longer returns `override_rgs` (always None); drop it from the chain.
- `build_offset_selection` drops the entire-RG-skip arm.
- `apply_limit_to_selection` drops the remaining=0 (all-skip) arm.
- `count_active_rgs` deleted; `build_rg_inputs` loses its `no_pred_active` parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/parquet-perf
21 minutes ago
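The single-pass order described in this commit (user indices, then positional window, then predicate statistics) can be sketched in Python as follows. This is a minimal illustration, not Daft's Rust implementation: `RowGroupMeta`, the `may_match` callback, and the choice to count row positions over the user-selected row groups are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class RowGroupMeta:
    """Hypothetical per-row-group metadata: row count plus min/max stats."""
    num_rows: int
    min_value: int
    max_value: int


def prune_row_groups(
    row_groups: Sequence[RowGroupMeta],
    user_indices: Optional[Sequence[int]] = None,
    start_offset: int = 0,
    num_rows: Optional[int] = None,
    may_match: Optional[Callable[[RowGroupMeta], bool]] = None,
) -> List[int]:
    """Return indices of row groups that still need decoding."""
    # Step 1: restrict to the user-supplied list, if any.
    candidates = range(len(row_groups)) if user_indices is None else user_indices
    end = None if num_rows is None else start_offset + num_rows
    kept: List[int] = []
    pos = 0  # rows covered so far among the candidates (assumed semantics)
    for idx in candidates:
        rg = row_groups[idx]
        rg_start, rg_end = pos, pos + rg.num_rows
        pos = rg_end
        # Step 2: positional pruning against [start_offset, end).
        if rg_end <= start_offset:
            continue  # lies entirely before the offset
        if end is not None and rg_start >= end:
            break  # everything from here on is past the window
        # Step 3: predicate pruning from statistics.
        if may_match is not None and not may_match(rg):
            continue
        kept.append(idx)
    return kept
```

With four 100-row groups, `start_offset=150` and `num_rows=100`, only the second and third groups survive, so no decode task is created for the group that lies entirely before the offset.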
refactor(flotilla): drop flotilla_output_target_bytes config, hardcode 64 MiB

The coalescing target wasn't worth a tunable surface: every reasonable deployment wants the head-node OOM protection. Hardcode to 64 MiB and delete the Rust field, env var, Python kwarg, and getter.

Skip tests/io/test_jsonl_chunking.py on Ray, since coalescing now always merges its sub-MB chunks into one MicroPartition and masks the chunk_size→partition-layout signal these tests assert on. The native runner remains the source of truth for that behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/flotilla-coalesce-outputs
53 minutes ago
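To illustrate why sub-MB chunks end up in a single partition under a fixed 64 MiB target, here is a minimal greedy sketch of byte-threshold coalescing. The function name `coalesce_by_bytes` and the flush-once-the-target-is-reached policy are assumptions for illustration; the actual flotilla logic may group outputs differently.

```python
from typing import List, Sequence

# Matches the hardcoded target named in the commit message.
TARGET_BYTES = 64 * 1024 * 1024


def coalesce_by_bytes(
    sizes: Sequence[int], target_bytes: int = TARGET_BYTES
) -> List[List[int]]:
    """Group consecutive outputs until each group reaches target_bytes."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for i, size in enumerate(sizes):
        current.append(i)
        current_bytes += size
        if current_bytes >= target_bytes:
            # Assumed policy: flush as soon as the target is reached.
            groups.append(current)
            current, current_bytes = [], 0
    if current:
        groups.append(current)  # trailing group smaller than the target
    return groups
```

Four 512 KiB chunks total 2 MiB, well under the target, so this sketch returns one group containing all four. That is the merging effect the skipped Ray tests would otherwise trip over.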
fix: embed shuffle hints in plan text instead of mid-call warnings

`repr_ascii` fired `warnings.warn` while building the plan string, writing to stderr between `print("== Physical Plan ==")` and `print(plan_body)`, so the warning visually landed inside the section. The `tracing::warn!` in `record_hint` also bridged through `pyo3_log` to Python logging, so the same hint emitted twice.

Drop the `tracing::warn!`. Remove `surface_hints` from display methods (`repr_ascii`, `repr_mermaid`, `repr_json`, `num_partitions`); `repr_ascii` now appends a `Hints:` section to its returned string so hints ride through stdout via the caller. `run_plan` still fires `UserWarning` at actual execution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
worktree-rippling-riding-pudding
1 hour ago
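The split described above, where display methods return hints as text and only execution raises a warning, might look roughly like this in Python. The free-function signatures and the exact layout of the `Hints:` section are assumptions; only the names `repr_ascii`, `run_plan`, and `UserWarning` come from the commit message.

```python
import warnings
from typing import List, Sequence


def repr_ascii(plan_body: str, hints: Sequence[str]) -> str:
    """Build the plan string; hints are appended as text, never warned."""
    if not hints:
        return plan_body
    lines: List[str] = [plan_body, "", "Hints:"]
    lines.extend(f"  - {hint}" for hint in hints)  # assumed formatting
    return "\n".join(lines)


def run_plan(hints: Sequence[str]) -> None:
    """Stand-in for execution: this is the one place a warning is raised."""
    for hint in hints:
        warnings.warn(hint, UserWarning, stacklevel=2)
```

Because `repr_ascii` has no side effects on stderr, a caller that prints a header and then the returned string gets the hints at the end of the plan section rather than interleaved with it.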
feat: data type bug fix +
euan/nearest-asof
2 hours ago
fix: hash(-0.0) == hash(0.0) (#6963)
main
2 hours ago
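The commit title refers to a general floating-point pitfall: under IEEE 754, `-0.0 == 0.0` is true, yet the two values have different bit patterns, so a hash computed over raw bytes tells them apart. The sketch below shows the usual remedy of normalizing the sign of zero before hashing. It is not Daft's code: `hash_f64` is a hypothetical helper, and BLAKE2b is used only as a stand-in hash function.

```python
import hashlib
import struct


def hash_f64(x: float) -> int:
    """Hash a float so that values comparing equal hash equal (zeros only)."""
    if x == 0.0:
        x = 0.0  # collapses -0.0 to +0.0, since -0.0 == 0.0 is True
    raw = struct.pack("<d", x)  # 8-byte little-endian IEEE 754 encoding
    digest = hashlib.blake2b(raw, digest_size=8).digest()
    return int.from_bytes(digest, "little")


# The raw encodings differ only in the sign bit.
neg_zero_bytes = struct.pack("<d", -0.0)
pos_zero_bytes = struct.pack("<d", 0.0)
```

Without the normalization step, equal keys such as `-0.0` and `0.0` could land in different hash buckets, which would matter for operations like group-by or join. NaN handling is a separate question that this sketch does not address.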
refactor(parquet): extract shared arrow helpers; rename process_rg_streaming

Pull three repeated patterns out of mod.rs and rg_processor.rs into util.rs:

- `record_batch_from_arrow` (try_new + Snafu context + Daft RecordBatch conversion)
- `eval_predicate_mask` (eval + bool + as_arrow)
- `filter_arrays_by_mask` (per-column arrow::compute::filter with path context)

Three call sites collapse from ~8 lines each to one.

Also rename `process_rg_streaming` → `process_rg_with_data_cols` to mirror its partner `process_rg_predicate_only`. Dispatch in build_rg_stream is `data_col_indices.is_empty()`, so the naming now reflects the actual selector.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/parquet-perf
2 hours ago
refactor: simpler shuffle-hint wording; drop fan-out jargon

Lead with "flight_shuffle writes shuffle data to disk" (the concrete thing users care about) and drop unfamiliar terms like "fan-out" and "peer-to-peer Arrow Flight".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
worktree-rippling-riding-pudding
3 hours ago
chore: migrate daft.io.lance to daft_lance (#6957)

## Changes Made

Removes ALL daft.io.lance logic in favor of the daft_lance extension. This simply re-exports all symbols and points to the same underlying logic. The pylance version comes from daft_lance, which uses 6.0.0.

## Related Issues

- Closes #6677
- Closes #6678
- Closes #6819
main
3 hours ago
Latest Branches
CodSpeed Performance Gauge
-1%
perf(parquet): rewrite reader with arrow-rs public decoder API
#6952
52 minutes ago
1a8f416
colin/parquet-perf
CodSpeed Performance Gauge
0%
perf(flotilla): Coalesce task outputs based on byte threshold
#6943
2 hours ago
02fee08
colin/flotilla-coalesce-outputs
CodSpeed Performance Gauge
0%
feat(flotilla): hint users to switch to flight_shuffle on large shuffles
#6962
2 hours ago
0bb3457
worktree-rippling-riding-pudding
© 2026 CodSpeed Technology