Avatar for the Eventual-Inc user
Eventual-Inc
Daft
BlogDocsChangelog

Performance History

Latest Results

chore: bump minimum PyArrow version to 16 (#6868) ## Summary Daft's stated policy is to support the latest 8 minor PyArrow versions. With PyArrow 24 released, the supported window should be `>= 16, < 25`. ## Changes **`pyproject.toml`** (3 callsites): - Main dependency: `pyarrow >= 8.0.0,<24.0.0` → `pyarrow >= 16.0.0,<25.0.0` - `hudi` extra: `pyarrow >= 8.0.0,<22.1.0` → `pyarrow >= 16.0.0,<25.0.0` - Dev dependency group: `pyarrow >= 8.0.0,<24.0.0` → `pyarrow >= 16.0.0,<25.0.0` **Test cleanup:** - Removed 7 `@pytest.mark.skipif(pa.__version__ < "16", reason="PyArrow < 16 has limited float16 support")` markers from `tests/series/test_numeric_ops.py` (4) and `tests/series/test_float.py` (3) — these are now always-true conditions. - Simplified `tests/series/__init__.py` to include `pa.float16()` in `ARROW_FLOAT_TYPES` unconditionally rather than gating on `pa.__version__ >= "16"`. ## Out of scope (left for a follow-up) - CI matrix updates (item 3 of the issue) — happy to do this in a follow-up if a maintainer can point me at the right workflow file. - Documentation updates (item 4) — `grep -rn "pyarrow.*8\." docs/` and `grep -rn "PyArrow.*8" docs/` returned no matches for the old `>= 8` lower bound, so docs may already be clean. Let me know if I missed a location. Closes #6862
main
2 hours ago
fix(lance): resolve rebase conflict - accept main's refactored utils module
XuQianJin-Stars:fix/lance-utils-redundant-num-groups
10 hours ago
perf(parquet): rewrite reader with arrow-rs public decoder API (#6952) ## Summary Replaces the parquet2-based reader (`arrowrs_reader`, `async_reader`, `read_planner`) with one built on arrow-rs's `array_reader` API. New module `daft-parquet/src/reader/` covering byte IO + decode + per-row-group assembly, plus a sibling `helpers.rs` for predicate / row-selection utilities. ## Layout ``` src/daft-parquet/src/ ├── reader/ │ ├── mod.rs entry points (stream_parquet, read_parquet), phase-1 dispatch, │ │ concurrent-but-ordered per-RG decode driver │ ├── chunk_source.rs byte sources: local pread, remote coalesced range GETs │ ├── field_reader.rs schema walk + per-column decode │ ├── rg_processor.rs per-RG streaming + chunked-pred processors │ └── util.rs pure helpers └── helpers.rs predicate / row-selection utilities (pushable cols, offset / delete-vector selections, RG pruning) ``` ## Key behaviors - **Local**: positioned coalesced `pread`s (64KB gap merge). No mmap, no whole-file `Bytes`. - **Remote**: per-RG range GETs with adjacency merge (≤1MB gap, split runs >24MB at chunk boundaries into ~16MB pieces). Fetches run in background; decoders await an assembled `HashMap<leaf, Bytes>` via `Shared` — never park on per-byte IO. Duplicate RG indices in the same scan are supported (no eviction on consume). - **Concurrency / ordering**: per-RG decode tasks are spawned concurrently onto the compute runtime via a `JoinSet`, each draining into a per-RG bounded channel (capacity 1) which is read back in RG order. Output is always file-ordered within a single file; `maintain_order=false` at the scan layer only reorders BETWEEN scan tasks. - **Predicate pushdown** (two-phase): decode pred cols in parallel across RGs, evaluate per-RG, reuse the decoded arrays during assembly so cols in both predicate and projection (e.g. `where(col > 50)`) are decoded once. - **Chunked-pred path**: when projection is fully covered by predicate cols, stream pred cols chunk-by-chunk and filter in place. Within-RG early stop on `limit`. - **Iceberg field-id mapping** applied before filtering active leaves by name. - **Nested types** decoded via lockstep parquet+arrow schema walk. ## Note: `experimental` feature on parquet dep `array_reader` (with `PrimitiveArrayReader`, `make_byte_array_reader`, `ListArrayReader`, …) is gated behind parquet's `experimental` feature in arrow-rs — public but `#[doc(hidden)]` and no semver guarantees. Our specific touchpoints have been stable in name for 1–2 years; future parquet bumps may require minor adjustment. ## Test plan - [x] `cargo test -p daft-parquet` - [x] `pytest tests/io/test_parquet.py tests/recordbatch/recordbatch_io/test_parquet.py` (62 passed) - [x] `pytest tests/io` (861 passed, 1 unrelated moto fixture error) - [x] Iceberg + projection + predicate paths - [x] Local + remote benchmarks (below) ## Benchmarks `daft.read_parquet(uri).to_arrow()` plus projection / limit / where / combined variants. This PR built `maturin develop --release`; PyPI versions installed from `pip install daft==X`. 2 warmups, 5–7 repeats, best-of. **Environment**: EC2 aarch64 (Graviton), us-west-2. Local fixtures on EBS; remote reads target `s3://colinho-test/parquet_bench_fixtures/` in the same region. (Earlier draft used macOS — re-running on EC2 because Mac thermals were swinging numbers by 20–40% across reruns.) **Comparison points**: - **this PR** — HEAD of `colin/parquet-perf`. - **0.7.13** — current PyPI latest (effectively `main`). - **0.7.3** — older release pulled in by request for a broader baseline. ### Local (full read) | rows | cols | rgs | 0.7.3 | 0.7.13 | this PR | vs 0.7.13 | vs 0.7.3 | |---:|---:|---:|---:|---:|---:|---:|---:| | 10M | 1 | 1 | 54.0 ms | 20.7 ms | 20.9 ms | 0.99× | **2.58×** | | 10M | 1 | 8 | 33.6 ms | 14.7 ms | 13.0 ms | 1.13× | **2.58×** | | 10M | 1 | 64 | 23.3 ms | 20.0 ms | 18.4 ms | 1.09× | 1.27× | | 1M | 32 | 1 | 40.4 ms | 110.8 ms | 31.3 ms | **3.54×** | 1.29× | | 1M | 32 | 8 | 43.5 ms | 27.2 ms | 22.1 ms | 1.23× | **1.97×** | | 1M | 32 | 64 | 67.8 ms | 46.0 ms | 46.9 ms | 0.98× | 1.45× | | 10K | 1024 | 1 | 46.5 ms | 73.9 ms | 34.5 ms | **2.14×** | 1.35× | | 10K | 1024 | 8 | 160.1 ms | 109.0 ms | 94.2 ms | 1.16× | **1.70×** | | 10K | 1024 | 64 | 1190.9 ms | 685.5 ms | 566.5 ms | 1.21× | **2.10×** | Aggregate: **1.31× vs 0.7.13** (1.11 s → 0.85 s), **1.96× vs 0.7.3** (1.66 s → 0.85 s). ### Remote (S3, all 5 ops) | shape | op | 0.7.3 | 0.7.13 | this PR | vs 0.7.13 | vs 0.7.3 | |---|---|---:|---:|---:|---:|---:| | 10M×1×1 | full | 220.8 | 282.5 | 273.4 | 1.03× | 0.81× | | 10M×1×1 | project_1col | 225.3 | 280.0 | 241.2 | 1.16× | 0.93× | | 10M×1×1 | limit_100 | 198.2 | 277.5 | 241.0 | 1.15× | 0.82× | | 10M×1×1 | filter_high | 231.2 | 371.0 | 242.0 | **1.53×** | 0.96× | | 10M×1×1 | combined | 162.3 | 316.6 | 252.3 | 1.26× | 0.64× | | 10M×1×8 | full | 177.2 | 537.1 | 195.0 | **2.75×** | 0.91× | | 10M×1×8 | project_1col | 173.2 | 558.5 | 170.3 | **3.28×** | 1.02× | | 10M×1×8 | limit_100 | 91.0 | 214.4 | 126.3 | 1.70× | 0.72× | | 10M×1×8 | filter_high | 194.5 | 635.4 | 187.9 | **3.38×** | 1.04× | | 10M×1×8 | combined | 159.6 | 190.8 | 174.3 | 1.09× | 0.92× | | 10M×1×64 | full | 186.7 | 2990.1 | 171.8 | **17.40×** | 1.09× | | 10M×1×64 | project_1col | 191.8 | 2877.0 | 225.0 | **12.79×** | 0.85× | | 10M×1×64 | limit_100 | 83.0 | 163.7 | 141.3 | 1.16× | 0.59× | | 10M×1×64 | filter_high | 186.4 | 2849.2 | 229.0 | **12.44×** | 0.81× | | 10M×1×64 | combined | 153.7 | 200.7 | 123.3 | 1.63× | 1.25× | | 1M×32×1 | full | 271.6 | 481.1 | 294.8 | **1.63×** | 0.92× | | 1M×32×1 | project_1col | 93.7 | 171.3 | 156.6 | 1.09× | 0.60× | | 1M×32×1 | limit_100 | 238.0 | 335.8 | 314.5 | 1.07× | 0.76× | | 1M×32×1 | filter_high | 325.5 | 448.8 | 421.0 | 1.07× | 0.77× | | 1M×32×1 | combined | 99.0 | 192.0 | 209.3 | 0.92× | 0.47× | | 1M×32×8 | full | 285.5 | 865.8 | 247.2 | **3.50×** | 1.16× | | 1M×32×8 | project_1col | 101.6 | 533.6 | 211.8 | **2.52×** | 0.48× | | 1M×32×8 | limit_100 | 145.9 | 250.0 | 201.4 | 1.24× | 0.72× | | 1M×32×8 | filter_high | 337.7 | 1231.8 | 306.0 | **4.03×** | 1.10× | | 1M×32×8 | combined | 97.7 | 228.3 | 149.3 | 1.53× | 0.65× | | 1M×32×64 | full | 378.1 | 3600.1 | 330.6 | **10.89×** | 1.14× | | 1M×32×64 | project_1col | 303.6 | 3060.5 | 257.1 | **11.90×** | 1.18× | | 1M×32×64 | limit_100 | 155.4 | 334.3 | 198.6 | 1.68× | 0.78× | | 1M×32×64 | filter_high | 417.9 | 6402.3 | 389.6 | **16.43×** | 1.07× | | 1M×32×64 | combined | 295.2 | 269.6 | 176.6 | 1.53× | **1.67×** | | 10K×1024×1 | full | 332.0 | 493.2 | 402.6 | 1.22× | 0.82× | | 10K×1024×1 | project_1col | 138.2 | 293.3 | 290.4 | 1.01× | 0.48× | | 10K×1024×1 | limit_100 | 314.7 | 403.9 | 493.5 | 0.82× | 0.64× | | 10K×1024×1 | filter_high | 358.6 | 599.1 | 507.3 | 1.18× | 0.71× | | 10K×1024×1 | combined | 159.5 | 272.1 | 256.5 | 1.06× | 0.62× | | 10K×1024×8 | full | 501.1 | 1021.2 | 377.2 | **2.71×** | 1.33× | | 10K×1024×8 | project_1col | 188.6 | 626.0 | 272.2 | **2.30×** | 0.69× | | 10K×1024×8 | limit_100 | 244.4 | 390.2 | 337.8 | 1.15× | 0.72× | | 10K×1024×8 | filter_high | 515.6 | 1291.5 | 481.7 | **2.68×** | 1.07× | | 10K×1024×8 | combined | 186.2 | 326.1 | 292.9 | 1.11× | 0.64× | | 10K×1024×64 | full | 1766.0 | 4386.4 | 892.2 | **4.92×** | **1.98×** | | 10K×1024×64 | project_1col | 506.1 | 3080.5 | 395.5 | **7.79×** | 1.28× | | 10K×1024×64 | limit_100 | 523.4 | 493.0 | 421.7 | 1.17× | 1.24× | | 10K×1024×64 | filter_high | 1964.7 | 7135.6 | 1071.6 | **6.66×** | **1.83×** | | 10K×1024×64 | combined | 473.8 | 493.7 | 374.5 | 1.32× | 1.27× | (ms, all best-of.) Aggregate: **3.82× vs 0.7.13** (52.5 s → 13.7 s), **1.05× vs 0.7.3** (14.4 s → 13.7 s). ### Headline - **vs 0.7.13** (current PyPI latest): this PR wins everywhere — **1.31× local**, **3.82× remote**, with up to **17× on many-RG remote `full`/`filter` reads**. 0.7.13 has its own remote regressions vs 0.7.3 (especially 64-RG shapes) — this PR fixes them. - **vs 0.7.3**: this PR is **1.96× local**, roughly flat in remote aggregate but with notable wins on many-RG `full` / `filter_high` (the byte-range-coalescing wins) and clear losses on small-payload limit/projection ops where per-RG setup overhead doesn't recoup on tiny GETs. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
main
17 hours ago

Latest Branches

CodSpeed Performance Gauge
0%
feat: add try_cast function for safe type conversion#6960
2 days ago
f4122ca
XuQianJin-Stars:feat/try-cast
CodSpeed Performance Gauge
0%
11 hours ago
d3cadbd
XuQianJin-Stars:fix/lance-utils-redundant-num-groups
CodSpeed Performance Gauge
-1%
13 hours ago
fe74dd4
BABTUNA:feat/temporal-unix-extractors
© 2026 CodSpeed Technology
Home Terms Privacy Docs