Eventual-Inc
Daft
Blog
Docs
Changelog
Latest Results
docs: Update API references and fix code examples (#6938)

Fixes critical doc issues from the [documentation audit](https://eventualgroup.slack.com/archives/C0ARMCJKZCN/p1778780482358339?thread_ts=1778746580.784789&cid=C0ARMCJKZCN):

- **custom.md**: Use async `get_tasks`/`read` and `RecordBatch` instead of deprecated sync `get_micro_partitions`/`MicroPartition`
- **unity_catalog.md**: Fix operator precedence so URL prefix is added before download
- **images.md**: Add missing `decode_image()` between download and resize
- **audio.md**: Fix undefined `col` and `unnest` references

https://claude.ai/code/session_01LNzvmX91kjcWo7htEmEQM7

---------

Co-authored-by: Claude <noreply@anthropic.com>
main
5 minutes ago
style: fix rust fmt formatting for GCS delete implementation

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
daiping8:fix/gcs-delete-support
21 minutes ago
refactor(parquet): rename read APIs, simplify schema/metadata helpers

- Public reader API: `stream_parquet`→`read_parquet` (lazy stream), `read_parquet`→`read_parquet_into_recordbatch` (eager). Internal `reader::stream_parquet` now takes `&ParquetReadOptions` instead of ten positional args.
- `schema_inference.rs`: collapse per-DataType recursive walks (timestamp/dictionary/large-offset/strings-to-binary) into a single generic `transform_schema(schema, &leaf_fn)`. Cuts ~270 LOC.
- `helpers.rs`: inline `normalize_delete_rows`+`deletes_to_row_selection` into `build_single_rg_delete_selection`; simplify `prune_row_groups` branches.
- `metadata.rs`: extract `rebuild_file_metadata` helper; inline `metadata_len`; tighten `validate_footer_magic` visibility.
- `metadata_adapter.rs`: drop unused `from_arrowrs_with_indices`.
- `python.rs`: extract `per_file_from_row_groups` helper.
- `reader/chunk_source.rs`: replace `(usize, u64, u64)` / `(u64, u64, Vec<_>)` tuples with named `LeafRange` / `RangeGroup` structs; visibility tweaks.

Net: -389 lines, no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/parquet-perf
1 hour ago
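The `schema_inference.rs` item in the parquet refactor describes replacing several per-type recursive walks with one generic walk that takes a leaf function. The following is only a toy Python sketch of that general pattern, not the Rust code from the commit: the `DataType` and `Field` classes, the string type names, and the `strings_to_binary` rule are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class DataType:
    """Hypothetical toy type: a name plus optional nested child fields."""
    name: str
    children: tuple = ()  # tuple of Field; empty for leaf types


@dataclass(frozen=True)
class Field:
    """Hypothetical toy field: a column name and its type."""
    name: str
    dtype: DataType


def transform_dtype(dtype: DataType, leaf_fn: Callable[[DataType], DataType]) -> DataType:
    """Recurse through nested types and apply leaf_fn only to leaf types."""
    if dtype.children:
        new_children = tuple(
            Field(child.name, transform_dtype(child.dtype, leaf_fn))
            for child in dtype.children
        )
        return DataType(dtype.name, new_children)
    return leaf_fn(dtype)


def transform_schema(schema: List[Field], leaf_fn: Callable[[DataType], DataType]) -> List[Field]:
    """One generic walk; each rewrite rule is supplied as a leaf function."""
    return [Field(f.name, transform_dtype(f.dtype, leaf_fn)) for f in schema]


def strings_to_binary(dtype: DataType) -> DataType:
    """Example leaf rule (assumed for illustration): map utf8 leaves to binary."""
    return DataType("binary") if dtype.name == "utf8" else dtype


schema = [
    Field("a", DataType("utf8")),
    Field("s", DataType("struct", (
        Field("x", DataType("utf8")),
        Field("y", DataType("int64")),
    ))),
]
converted = transform_schema(schema, strings_to_binary)
```

With this shape, adding another rewrite (for example a timestamp coercion) means writing one more small leaf function rather than another recursive traversal.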
refactor: parameterize uuid generation
everettVT/uuid-param-generator
2 hours ago
fix: clarify Paimon validation state names

Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>
QuakeWang:fix/paimon-column-order
2 hours ago
perf(shuffle): Write one shuffle file per task instead of N partition files (#6948)

## Changes Made

Reduce write-side CPU and scheduling overhead on the Flight shuffle path:

- Write one file per map task with offsets for each partition, instead of one file per partition. In theory this should significantly speed up writes (and NVMe reads), and also avoid the max ulimit problem.
- Write the final file at repartition finalize instead of incrementally, which allows larger and fewer IPC messages. The Ray repartition path also just accumulates at the end anyway.
- Repartition incrementally in the sink based on a byte threshold, to amortize tiny repartitions.
- Chunk Flight server responses at ~4 MiB instead of emitting one `FlightData` per source batch, amortizing the reader's per-batch flatbuffer parse + array construction.

## Benchmarks

Wall-clock seconds, lower is better. `daft pypi flt` is daft 0.7.13 with the Flight backend; `daft built flt` is this PR. `mr` = map-reduce shuffle, `psm` = pre-shuffle merge.

| scale | parts | daft pypi mr | daft pypi psm | daft pypi flt | daft built flt | Δ vs pypi flt |
|---|---:|---:|---:|---:|---:|---:|
| sf100_top8 | 32 | 15.84 | 13.52 | 12.33 | 13.70 | +11% |
| sf100 | 32 | 19.75 | 23.94 | 19.76 | **15.80** | **−20%** |
| sf100 | 256 | 24.59 | 22.56 | 23.17 | 21.99 | −5% |
| sf1000_top64 | 128 | 19.25 | 20.26 | 20.89 | **18.35** | −12% |
| sf1000 | 256 | 380.13 | 276.90 | 106.83 | 107.59 | ~0% |
| sf1000 | 512 | — | — | 140.53 | **110.07** | **−22%** |
| sf1000 | 1024 | — | — | 211.68 | **140.70** | **−34%** |
| sf10000_top1000 | 256 | 260.41 | 394.09 | 120.96 | **99.70** | **−18%** |
| sf10000 | 1024 | — | — | 2556.05 | **1849.32** | **−28%** |
| sf10000 | 2048 | — | — | 4529.25 | **3024.56** | **−33%** |

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
main
3 hours ago
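Two ideas from the shuffle change in #6948 can be illustrated with a small sketch: storing every partition of a map task in one file with an offset index, and grouping small batches into chunks bounded by a byte threshold. This is a simplified Python illustration under stated assumptions, not Daft's Rust implementation: the function names, the in-memory `bytes` payloads, and the `(offset, length)` index layout are hypothetical, and real shuffle data would be Arrow IPC messages rather than raw byte strings.

```python
import io
from typing import Iterable, Iterator, List, Tuple

# Hypothetical index entry: (byte offset, byte length) of one partition.
IndexEntry = Tuple[int, int]


def write_shuffle_file(partitions: List[bytes]) -> Tuple[bytes, List[IndexEntry]]:
    """Concatenate all partition payloads into one buffer and record
    where each partition starts and how long it is."""
    buf = io.BytesIO()
    index: List[IndexEntry] = []
    for payload in partitions:
        offset = buf.tell()
        buf.write(payload)
        index.append((offset, len(payload)))
    return buf.getvalue(), index


def read_partition(data: bytes, index: List[IndexEntry], partition_id: int) -> bytes:
    """Fetch a single partition by slicing the shared buffer at its offset."""
    offset, length = index[partition_id]
    return data[offset:offset + length]


def chunk_batches(batches: Iterable[bytes], max_bytes: int = 4 * 1024 * 1024) -> Iterator[List[bytes]]:
    """Group consecutive batches so each emitted chunk stays at or under
    max_bytes where possible (a single oversized batch is emitted alone)."""
    chunk: List[bytes] = []
    size = 0
    for batch in batches:
        if chunk and size + len(batch) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(batch)
        size += len(batch)
    if chunk:
        yield chunk


data, index = write_shuffle_file([b"aa", b"", b"cccc"])
chunks = list(chunk_batches([b"aaa", b"bbb", b"ccc"], max_bytes=6))
```

In this sketch a task produces one buffer regardless of partition count, so the number of open files scales with tasks rather than with tasks times partitions, and a reader only needs the index entry for the partition it wants.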
docs(shuffle): restore stream.rs comments from main

Restores doc comments and inline annotations that were inadvertently stripped during the streaming-reader refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/flight-shuffle-perf
4 hours ago
fix: add try_cast parameter to visit_cast and fix lint errors
XuQianJin-Stars:feat/try-cast
4 hours ago
Latest Branches
CodSpeed Performance Gauge
0%
feat(gcs): implement delete for GCS object store
#6958
40 minutes ago
7709be1
daiping8:fix/gcs-delete-support
CodSpeed Performance Gauge
0%
perf(parquet): rewrite reader with arrow-rs public decoder API
#6952
2 hours ago
e953d16
colin/parquet-perf
CodSpeed Performance Gauge
0%
refactor: parameterize uuid generation
#6961
2 hours ago
0848830
everettVT/uuid-param-generator
© 2026 CodSpeed Technology