Avatar for the Eventual-Inc user
Eventual-Inc
Daft
BlogDocsChangelog

Performance History

Latest Results

fix(io): default https scheme for scheme-less S3 endpoint_url (#7111) ## Changes Made Defaults the scheme of a user/catalog-provided S3 `endpoint_url` to `https` when none is present, in the S3 endpoint normalization in `src/daft-io/src/s3_like.rs` (the same block that adds trailing slashes, #5575). Some Iceberg REST catalogs vend `s3.endpoint` as a bare host, optionally with a trailing slash (e.g. `s3.example.com/`). A scheme-less endpoint fails `url::Url::parse`, falls through to the verbatim string, and every S3 read fails: ``` DispatchFailure(... ResolveEndpointError { message: "Custom endpoint `s3.example.com/` was not a valid URI", source: Some(InvalidUri(InvalidFormat)) }) ``` (Writes can appear to succeed -- PyArrow already treats a bare endpoint as `https` -- while reads fail, which makes this confusing to debug.) The fix lives in `build_s3_conf`'s endpoint normalization rather than in the Iceberg property conversion, so it covers every `S3Config` consumer (direct users and any catalog), not just Iceberg-REST-vended endpoints. The trailing-slash handling from #5575 is now reached for bare hosts too. Extracted the normalization into a pure `normalize_endpoint_url` helper and added unit tests covering the scheme-default and the (previously untested) #5575 trailing-slash behavior. Notes: - Defaulting to `https` matches PyArrow's behavior. Users who need `http` keep passing a schemed endpoint -- those are untouched. - This cannot break working setups: a scheme-less endpoint always failed before. ## Related Issues Extends the endpoint normalization from #5575. Follow-up of #6993 (the bare-host endpoint was first observed via Iceberg-REST auto-config).
main
46 minutes ago
refactor(inline-agg): consolidate Sum/Product/Min/Max into one macro (#7151) ## Summary Consolidates the three near-identical accumulator macros (`define_sum_accum!`, `define_product_accum!`, `define_minmax_accum!`) into a single `define_reducer_accum!` parameterized on the binary reduce op. The 28 generated accumulator structs (4 Sum + 4 Product + 10 Min + 10 Max) keep their original names and behave identically. Single-file diff to `src/daft-recordbatch/src/ops/inline_agg.rs`. ## Why Follow-up to @colin-ho's comment on #7036: > It's going to become a many thousand line file. Any ideas? > And also since the accumulators are very similar, it might be worth making consolidating into a generic one. The Sum/Product/Min/Max macros had the same struct layout, same `init_groups`, same null/no-null branches in `update_batch`, same `finalize`. They differed only in the binary reduce expression (`+`, `*`, `std::cmp::min`, `std::cmp::max`, or NaN-handling float variants). Promoting the reduce expression to a macro parameter eliminates the duplication while staying in the file's existing macro-per-shape style. ## Changes Made - Deleted: `define_sum_accum!`, `define_product_accum!`, `define_minmax_accum!` - Added: single `define_reducer_accum!($name, $daft_type, $native, $reduce:expr)` - The 28 accumulator types (`SumAccumI64`, ..., `MaxAccumF64`) keep their identical names and identical generated code - `CountAccum`, `BoolAndAccum`, `BoolOrAccum`, the `AggAccumulator` enum, `try_create_accumulator`, `can_inline_agg`, and all hash paths and tests are untouched Net diff: **+41 / -164** (123 lines removed). ## Why this approach vs. a trait + generic struct Considered a trait + generic struct (`ReducerAccum<T, OP>` + `ReduceOp` trait + ZST markers `AddOp`/`MulOp`/...). Decided against it because: 1. **No precedent in the workspace** -- grep across `src/` found zero similar ZST-marker-trait patterns. The file's existing convention is macros, and BoolAnd/BoolOr would stay as macros either way. 2. **Subtle finalize regression** -- the generic version lost a zero-copy non-null fast path because the `IntoSeries` trait bound requires `pub(crate)`-only `SeriesLike`, forcing a rewrite to `Series::from_arrow` that always allocates a null buffer (per `arrow-array::PrimitiveArray::from_iter`). 3. **Macro consolidation captures the duplication for free** -- the surviving macro is shorter than the original three combined and has identical generated code. ## Behavior Functionally byte-equivalent to the existing per-op macros. Each `define_reducer_accum!` invocation generates the same struct + `impl` block the old macros did; the `$reduce:expr` parameter is bound to a `fn($native, $native) -> $native` to force monomorphization (same idiom the original `define_minmax_accum!` used for its `$cmp_fn`). ## Benchmarks 10M-row synthetic groupby (key + 2 numeric val cols, `count + sum + min + max` aggregates). 15 reps per config, median wall time, taskset-pinned to 6 physical cores, native runner, Python 3.10, Rust --release, Linux (WSL, 6 physical / 12 logical cores). | Shape | Main (s) | Refactor (s) | speedup | |---|---|---|---| | String key, ~32 groups | 4.48 | 4.35 | 1.03x | | String key, ~10k groups | 4.71 | 4.48 | 1.05x | | Int key, ~32 groups | 2.88 | 2.72 | 1.06x | | Int key, ~10k groups | 3.01 | 2.84 | 1.06x | | TPC-H Q1-like (6M rows) | 1.37 | 0.97 | 1.41x | Refactor matches or beats main on every measured shape; the Q1-like delta is unexpectedly large but consistent across 15 reps (p10-p90 spread ~5% on both sides). Likely an LLVM inlining improvement from collapsing the three macros into one. Peak RSS unchanged. ## Test Plan - `cargo build -p daft-recordbatch --lib` clean - `cargo test -p daft-recordbatch --lib inline_agg::tests` -- 56/56 pass - `cargo fmt -p daft-recordbatch` clean - Sharded path + symbolized path tests pass unchanged. ## Related Issues Follow-up to #7036 review.
main
2 hours ago
fix(core): handle all-null list literals
everySympathy:codex/fix-null-list-udf
15 hours ago

Latest Branches

CodSpeed Performance Gauge
0%
feat(session): resolve catalog-qualified identifiers in namespace methods#7171
12 hours ago
52e14da
Lucas61000:issue-7170
CodSpeed Performance Gauge
0%
fix(core): handle all-null list literals#7168
18 hours ago
edd4c0f
everySympathy:codex/fix-null-list-udf
CodSpeed Performance Gauge
0%
18 hours ago
fcf5e43
XuQianJin-Stars:fix/ray-flotilla-udf-exception-propagation
© 2026 CodSpeed Technology
Home Terms Privacy Docs