Eventual-Inc / Daft

Performance History

Latest Results

refactor(inline-agg): fold packed-u64 fast path into agg_symbolized_path

Restructures the packed-u64 two-string-column optimization to live inside `agg_symbolized_path` rather than as a separate dispatch branch. This eliminates the double byte-tally that previous revisions of this PR paid on the TPC-H Q1 short-string path, and matches PR #6748's dispatch structure exactly: `agg_groupby_inline`'s multi-column branch is now identical to #6748's.

Changes:
- `agg_symbolized_path` now symbolizes each Utf8/Binary column into a `Vec<u32>` once, stored alongside a `None` slot for non-string columns. If there are exactly two string columns, it packs the two symbol IDs into a u64 and groups with `FnvHashMap<u64, u32>` via the new `agg_packed_u64_inner` helper (see the sketch after this message). Otherwise (a mixed shape or 3+ columns), it rebuilds the symbolized RecordBatch and calls `agg_generic_hash_path` as before.
- Delete the separate `agg_packed_u64_path` function and the `symbolize_string_col` helper; their logic is now inlined in `agg_symbolized_path`.
- Revert the dispatcher in `agg_groupby_inline` to PR #6748's structure (`match agg_symbolized_path → None: generic`).
- Update doc comments on three inherited tests to reflect that the packed-u64 fast path lives inside `agg_symbolized_path`, not in a separate dispatch step.

Tests: all 48 `inline_agg` tests pass, including the 5 `test_inline_packed_u64_*` cases.

Benchmarks: long-string two-column shapes (1.2M-5M rows × 32-100k groups) continue to show a ~1.06x-1.18x speedup over the equivalent no-fast-path baseline on the same restructured code, matching the earlier separate-function implementation's measurements. Short-string shapes (TPC-H Q1) now route through exactly the same code path as PR #6748, so no overhead is added.
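A minimal sketch of that packed-u64 grouping, assuming the two key columns have already been symbolized into `u32` symbol IDs and the `fnv` crate is available. `group_packed_u64` is an illustrative stand-in, not the actual signature of Daft's `agg_packed_u64_inner`.

```rust
use fnv::FnvHashMap;

/// Groups rows by two pre-symbolized string columns, packing each row's
/// pair of u32 symbol IDs into a single u64 key (hypothetical helper).
fn group_packed_u64(col_a: &[u32], col_b: &[u32]) -> FnvHashMap<u64, u32> {
    debug_assert_eq!(col_a.len(), col_b.len());
    let mut groups: FnvHashMap<u64, u32> = FnvHashMap::default();
    for (&a, &b) in col_a.iter().zip(col_b.iter()) {
        // High 32 bits from the first column, low 32 bits from the second.
        let key = ((a as u64) << 32) | (b as u64);
        // Assign the next dense group ID the first time a key is seen.
        let next_id = groups.len() as u32;
        groups.entry(key).or_insert(next_id);
    }
    groups
}
```

The point of the packing is that the hot loop hashes one fixed-width integer per row instead of two variable-length strings.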
BABTUNA:perf/packed-symbol-groupby
6 minutes ago
perf(shuffle): spawn-free oneshot writer + read-side server concat

Two isolated wall-time wins on the Flight shuffle path. Measured on N=8192 outputs / M=200 inputs / 8 GiB / NVMe.

Oneshot writer (60% wall reduction; 68 s → 27.5 s)

The previous `RepartitionSink` writer fanned out one `tokio::spawn` per output partition per map task to run `concat_one_partition` in parallel, then `join_all`'d before IPC-encoding. At N=8192 / M=200 that is 1.6 M futures scheduled per shuffle. The outer `map_conc` was already at 16 (= cores), so the inner fan-out only added cross-thread cache and allocator contention. Doing the concat serially inside a single `spawn_blocking` per map task (`write_partitions_one_shot`) cuts writer-step p50 from 3925 ms to 913 ms. Per-future inner timings also dropped 4-9× across all sub-buckets, consistent with thrash-free serial execution.

Read-side server concat (21% wall reduction; 86 s → 68 s, before combining with the writer fix)

The Flight server previously emitted one `FlightData` per source IPC batch (~5 KB each). The client paid a fixed per-batch flatbuffer parse and array-construction tax: 18 µs × 1.6 M batches dominated read wall time. `concat_specs_into_flight_data_stream` now reads source batches into a pending buffer and emits a fused `FlightData` once the buffer crosses ~4 MiB (configurable via `DAFT_SHUFFLE_CHUNK_BYTES`; 0 disables fusing and falls back to the per-batch path). This preserves per-map-task file isolation, adds no rewrite phase, and bounds transient memory by `chunk_target × num_reducers`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
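A minimal sketch of the oneshot-writer shape, assuming tokio and plain byte buffers in place of record batches; `PartitionBuffer`, this `concat_one_partition`, and the return type are illustrative stand-ins for the real writer's types.

```rust
use tokio::task;

// Hypothetical stand-in for one output partition's accumulated chunks.
type PartitionBuffer = Vec<Vec<u8>>;

// Serial concat of one partition's chunks; stands in for the real
// `concat_one_partition`, which concatenates record batches.
fn concat_one_partition(chunks: PartitionBuffer) -> Vec<u8> {
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    let mut out = Vec::with_capacity(total);
    for c in chunks {
        out.extend_from_slice(&c);
    }
    out
}

// One `spawn_blocking` per map task: all output partitions are concatenated
// serially on one blocking thread, instead of one `tokio::spawn` each.
async fn write_partitions_one_shot(parts: Vec<PartitionBuffer>) -> Vec<Vec<u8>> {
    task::spawn_blocking(move || parts.into_iter().map(concat_one_partition).collect())
        .await
        .expect("blocking concat task panicked")
}
```

With the outer map-task concurrency already saturating the cores, the serial inner loop trades useless parallelism for cache locality, consistent with the 4-9× per-future improvement reported above.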
colin/flight-shuffle-perf
3 hours ago
feat(temporal): add Spark-style temporal aliases (#6830)

## Summary

Implements eight Spark-style temporal aliases (#3798) as thin wrappers over Daft's existing canonical temporal functions, so users migrating from Spark can use the names they already know without learning Daft's naming. This PR adds Python and SQL aliases for `dayofmonth`, `dayofyear`, `weekofyear`, `date_format`, `trunc`, `dateadd`, `datediff`, and `datepart`. Every alias is a pure delegation to an already-tested Daft function; the only new logic is `trunc`'s interval normalization and `datepart`'s dispatch table.

## Why

The issue asks for Spark-name parity. This PR focuses on three practical pieces:

- The camelCase / no-underscore Spark variants of existing extractors (`dayofmonth`, `dayofyear`, `weekofyear`, `dateadd`, `datediff`).
- `date_format` as a Spark alias for `strftime`.
- Two functions with Spark-specific behavior: `trunc(date, format)` (argument order opposite to `date_trunc`) and `datepart(part, expr)` (dispatches to the right extractor based on the `part` literal).

## Changes Made

- Add eight Python alias functions in `daft/functions/datetime.py`:
  - `dayofmonth`, `dayofyear`, `weekofyear` → delegate to `day_of_month`, `day_of_year`, `week_of_year`.
  - `date_format` → delegates to `strftime`.
  - `trunc(input, interval)` → normalizes bare unit strings (`"day"` → `"1 day"`), then delegates to `date_trunc(interval, input)` with the arguments swapped to match Spark (see the sketch after this description).
  - `dateadd`, `datediff` → delegate to `date_add`, `date_diff`.
  - `datepart(part, expr)` → dispatches to the appropriate extractor (`year`, `month`, `day`, etc.) based on the `part` string literal.
- Export all eight names from `daft/functions/__init__.py` (alphabetical, both in the import block and `__all__`).
- Add SQL handlers in `src/daft-sql/src/modules/temporal.rs`:
  - Direct `ScalarUDF` registrations for `dayofmonth`, `dayofyear`, `weekofyear`, `date_format`.
  - `SQLTrunc` handler for the Spark-style `trunc(input, interval)` argument order with interval normalization.
  - `SQLDatePart` handler with the part-string dispatch.
  - `dateadd` and `datediff` reuse the existing `SQLDateAdd` / `SQLDateDiff` handlers.
- Re-export `ToString` from `src/daft-functions-temporal/src/lib.rs` so the SQL layer can wire `date_format` to it.
- Add regression coverage:
  - `tests/dataframe/test_temporals.py::test_temporal_alias_functions`: every alias compared against its canonical counterpart on a leap-day timestamp.
  - `tests/sql/test_temporal_exprs.py::test_temporal_sql_aliases`: the same on the SQL side.

## Behavior

- All eight aliases produce output identical to their canonical Daft counterparts.
- `trunc(ts, 'month')` works as a Spark user expects, taking `(input, interval)` order, and is functionally equivalent to `date_trunc('1 month', ts)`.
- `datepart('year', ts)` returns the year as `Int32`; the other parts (`month`, `day`, `hour`, `minute`, `second`, `quarter`, `week`, `dayofweek`, `dayofyear`) return the same dtypes as the underlying extractors.
- Invalid `datepart` parts raise a clear `ValueError`.

## Test Plan

- `cargo check -p daft-functions-temporal -p daft-sql`
- `make build`
- `DAFT_RUNNER=native pytest -q tests/dataframe/test_temporals.py::test_temporal_alias_functions tests/sql/test_temporal_exprs.py::test_temporal_sql_aliases`

## Related Issues

- Part of #3798
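The one genuinely new piece of logic is `trunc`'s interval normalization. A minimal sketch of that rule, assuming only what the description states (a bare unit like `"day"` becomes `"1 day"`; already-counted intervals pass through); `normalize_interval` is a hypothetical helper, not the PR's actual code.

```rust
// Normalize a Spark-style trunc unit into a date_trunc interval:
// bare units gain a leading count of 1, counted intervals pass through.
fn normalize_interval(interval: &str) -> String {
    let trimmed = interval.trim();
    if trimmed.chars().next().is_some_and(|c| c.is_ascii_digit()) {
        trimmed.to_string() // already counted, e.g. "3 day"
    } else {
        format!("1 {trimmed}") // bare unit, e.g. "day" -> "1 day"
    }
}

fn main() {
    assert_eq!(normalize_interval("day"), "1 day");
    assert_eq!(normalize_interval("1 month"), "1 month");
}
```

After normalization the alias simply calls `date_trunc(interval, input)`, swapping the argument order so the Spark-style `trunc(input, interval)` call works unchanged.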
main
5 hours ago
rename to cluster.yaml
feat/asof-benchmarks
11 hours ago

Latest Branches

-12%
perf(inline-agg): pack two-string-column keys into u64 for typed FNV grouping #6924
32 minutes ago
c689536
BABTUNA:perf/packed-symbol-groupby
-1%
55 minutes ago
23e2acf
QuakeWang:fix/paimon-column-order
-1%
2 hours ago
ade52cb
BABTUNA:feat/show-preview-defaults