perf: Optimize GroupBy Map Building & List-Agg (#6613)
## Changes Made
For dedupe, I tuned the `make_groups` operation for groupby to make it
friendlier to high-cardinality inputs without overly affecting
low-cardinality ones. In particular:
* Used `hashbrown` directly instead of `FnvHashMap` because it uses the
faster `foldhash` (apparently Polars uses it too). Note that `foldhash` is
particularly good for small fixed-size types, while `ahash` is slightly
better for short strings. I believe `foldhash` is still an improvement
over `fnv`, so I'm good with this for now. I'll watch for regressions,
and in the future we can consider specializing per dtype
* Refactored out some of the groupby-agg code from `daft-core` into a
new crate `daft-groupby` since we might expand on it a lot in the coming
months and recompiling `daft-core` every time is a huge pain. From
benchmarks, this doesn't seem to have an effect on perf, even without
LTO
* In profiles, I was seeing a lot of overhead from jemalloc
deallocations, particularly of small vectors. To address it:
  * For `list_agg`, I moved to using `.take()` on a large vec. I think we
  can do better, but this already helped a ton
  * For `make_groups`, I moved to using `SmallVec<[u64; 2]>`, which helps
  by about 20%
* I think we can extend this to joins as well, as I'm seeing something
similar there
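
To illustrate why one large buffer is cheaper to free than many small vectors, here is a minimal std-only sketch of a flat group layout. It is not the code in this PR: `make_groups_flat` and `take` are hypothetical helpers, the `u32` key type is an assumption, and std's `HashMap` stands in for `hashbrown`. Row indices for all groups live in one `Vec<u64>` addressed through an offsets array, so a list aggregation becomes a single gather followed by slicing.

```rust
use std::collections::HashMap;

/// Hypothetical helper: group row indices by key, in first-seen key order.
/// Returns (first row of each group, flat row indices, offsets), where
/// group `g` owns `flat[offsets[g]..offsets[g + 1]]`.
fn make_groups_flat(keys: &[u32]) -> (Vec<u64>, Vec<u64>, Vec<usize>) {
    // Pass 1: give each distinct key a dense group id and count its rows.
    let mut group_of_key: HashMap<u32, usize> = HashMap::new();
    let mut first_rows: Vec<u64> = Vec::new();
    let mut counts: Vec<usize> = Vec::new();
    let mut group_ids: Vec<usize> = Vec::with_capacity(keys.len());
    for (row, key) in keys.iter().enumerate() {
        let next_id = first_rows.len();
        let gid = *group_of_key.entry(*key).or_insert(next_id);
        if gid == next_id {
            // First time this key is seen: open a new group.
            first_rows.push(row as u64);
            counts.push(0);
        }
        counts[gid] += 1;
        group_ids.push(gid);
    }

    // Prefix-sum the counts into offsets.
    let mut offsets: Vec<usize> = Vec::with_capacity(counts.len() + 1);
    let mut total = 0usize;
    offsets.push(total);
    for c in &counts {
        total += *c;
        offsets.push(total);
    }

    // Pass 2: scatter row indices into one flat buffer (a single allocation
    // instead of one small Vec per group).
    let mut cursor: Vec<usize> = offsets[..counts.len()].to_vec();
    let mut flat: Vec<u64> = vec![0; keys.len()];
    for (row, gid) in group_ids.iter().enumerate() {
        flat[cursor[*gid]] = row as u64;
        cursor[*gid] += 1;
    }
    (first_rows, flat, offsets)
}

/// Hypothetical helper: gather `values` at `indices` (a "take").
fn take<T: Clone>(values: &[T], indices: &[u64]) -> Vec<T> {
    indices.iter().map(|&i| values[i as usize].clone()).collect()
}
```

With this layout, the list for group `g` is the slice `offsets[g]..offsets[g + 1]` of the gathered values. Dropping the result frees a handful of buffers regardless of how many groups there are.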
Overall, I see about a 4x improvement on connected components with these
changes (70s vs 250s before).