perf(window): reduce unnecessary copies of data in finalize() step of window functions to reduce memory usage (#7006)
The window function physical nodes that involve sorting
(partition+order-by and partition+dynamic-frame) were performing two
full-width materializations per group: first to extract the group rows,
then again to produce a sorted copy. The sort operated on a full-width
batch even though only the order-key columns are needed to determine
sort order.
This PR changes the per-group sort path to work at the index level: only
the order-key columns are extracted per group, sorted via argsort to
produce a final index ordering, and then the group is materialized
exactly once in its final sorted form. This halves the number of
full-width take calls per group.
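
The difference between the two paths can be sketched as follows. This is a minimal illustration in plain Python, not the PR's actual code: the `Columns` type, the `take` helper, and both function names are hypothetical, a batch is modeled as a dict of column lists, and a single order-key column is assumed.

```python
from typing import Dict, List, Sequence

# Hypothetical column-oriented batch: column name -> list of values.
Columns = Dict[str, List]


def take(batch: Columns, indices: Sequence[int]) -> Columns:
    """Full-width materialization: copy every column at the given row indices."""
    return {name: [col[i] for i in indices] for name, col in batch.items()}


def sorted_group_two_copies(batch: Columns, group_rows: List[int], order_key: str) -> Columns:
    """Old shape: materialize the group, then materialize a sorted copy."""
    group = take(batch, group_rows)  # full-width copy 1
    keys = group[order_key]
    order = sorted(range(len(keys)), key=keys.__getitem__)  # argsort
    return take(group, order)  # full-width copy 2


def sorted_group_one_copy(batch: Columns, group_rows: List[int], order_key: str) -> Columns:
    """New shape: sort at the index level, then materialize exactly once."""
    # Extract only the order-key column for this group.
    keys = [batch[order_key][i] for i in group_rows]
    # Argsort the keys (Python's sort is stable, so ties keep their input order).
    order = sorted(range(len(keys)), key=keys.__getitem__)
    # Compose the sort permutation with the group's row indices.
    final_rows = [group_rows[j] for j in order]
    return take(batch, final_rows)  # single full-width copy


batch = {
    "part": ["a", "b", "a", "b", "a"],
    "ts": [3, 1, 1, 2, 2],
    "val": [10, 20, 30, 40, 50],
}
rows_a = [0, 2, 4]  # row indices belonging to partition "a"
one_copy = sorted_group_one_copy(batch, rows_a, "ts")
two_copies = sorted_group_two_copies(batch, rows_a, "ts")
```

Both functions return the same sorted group; the second simply calls `take` on the full-width data once instead of twice, and the only intermediate data it holds is the key column and two index lists.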
Additionally, the concatenated source data is now explicitly released
before the window function application loop begins.
```
| Operation            | #7011     | Ours      | Δ    |
|----------------------|-----------|-----------|------|
| partition & order by | 14,750 MB | 8,931 MB  | −39% |
| partition_only       | 9,400 MB  | 8,651 MB  | −8%  |
| dynamic_frame        | 14,428 MB | 8,924 MB  | −38% |
| order_by_only        | 10,370 MB | 10,370 MB | 0%   |
```