Eventual-Inc
Daft

Performance History

Latest Results

feat(window): support arbitrary expressions in Window.partition_by and Window.order_by
feat/window-definitions
2 hours ago
Merge branch 'main' into feat/window-optimizer
feat/window-optimizer
3 hours ago
Merge branch 'main' into feat/window-definitions
feat/window-definitions
3 hours ago
Merge branch 'main' into feat/window-execution
feat/window-execution
4 hours ago
adds tests and docs
rchowell/create_table_if_not_exists
6 hours ago
feat: adds try_encode and try_decode with utf-8 special-case (#4060)

## Summary

**This gives ~3.2x speedup for decoding binary arrays into string arrays**

This PR adds try_encode and try_decode with a utf-8 special-case. It covers binary-to-binary transforms such as gzip compress and decompress, as well as binary-to-text and text-to-binary transforms such as converting bytes to utf-8 and vice versa. We can continue to build from this [with additional encodings](https://docs.python.org/3/library/codecs.html#standard-encodings), and I've carved out a special no-copy path for utf-8.

## Performance Results

Three runs with 10 iterations (+1 warmup) on 1 million rows show a ~3.2x speedup.

```
❯ pytest ./tests/functions/test_codecs.py -k test_try_decode_utf8_perf -s
Native try_decode stats (seconds): {'mean': 0.1969996452331543, 'median': 0.19691014289855957, 'min': 0.1933138370513916, 'max': 0.20042800903320312, 'stdev': 0.0018028098671721037}
UDF try_decode stats (seconds): {'mean': 0.6376919507980346, 'median': 0.6374071836471558, 'min': 0.6186070442199707, 'max': 0.6605658531188965, 'stdev': 0.011603869017790357}
**Average speedup: 3.24x**

❯ pytest ./tests/functions/test_codecs.py -k test_try_decode_utf8_perf -s
Native try_decode stats (seconds): {'mean': 0.19709632396697999, 'median': 0.19748806953430176, 'min': 0.19363689422607422, 'max': 0.1991891860961914, 'stdev': 0.00167838499446807}
UDF try_decode stats (seconds): {'mean': 0.6387589693069458, 'median': 0.639365553855896, 'min': 0.6251809597015381, 'max': 0.651353120803833, 'stdev': 0.0075957305958397415}
**Average speedup: 3.24x**

❯ pytest ./tests/functions/test_codecs.py -k test_try_decode_utf8_perf -s
Native try_decode stats (seconds): {'mean': 0.19655859470367432, 'median': 0.19698894023895264, 'min': 0.19165897369384766, 'max': 0.19891595840454102, 'stdev': 0.0019603584148133366}
UDF try_decode stats (seconds): {'mean': 0.6334790706634521, 'median': 0.6332188844680786, 'min': 0.6258370876312256, 'max': 0.6455898284912109, 'stdev': 0.0063130945873989455}
**Average speedup: 3.22x**
```

## Related Issues

#3989 #4062

## Changes Made

* Adds codec kind to differentiate between text and binary encodings
* Adds try_encode and try_decode to python expression API (and all layers beneath)
* Adds a special-case udf for decoding utf-8 since we only need to validate the bytes

## Checklist

- [x] All tests have passed
- [x] Documented in API Docs
- [x] Documented in User Guide
- [x] If adding a new documentation page, doc is added to `docs/mkdocs.yml` navigation
- [x] Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)
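To make the "try" semantics concrete, here is a minimal pure-Python sketch of what a fallible decode might look like for a single value. It is not Daft's implementation: the helper name `try_decode_value` and its handling of the `"gzip"` codec are assumptions made for illustration, and it relies only on the standard library. The idea it shows is that invalid input yields `None` rather than raising, for both text codecs (utf-8) and binary codecs (gzip).

```python
import gzip
import zlib
from typing import Optional, Union


def try_decode_value(value: Optional[bytes], codec: str) -> Optional[Union[str, bytes]]:
    """Hypothetical helper: decode one value, returning None instead of raising."""
    if value is None:
        return None  # nulls pass through unchanged
    try:
        if codec == "gzip":
            # Binary-to-binary codec: bytes in, bytes out.
            return gzip.decompress(value)
        # Binary-to-text codec (e.g. "utf-8"): bytes in, str out.
        return value.decode(codec)
    except (UnicodeDecodeError, OSError, EOFError, zlib.error, LookupError):
        # Invalid bytes, corrupt gzip data, or an unknown codec name.
        return None


rows = [b"hello", b"\xff\xfe", None, "héllo".encode("utf-8")]
decoded = [try_decode_value(r, "utf-8") for r in rows]
print(decoded)
```

For utf-8 the only real work is validating the bytes, which is why the PR can special-case it; this sketch still copies into a new `str`, so it illustrates the semantics only, not the no-copy path.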
main
7 hours ago

Active Branches

feat(window): Add window function definitions and skeleton API
last run
2 hours ago
#4082 (performance gauge: 0%)
#4093 (performance gauge: 0%)
#4097 (performance gauge: 0%)