Eventual-Inc
Daft

Performance History

Latest Results

feat: exact median using `select_nth_unstable`
aaron-ang:median
2 hours ago
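
The commit title above refers to Rust's `select_nth_unstable`, which partially reorders a slice so that the element at a given index ends up in its sorted position, without sorting the whole slice. A minimal Python sketch of the same idea, using quickselect as a stand-in, might look like the following. The helper names `select_nth` and `exact_median` are hypothetical. The page does not say how the change treats even-length input, nulls, or NaN, so averaging the two middle values and assuming NaN-free input are assumptions made here for illustration only.

```python
import random


def select_nth(values, n):
    """Return the element that would sit at index n if `values` were sorted.

    Quickselect with a three-way split; expected linear time.
    Assumes the values are mutually comparable and contain no NaN.
    """
    items = list(values)
    if not 0 <= n < len(items):
        raise IndexError("n out of range")
    while True:
        pivot = random.choice(items)
        less = [x for x in items if x < pivot]
        equal_count = sum(1 for x in items if x == pivot)
        if n < len(less):
            # Target lies among the elements smaller than the pivot.
            items = less
        elif n < len(less) + equal_count:
            # Target is the pivot itself (or one of its duplicates).
            return pivot
        else:
            # Target lies among the larger elements; shift the index.
            n -= len(less) + equal_count
            items = [x for x in items if x > pivot]


def exact_median(values):
    """Exact median via selection rather than a full sort.

    Assumption: for even-length input, return the mean of the two
    middle values. Returns None for empty input.
    """
    count = len(values)
    if count == 0:
        return None
    upper = select_nth(values, count // 2)
    if count % 2 == 1:
        return upper
    lower = select_nth(values, count // 2 - 1)
    return (lower + upper) / 2
```

For example, `exact_median([5, 1, 4, 2, 3])` selects the element at sorted index 2, and `exact_median([4, 1, 3, 2])` averages the elements at sorted indices 1 and 2.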
feat: add image_hash() for image deduplication (#6485)

Adds native perceptual hashing to Daft, enabling large-scale image similarity detection and deduplication workflows without requiring custom UDFs.

---

#### Changes

New `daft.functions.image_hash()` function that accepts an Image column and returns a `FixedSizeBinary` column. Supports 8 algorithms:

| Algorithm | Description |
|-----------|-------------|
| phash (default) | Full 2D DCT perceptual hash; most robust |
| phash_simple | Row-wise DCT only; faster variant |
| dhash | Horizontal difference hash; fast structural comparison |
| dhash_vertical | Vertical difference hash |
| ahash | Average hash; fastest |
| whash | Multi-level Haar wavelet hash |
| crop_resistant | Segment-based hash; robust against cropping |
| colorhash | HSV color distribution hash |

---

#### Implementation Notes

- **Rust backend**: all algorithms are implemented natively in the `daft-image` crate, including an FFT-based DCT and multi-level Haar DWT, with results that are bit-exact against the Python `imagehash` library
- **Type system**: `HashMethod` enum implements `FromLiteral` + `FromStr`; argument parsing uses `#[derive(FunctionArgs)]`, with no redundant string round-trips
- **Performance**: batch hashing runs in parallel across all CPU cores via `rayon`; the resize kernel operates on single-channel luma rather than RGB, reducing the dominant convolution cost proportionally; together these yield 5–25× speedups over an equivalent Python UDF on the same Daft pipeline
- **Null propagation**: null images produce null hashes, consistent with other Daft column operations
- **Input validation**: Python validates `method`, `hash_size`, `binbits`, `segments`, and the power-of-2 constraint for `whash` early, with clear error messages

---

#### Tests

- `tests/cookbook/test_image_hash_compat.py`: bit-exact compatibility tests against the `imagehash` Python library
- `tests/cookbook/test_image_hash.py`: standalone tests covering output dtype/size, null propagation, identical-image zero distance, discriminability, similarity ordering, and error handling (all 8 algorithms covered)
- `tests/recordbatch/image/test_image_hash.py`: RecordBatch-level tests
- `src/daft-image/benches/image_ops.rs`: benchmarks for all algorithms

---

Closes #4889
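
To make the "average hash" row and the "identical-image zero distance" test property concrete, here is a small pure-Python sketch. It is not Daft's implementation and makes no claim to match `imagehash` bit for bit. The function names `average_hash` and `hamming_distance` are hypothetical, and the input is assumed to be a grayscale grid that has already been resized to `hash_size` × `hash_size`, which skips the resize step described above.

```python
def average_hash(gray, hash_size=8):
    """Hash a hash_size x hash_size grid of luma values.

    Each bit is 1 where the pixel is brighter than the grid mean.
    Assumption: `gray` is already resized and single-channel.
    """
    if len(gray) != hash_size or any(len(row) != hash_size for row in gray):
        raise ValueError("expected a hash_size x hash_size grid")
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        # Pack bits row by row, most significant bit first.
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits.to_bytes((hash_size * hash_size + 7) // 8, "big")


def hamming_distance(a, b):
    """Count differing bits between two equal-length hashes."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# A synthetic 8x8 gradient, a uniformly brightened copy, and an inverted copy.
gradient = [[r * 8 + c for c in range(8)] for r in range(8)]
brighter = [[v + 10 for v in row] for row in gradient]
inverted = [[63 - v for v in row] for row in gradient]

same = hamming_distance(average_hash(gradient), average_hash(brighter))
opposite = hamming_distance(average_hash(gradient), average_hash(inverted))
```

Here `same` is 0, because a uniform brightness shift moves the mean by the same amount as every pixel, and `opposite` is 64, because inversion flips every bit. In a deduplication workflow, pairs whose distance falls below a chosen threshold would be treated as near-duplicates; the threshold is application-specific and not given in the source.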
main
15 hours ago
style: run cargo fmt
BABTUNA:feat/inline-agg-minmax
24 hours ago

Latest Branches

CodSpeed Performance Gauge
0%
feat: percentile and median ops #6153
3 hours ago
fb4ee5a
aaron-ang:median
CodSpeed Performance Gauge
+27%
1 day ago
10b5446
BABTUNA:feat/inline-agg-minmax
CodSpeed Performance Gauge
-1%
1 day ago
5d5f0ef
desmond/fix-dependabot-6598