feat(functions): add string distance/similarity functions (#7068)
## Changes Made
Add four pairwise string distance/similarity functions as pure Rust
scalar UDFs:
- `levenshtein_distance` - minimum edit distance (Int64)
- `jaro_similarity` - similarity score 0.0-1.0 (Float64)
- `jaro_winkler_similarity` - Jaro with prefix bonus (Float64)
- `damerau_levenshtein_distance` - Levenshtein + transpositions (Int64)
Follows the existing `hamming_distance_str` pattern. No external
dependencies. Exposed via `daft.functions` API and as Expression
methods. Null-safe (returns null when either input is null).
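For reviewers who want a reference to compare against, here is a minimal pure-Python sketch of the two edit-distance semantics described above. It is not the Rust code from this PR. The names `levenshtein` and `osa_distance` are hypothetical, and the transposition-aware version assumes the restricted "optimal string alignment" variant of Damerau-Levenshtein; this description does not say which variant the PR implements. Null handling mirrors the stated behavior (null in, null out).

```python
from typing import Optional


def levenshtein(a: Optional[str], b: Optional[str]) -> Optional[int]:
    """Minimum number of insertions, deletions, and substitutions."""
    if a is None or b is None:
        return None  # null-safe: null if either input is null
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def osa_distance(a: Optional[str], b: Optional[str]) -> Optional[int]:
    """Levenshtein plus adjacent transpositions (assumed OSA variant)."""
    if a is None or b is None:
        return None
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Swap of two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]
```

With this sketch, `levenshtein("ab", "ba")` is 2 and `osa_distance("ab", "ba")` is 1, matching the ab/ba case listed in the test plan.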
## Related Issues
Fixes #6794
## Test Plan
24 pytest test cases in `tests/functions/test_string_distance.py`:
- **Levenshtein** (6 tests): basic edit distance, empty strings, null
handling, identical strings, single-char edits
(substitution/insertion/deletion), expression method
- **Jaro** (6 tests): identical strings, completely different strings,
known reference values (martha/marhta = 0.944444), null handling, empty
vs nonempty, expression method
- **Jaro-Winkler** (6 tests): identical strings, Jaro-Winkler >= Jaro
invariant (prefix bonus), known reference values (martha/marhta = 0.961111), no common
prefix (JW == Jaro), null handling, expression method
- **Damerau-Levenshtein** (6 tests): basic transposition, transposition
vs standard Levenshtein (ab/ba = 1 vs 2), empty strings, identical
strings, null handling, expression method
```
DAFT_RUNNER=native pytest tests/functions/test_string_distance.py -v
24 passed in 0.06s
```
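The martha/marhta reference values in the test plan can be reproduced with a small pure-Python sketch of the textbook Jaro and Jaro-Winkler definitions. This is not the Rust code from this PR. The function names are hypothetical, and the prefix cap of 4 and scaling factor of 0.1 are the conventional Winkler defaults, assumed here rather than taken from the PR. Returning 1.0 for two identical strings (including two empty ones) and 0.0 for empty versus nonempty are also assumptions.

```python
from typing import Optional


def jaro(a: Optional[str], b: Optional[str]) -> Optional[float]:
    if a is None or b is None:
        return None  # null-safe: null if either input is null
    if a == b:
        return 1.0   # assumption: identical strings (even empty) score 1.0
    if not a or not b:
        return 0.0   # assumption: empty vs nonempty scores 0.0
    # Characters match only if they are within this distance of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        lo = max(0, i - window)
        hi = min(len(b), i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both matched sequences in order; out-of-order pairs are
    # half-transpositions.
    k = 0
    half_transpositions = 0
    for i, ca in enumerate(a):
        if a_matched[i]:
            while not b_matched[k]:
                k += 1
            if ca != b[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len(a) + matches / len(b) + (matches - t) / matches) / 3


def jaro_winkler(a: Optional[str], b: Optional[str],
                 scaling: float = 0.1, max_prefix: int = 4) -> Optional[float]:
    j = jaro(a, b)
    if j is None:
        return None
    # Length of the common prefix, capped at max_prefix.
    prefix = 0
    for ca, cb in zip(a[:max_prefix], b[:max_prefix]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * scaling * (1.0 - j)
```

For martha/marhta there are 6 matches and 1 transposition, so Jaro is (1 + 1 + 5/6) / 3 ≈ 0.944444. The common prefix "mar" has length 3, so Jaro-Winkler is 0.944444 + 3 × 0.1 × (1 − 0.944444) ≈ 0.961111.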
Rust compilation verified:
```
cargo check --workspace # zero errors
cargo clippy -p daft-functions-utf8 --no-deps # zero warnings on new code
```
## AI Disclosure
AI-assisted implementation (Claude Opus 4.6).