fix: Handle dtype mismatch error in join_asof join keys (#6904)
Currently, as-of joins with mismatched `by` keys fail with a
`mismatch dtype error`. The fix is to:
1. Normalize and cast the keys to a shared supertype (e.g. int64 and
float64 are normalized to float64). This is the same approach already
used for the on_key, as well as for the join keys of equality joins.
2. Remove the computation of right_cols_to_drop in the local executor.
It did not drop the cast expressions produced during normalization,
e.g. Cast(Column("left_on_key"), Utf8), which led to duplicate columns
in the output (the "left_on_key" column appeared twice in the result).
Since the desired output schema is already computed in the logical
plan, we can use it as the basis for pruning columns during execution.
```
left = {"ts": [1, 3, 5], "v": [...]} # ts is Int64
right = {"ts": [2.0, 4.0], "w": [...]} # ts is Float64
# correct output
{"ts": [1, 3, 5], "v": [10, 30, 50], "w": [None, 20, 40]}
# without the fix, the second "ts" silently overwrites the first:
{"ts": [None, 2.0, 4.0], "v": [10, 30, 50], "w": [None, 20, 40]}
#       ^^^ left ts [1, 3, 5] is gone; no error raised
```
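The key normalization in step 1 can be sketched roughly as follows. This is a minimal pure-Python illustration, not the engine's implementation: the `SUPERTYPES` table and the `supertype` and `normalize_key_dtypes` helpers are hypothetical names, and only the int64/float64 rule comes from the description above.

```python
# Hypothetical, deliberately tiny supertype table. Only the int64/float64
# rule is taken from the description; a real engine covers many more dtypes.
SUPERTYPES = {
    frozenset({"int64", "float64"}): "float64",
}


def supertype(left_dtype, right_dtype):
    """Return the dtype both sides should be cast to before comparing keys."""
    if left_dtype == right_dtype:
        return left_dtype
    pair = frozenset({left_dtype, right_dtype})
    if pair not in SUPERTYPES:
        raise TypeError(f"no shared supertype for {left_dtype} and {right_dtype}")
    return SUPERTYPES[pair]


def normalize_key_dtypes(left_dtypes, right_dtypes):
    """Pick one cast target per key pair (by keys and the on key alike)."""
    if len(left_dtypes) != len(right_dtypes):
        raise ValueError("left and right must have the same number of keys")
    return [supertype(l, r) for l, r in zip(left_dtypes, right_dtypes)]


targets = normalize_key_dtypes(["int64", "utf8"], ["float64", "utf8"])
print(targets)  # ['float64', 'utf8']
```

Keys whose dtypes already match are left alone; only mismatched pairs get a cast target, which is what later wraps the key in a Cast expression.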
**A more in-depth explanation of the second bug:**
1. This bug requires three conditions to trigger:
   - a join key (i.e. a column that should have been a candidate for
     right_cols_to_drop)
   - that shares its name with a column on the other side (explained
     later)
   - with mismatched types on that key, so that normalization wraps it
     in a Cast expression, which prevents it from being caught by
     right_cols_to_drop.
2. Here’s how the bug occurs:
   1. At the logical plan layer, right_cols_to_drop is computed from
      bare unresolved column expressions, before any normalization has
      occurred. It is then passed to deduplicate_asof_join_columns,
      which uses it to determine which right-side columns need to be
      renamed with a `right.` prefix. Because the join keys are in the
      drop set, the deduplication step skips them, since they are
      already going to be dropped.
   2. After translation, at the physical plan layer,
      AsofJoinOperator::new ignored the output_schema and recomputed
      right_cols_to_drop from scratch. By this point, translation had
      already wrapped bare Column("g") expressions in
      Cast(Column("g"), Utf8). The extract_name closure used in the
      recomputation only handled bare column expressions, so it
      returned None for any cast-wrapped expression, silently omitting
      the right key column from right_cols_to_drop.
   3. Without the key in the drop set, prune_right_batch kept the
      column, producing a record batch with duplicate column names.
      When to_pydict() built a Python dict, the duplicate key caused
      the right-side values to silently overwrite the left-side values,
      corrupting the output.
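
The failure mode described above can be reproduced in miniature. The sketch below is a hypothetical Python model, not the actual Rust code: `Column`, `Cast`, `extract_name_bare_only`, and `right_cols_to_drop` are stand-ins named after the identifiers in this description, and the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Cast:
    child: object
    dtype: str


def extract_name_bare_only(expr):
    """Model of the buggy closure: only a bare Column yields a name."""
    return expr.name if isinstance(expr, Column) else None


def right_cols_to_drop(right_key_exprs):
    """Collect right key names to drop; unnamed expressions are skipped."""
    names = (extract_name_bare_only(e) for e in right_key_exprs)
    return {n for n in names if n is not None}


# Logical plan: bare key, so "g" lands in the drop set.
before_translation = right_cols_to_drop([Column("g")])
# Physical plan: the key is now cast-wrapped, so the drop set is empty.
after_translation = right_cols_to_drop([Cast(Column("g"), "Utf8")])
print(before_translation, after_translation)  # {'g'} set()

# With "g" not dropped, the batch carries two columns named "g".
# Building a dict keeps one entry per name, and the later value wins.
batch = [("g", ["a", "b"]), ("v", [10, 30]), ("g", ["x", "y"])]
as_dict = dict(batch)
print(as_dict)  # {'g': ['x', 'y'], 'v': [10, 30]}
```

Pruning against the output schema from the logical plan avoids the problem because it never needs to recover a name from a cast-wrapped expression.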