fix(dashboard): pass partition sets to repr_json so plan matches execution topology (#6576)
## Summary
- The dashboard plan tree could differ from actual execution because the
pipeline tree structure depends on partition counts, which come from the
driver's cached partition sets
- Without psets, `InMemorySourceNode` defaults to 0 partitions, causing
any downstream node that branches on partition count (Sort, joins,
shuffles, etc.) to produce a different tree shape with different node
IDs
- This caused node ID mismatches between the dashboard plan and
execution stats, so stats wouldn't attach to the correct operators
- Accepts optional `psets` parameter in `repr_json()`, builds partition
sets from the driver's cache, and passes them through so the pipeline
tree matches execution
- Updates the Python type stub and the Ray runner to supply psets
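
The mismatch described above can be illustrated with a toy planner. This is only a sketch under stated assumptions: `build_tree`, the node names (`SortSample`, `SortRepartition`, `SortMerge`, `Sink`), and the sequential ID scheme are hypothetical and are not the project's actual planner or API. It only shows how a plan built with a default of 0 partitions can assign different IDs than the plan that actually executes.

```python
from itertools import count


def build_tree(source_partitions=None):
    """Toy planner (hypothetical): returns (node_id, name) pairs in creation order.

    source_partitions=None mimics building the plan without psets,
    where the in-memory source reports 0 partitions.
    """
    num_partitions = 0 if source_partitions is None else source_partitions
    ids = count()
    nodes = [(next(ids), "InMemorySource")]
    if num_partitions > 1:
        # Assumed distributed path: the sort expands into several phases,
        # so more nodes are created and later IDs shift.
        for name in ("SortSample", "SortRepartition", "SortMerge"):
            nodes.append((next(ids), name))
    else:
        # Assumed local path: a single sort node.
        nodes.append((next(ids), "Sort"))
    nodes.append((next(ids), "Sink"))
    return nodes


plan_without_psets = build_tree()                     # dashboard plan, no psets
plan_with_psets = build_tree(source_partitions=4)     # dashboard plan, with psets
execution = build_tree(source_partitions=4)           # what actually runs

# Stats are keyed by node ID, so ID 2 means different operators in each tree.
print(dict(plan_without_psets)[2], "vs", dict(execution)[2])
```

In this toy model, stats recorded against node ID 2 during execution would attach to the wrong operator in the plan built without psets, while the plan built with the same partition count lines up exactly.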
Note: a separate PR, #6575, adds phase-aware stats for distributed
Sort, which are also required for the counts in the dashboard to look
right.
## Test plan
- [x] Run a distributed query with Sort (e.g. `read_parquet → limit →
groupby â sort`) and verify the dashboard plan tree matches execution
node IDs
- [x] Verify `repr_json()` still works without psets (backward
compatible)
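
The first check could be automated roughly as follows. This is a hedged sketch: the plan shape (a nested dict with `id` and `children` keys), the stats mapping keyed by node ID, and the helper names are assumptions made for illustration, not the actual output format of `repr_json()`.

```python
def collect_node_ids(plan):
    """Collect every node ID in a nested plan dict (assumed shape)."""
    ids = {plan["id"]}
    for child in plan.get("children", []):
        ids |= collect_node_ids(child)
    return ids


def unmatched_stat_ids(plan, stats):
    """Return stat keys that have no matching node in the plan tree."""
    return sorted(set(stats) - collect_node_ids(plan))


# Hypothetical example data: a three-node plan and per-node execution stats.
example_plan = {
    "id": 2,
    "children": [{"id": 1, "children": [{"id": 0}]}],
}
matching_stats = {0: {"rows": 100}, 1: {"rows": 100}, 2: {"rows": 100}}
shifted_stats = {0: {"rows": 100}, 3: {"rows": 100}, 4: {"rows": 100}}

print(unmatched_stat_ids(example_plan, matching_stats))
print(unmatched_stat_ids(example_plan, shifted_stats))
```

An empty result means every stats entry found an operator to attach to; any IDs returned point to the kind of plan and execution mismatch this change addresses.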
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>