Eventual-Inc / Daft
Latest Results
refactor(flotilla): trim docstrings, use while-loop in LimitNode

- Drop verbose docstrings in `_LimitCounterImpl` / `LimitCounterHandle` / `start_limit_counter_actor`; the method bodies and types are clear on their own.
- Replace the `loop { if done { break; }; select! { ... } }` pattern in `LimitNode::limit_execution_loop` with a `while` loop whose match returns "still pending" (sketched below).
colin/distributed-limit-actor
4 hours ago
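A minimal, self-contained sketch of that control-flow change, assuming a tokio mpsc notify channel and a `tokio_util` `CancellationToken`; the names `handle`, `notify_rx`, and the row-count payload are illustrative stand-ins, not Daft's actual API:

```rust
// Hypothetical stand-ins for LimitNode::limit_execution_loop's internals.
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

/// Returns true while more notifications are expected (limit not yet hit).
fn handle(rows_so_far: &mut usize, rows: usize, limit: usize) -> bool {
    *rows_so_far += rows;
    *rows_so_far < limit
}

#[tokio::main]
async fn main() {
    let (tx, mut notify_rx) = mpsc::channel::<usize>(8);
    let cancel = CancellationToken::new();
    tokio::spawn(async move {
        for _ in 0..10 {
            let _ = tx.send(3).await; // each completed task reports 3 rows
        }
    });

    let (limit, mut rows_so_far) = (10, 0);
    // A `while` whose select/match returns "still pending", replacing
    // `loop { if done { break; }; select! { ... } }`.
    let mut pending = true;
    while pending {
        pending = tokio::select! {
            msg = notify_rx.recv() => match msg {
                Some(rows) => handle(&mut rows_so_far, rows, limit),
                None => false, // producers gone: input exhausted
            },
            _ = cancel.cancelled() => false, // external cancellation
        };
    }
    println!("stopped after {rows_so_far} rows (limit {limit})");
}
```

Folding the termination check into the value the `select!`/`match` produces removes the separate `done` flag and the `if done { break; }` step.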
rename to cluster.yaml
feat/asof-benchmarks
5 hours ago
test(lance): drop misleading `pragma: no cover` on exercised count_rows

The count_rows() body in the broken-fragment fixture is actually executed during the test: the call raises, and the except branch in distribute_fragments_balanced catches it, so coverage.py reports the line as covered. The `pragma: no cover` annotation is therefore misleading; drop it.

Addresses Greptile review feedback.
XuQianJin-Stars:fix/lance-utils-redundant-num-groups
8 hours ago
fix(scheduler): emit Cancelled events when filtering cancelled tasks

The `is_cancelled` filter at the scheduler (added so `LimitNode` can drop pending tasks once the limit is satisfied) was silently dropping cancelled tasks. This broke two invariants:

- `dropping_result_stream_still_finalizes_operators` lifecycle test: every `OperatorStart` is supposed to be paired with an `OperatorEnd`, but tasks cancelled at the scheduler emitted neither a `TaskStatus::Cancelled` nor a `TaskEvent::Cancelled`, so the lifecycle gate never closed.
- UI/observability: nodes whose tasks were all filtered before dispatch would show as perpetually "running."

Change `Scheduler::{enqueue_tasks, schedule_tasks}` to return the cancelled pending tasks they dropped. `SchedulerLoop` emits `TaskEvent::Cancelled` for each and then drops them, closing their `result_tx`, which unblocks awaiters with `Ok(None)`. (A sketch of this flow follows below.)

`test_scheduler_actor_cancelled_task` previously assumed a cancelled task always reached a worker, so its `cancel_callback` would fire the notifier. With the scheduler-side filter that is no longer guaranteed; accept either path (callback fired *or* notifier dropped) as valid cancellation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/distributed-limit-actor
12 hours ago
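A compact sketch of the filter-then-emit flow, under assumed types: `Task`, `TaskEvent`, `Scheduler`, and an mpsc `result_tx` standing in for Daft's actual task plumbing (here a closed channel surfaces to the awaiter as `None` rather than `Ok(None)`):

```rust
// Hypothetical scheduler shapes; only the filter/emit/drop choreography
// mirrors the fix described above.
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

struct Task {
    id: u64,
    cancel: CancellationToken,
    result_tx: mpsc::Sender<Vec<u8>>, // dropped => awaiter's recv() yields None
}

#[derive(Debug)]
enum TaskEvent {
    Cancelled(u64),
}

struct Scheduler {
    pending: Vec<Task>,
}

impl Scheduler {
    /// Enqueue tasks, returning the already-cancelled ones instead of
    /// silently discarding them.
    fn enqueue_tasks(&mut self, tasks: Vec<Task>) -> Vec<Task> {
        let (cancelled, live): (Vec<_>, Vec<_>) =
            tasks.into_iter().partition(|t| t.cancel.is_cancelled());
        self.pending.extend(live);
        cancelled
    }
}

fn scheduler_loop_step(
    scheduler: &mut Scheduler,
    incoming: Vec<Task>,
    events: &mpsc::UnboundedSender<TaskEvent>,
) {
    // The loop, not the scheduler, owns event emission: pair every
    // filtered task with a Cancelled event, then drop it.
    for task in scheduler.enqueue_tasks(incoming) {
        let _ = events.send(TaskEvent::Cancelled(task.id));
        // `task` dropped here; its result_tx closes, unblocking awaiters.
    }
}

#[tokio::main]
async fn main() {
    let (event_tx, mut event_rx) = mpsc::unbounded_channel();
    let (result_tx, mut result_rx) = mpsc::channel::<Vec<u8>>(1);
    let cancel = CancellationToken::new();
    cancel.cancel(); // the limit was hit before this task was enqueued

    let mut scheduler = Scheduler { pending: Vec::new() };
    scheduler_loop_step(
        &mut scheduler,
        vec![Task { id: 1, cancel, result_tx }],
        &event_tx,
    );

    assert!(scheduler.pending.is_empty()); // the cancelled task never queued
    assert!(matches!(event_rx.recv().await, Some(TaskEvent::Cancelled(1))));
    assert_eq!(result_rx.recv().await, None); // awaiter sees the channel close
    println!("filtered task emitted Cancelled and closed its result channel");
}
```

The key point is that the filter hands the dropped tasks back to the loop, so the loop can pair each one with a `Cancelled` event before the drop closes its result channel.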
fix(flotilla): drive LimitNode cancellation from actor's contributor set

`StreamingSinkOutput::Finished` from `DistributedLimitSink` terminated the entire streaming-sink node (see base.rs:326), killing the cached pipeline on a flotilla worker. Other SwordfishTasks sharing the cached pipeline as separate input_ids died with "Plan execution task has died", silently truncating queries like `df.filter(...).limit(N).count_rows()` and `df.limit(N).into_batches(M)`.

Polling `actor.is_done()` after each notify also conflated "limit reached" with "safe to cancel": cancelling the moment the actor reports done could kill an in-flight contributor whose data wasn't yet materialized, losing limit rows.

Rework (sketched below):

- Sink always returns `NeedMoreInput`. The cached pipeline stays alive across input_ids; per-input streams drain naturally via finalize.
- Actor exposes `wait_for_contributors()`: awaits `is_done`, then returns the input_ids that consumed budget (`take > 0`).
- Notify token payload changes from `usize` (row count) to `TaskID`, so the LimitNode learns *which* tasks completed.
- LimitNode awaits `wait_for_contributors` and the notify_tokens; it only cancels `parent_cancel` once every contributing input_id has appeared in `completed_ids`.

The scheduler's is_cancelled filter then drops pending tasks; in-flight tasks that get killed are non-contributors (their contribution is 0 rows), so cancellation never loses limit rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/distributed-limit-actor
12 hours ago
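A toy sketch of the contributor-gated cancellation, with a synchronous stand-in for the limit actor; `LimitActor`, `push`, and the plain `TaskID` payload are illustrative names, not the real flotilla interfaces:

```rust
// Hypothetical model: the actor records which input_ids consumed budget,
// and cancellation waits until all of those ids have reported completion.
use std::collections::HashSet;
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

type TaskID = u64;

struct LimitActor {
    remaining: usize,
    contributors: HashSet<TaskID>, // input_ids with take > 0
}

impl LimitActor {
    /// Consume up to `rows` from the budget; record contributors.
    fn push(&mut self, id: TaskID, rows: usize) -> usize {
        let take = rows.min(self.remaining);
        self.remaining -= take;
        if take > 0 {
            self.contributors.insert(id);
        }
        take
    }

    fn is_done(&self) -> bool {
        self.remaining == 0
    }
}

#[tokio::main]
async fn main() {
    // Notify tokens carry *which* task finished (TaskID), not a row count.
    let (done_tx, mut done_rx) = mpsc::unbounded_channel::<TaskID>();
    let parent_cancel = CancellationToken::new();

    let mut actor = LimitActor { remaining: 5, contributors: HashSet::new() };
    actor.push(1, 3); // task 1 contributes 3 rows
    actor.push(2, 4); // task 2 contributes the last 2; limit reached
    actor.push(3, 2); // task 3 arrives late: take == 0, non-contributor
    assert!(actor.is_done());

    done_tx.send(1).unwrap();
    done_tx.send(2).unwrap();

    // Only cancel once every contributing input_id has completed; killing
    // task 3 is safe because its data never counted toward the limit.
    let mut completed_ids = HashSet::new();
    while !actor.contributors.is_subset(&completed_ids) {
        completed_ids.insert(done_rx.recv().await.unwrap());
    }
    parent_cancel.cancel();
    println!("cancelled only after contributors {:?} completed", actor.contributors);
}
```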
fix(flotilla): only cancel forwarded tasks when LimitNode hits the limit

When a downstream node batches its input, e.g. `IntoPartitionsNode` used by `df.into_batches(...)`, forwarded builders are buffered and only submitted after the LimitNode's `result_tx` closes. Cancelling `parent_cancel` on the natural-exhaustion exit path would then drop those just-submitted builders via the scheduler's is_cancelled filter, silently returning zero rows from `df.limit(N).into_batches(M)` and similar chains.

Only cancel when the loop broke because the actor reported `is_done`. Natural exhaustion of the input stream means everything was already forwarded for legitimate processing; no cancellation is needed. (See the sketch below.)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/distributed-limit-actor
14 hours ago
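One way to make the two exit paths explicit, as a hedged sketch; the `ExitReason` enum and `drain` helper are hypothetical, not LimitNode's actual code:

```rust
// Cancel forwarded tasks only when the loop stopped because the limit
// was hit, never when the input stream simply ended.
use tokio::sync::mpsc;
use tokio_util::sync::CancellationToken;

enum ExitReason {
    LimitReached,   // actor reported is_done: pending work is now useless
    InputExhausted, // stream ended: everything was already forwarded
}

async fn drain(mut input: mpsc::Receiver<usize>, limit: usize) -> ExitReason {
    let mut rows = 0;
    while let Some(n) = input.recv().await {
        rows += n;
        if rows >= limit {
            return ExitReason::LimitReached;
        }
    }
    ExitReason::InputExhausted
}

#[tokio::main]
async fn main() {
    let parent_cancel = CancellationToken::new();
    let (tx, rx) = mpsc::channel(4);
    tokio::spawn(async move {
        for _ in 0..2 {
            let _ = tx.send(1).await; // only 2 rows: input ends before limit 10
        }
    });

    match drain(rx, 10).await {
        // Cancelling here would drop builders a downstream batching node
        // (e.g. IntoPartitionsNode) submits only after result_tx closes.
        ExitReason::InputExhausted => println!("exhausted: no cancellation"),
        ExitReason::LimitReached => {
            parent_cancel.cancel();
            println!("limit hit: cancelled pending forwarded tasks");
        }
    }
}
```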
style(flotilla): use `Option::unwrap_or_default` for CancellationToken

clippy::unwrap_or_default fires on `self.cancel_token.unwrap_or_else(CancellationToken::new)`. `CancellationToken` already implements `Default` as a fresh independent token; switch to `unwrap_or_default()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
colin/distributed-limit-actor
14 hours ago
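The fix in isolation; `cancel_token` here is a stand-in parameter rather than Daft's actual field:

```rust
use tokio_util::sync::CancellationToken;

fn resolve(cancel_token: Option<CancellationToken>) -> CancellationToken {
    // Before: cancel_token.unwrap_or_else(CancellationToken::new)
    // `CancellationToken: Default` yields a fresh independent token, so:
    cancel_token.unwrap_or_default()
}

fn main() {
    let tok = resolve(None);
    assert!(!tok.is_cancelled()); // fresh default token, not cancelled
}
```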
fix(json): address greptile review feedback

Three review comments from greptile-apps on the PR:

1. P1 json_tuple: reject duplicate field names. Calling json_tuple(expr, "a", "a") used to silently produce a Struct with two "a" fields, which caused .get("a") to return only the first match and broke serialization. We now validate uniqueness in both extract_keys_from_* paths and raise a clear ValueError at planning time.

2. P2 json_object_keys: document that key order is alphabetical. serde_json's Value::Object is backed by a BTreeMap (preserve_order is not enabled), so keys come back sorted, while Spark preserves insertion order. The Rust doc-comment, the SQL docstring, and the Python wrapper docstring now all explicitly call out this difference (demonstrated in the sketch below).

3. P2 json_tuple: propagate row-level nullability to the struct. Previously the result Struct was constructed with nulls=None, so every row was struct-level non-null even when the input was NULL / malformed / not an object; df["t"].is_null() always returned False on those rows. We now build a per-row validity NullBuffer and pass it to StructArray::new so is_null() correctly reflects bad inputs. Missing-key cases keep producing only field-level NULLs (the row stays valid).

Tests:

- New test_json_tuple_rejects_duplicate_field_names.
- test_json_tuple_invalid_and_null extended to assert is_null() == True on the bad rows and to read back the struct itself.
- All 17 json tests + 36 doctests pass.
XuQianJin-Stars:feat/json-functions-array-length-object-keys-tuple
15 hours ago
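A small standalone demonstration of the ordering behavior from item 2, using serde_json directly (this is the library behavior the docstrings now document, not Daft code):

```rust
// Without serde_json's `preserve_order` feature, Value::Object is a
// BTreeMap, so object keys come back alphabetically sorted rather than
// in insertion order.
use serde_json::Value;

fn main() {
    let v: Value =
        serde_json::from_str(r#"{"zebra": 1, "apple": 2, "mango": 3}"#).unwrap();
    let keys: Vec<&str> = v.as_object().unwrap().keys().map(|s| s.as_str()).collect();
    // Insertion order was zebra, apple, mango; the BTreeMap sorts them.
    assert_eq!(keys, ["apple", "mango", "zebra"]);
    println!("{keys:?}");
}
```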
Latest Branches
feat(flotilla): Distributed Limit Counter
#6942 · 3a3f04e · colin/distributed-limit-actor · 5 hours ago · CodSpeed performance gauge: -1%
feat: ASOF join benchmarking scripts
#6940 · b267696 · feat/asof-benchmarks · 5 hours ago · CodSpeed performance gauge: -1%
fix(lance): drop dead floor-division of num_groups in distribute_fragments_balanced
#6946 · f562930 · XuQianJin-Stars:fix/lance-utils-redundant-num-groups · 8 hours ago · CodSpeed performance gauge: -1%