Eventual-Inc
Daft

Performance History

Latest Results

refactored out drop
euan/optimize-window-fn
52 minutes ago
refactor out needless collect
euan/optimize-window-fn-2
3 hours ago
fix: update code for nightly-2026-01-08 toolchain
Bump rust nightly toolchain from nightly-2025-09-03 to nightly-2026-01-08. This picks up the ArrayChunks::into_remainder() return type change (rust-lang/rust#149127) while staying before the cargo --timings=html removal (rust-lang/cargo#16420), which CI still relies on.
Changes to accommodate the new nightly:
- Fix into_remainder() call in daft-minhash to match new return type (no longer wrapped in Option) (This enables distributions to build with Rust 1.93 stable and newer when RUSTC_BOOTSTRAP is set.)
- Upgrade cargo-llvm-cov from 0.7.1 to 0.8.7 to fix corrupt profraw files due to LLVM profdata format incompatibility
Clippy lint fixes for newer nightly:
- Fix use_self by replacing self-referential type names with Self across multiple enum/struct definitions
- Allow clippy::use_self on snafu-derived Error variants that use Arc<Error> (snafu macro generates code with concrete type names, incompatible with Self): daft-io CachedError, daft-parquet RemoteFetchFailed
- Allow clippy::result_large_err in daft-io/src/tos.rs (IO error types are inherently large)
- Rename clippy::only_used_in_recursion allow attributes to clippy::self_only_used_in_recursion (lint renamed in newer clippy)
- Fix ref_as_ptr by using std::ptr::from_ref
- Fix unnecessary_unwrap in daft-sql by using if-let
- Fix unchecked_time_subtraction by using checked_sub().unwrap()
- Fix useless_vec in daft-io test
- Fix derivable_impls on PreviewFormat by using #[derive(Default)]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mikedep333:info_remainder-new-rust
4 hours ago
fix: update code for nightly-2026-01-08 toolchain
Bump rust nightly toolchain from nightly-2025-09-03 to nightly-2026-01-08. This picks up the ArrayChunks::into_remainder() return type change (rust-lang/rust#149127) while staying before the cargo --timings=html removal (rust-lang/cargo#16420), which CI still relies on.
Changes to accommodate the new nightly:
- Fix into_remainder() call in daft-minhash to match new return type (no longer wrapped in Option) (This enables distributions to build with Rust 1.93 stable and newer when RUSTC_BOOTSTRAP is set.)
- Upgrade cargo-llvm-cov from 0.7.1 to 0.8.7 to fix corrupt profraw files due to LLVM profdata format incompatibility
Clippy lint fixes for newer nightly:
- Fix use_self by replacing self-referential type names with Self across multiple enum/struct definitions
- Allow clippy::use_self on snafu-derived Error variants that use Arc<Error> (snafu macro generates code with concrete type names, incompatible with Self): daft-io CachedError, daft-parquet RemoteFetchFailed
- Allow clippy::result_large_err in daft-io/src/tos.rs (IO error types are inherently large)
- Rename clippy::only_used_in_recursion allow attributes to clippy::self_only_used_in_recursion (lint renamed in newer clippy)
- Fix ref_as_ptr by using std::ptr::from_ref
- Fix unnecessary_unwrap in daft-sql by using if-let
- Fix unchecked_time_subtraction by using checked_sub().unwrap()
- Fix useless_vec in daft-io test
- Fix derivable_impls on PreviewFormat by using #[derive(Default)]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mikedep333:info_remainder-new-rust
5 hours ago
fix: update code for nightly-2026-01-08 toolchain
Bump rust nightly toolchain from nightly-2025-09-03 to nightly-2026-01-08. This picks up the ArrayChunks::into_remainder() return type change (rust-lang/rust#149127) while staying before the cargo --timings=html removal (rust-lang/cargo#16420), which CI still relies on.
Changes to accommodate the new nightly:
- Fix into_remainder() call in daft-minhash to match new return type (no longer wrapped in Option) (This enables distributions to build with Rust 1.93 stable and newer when RUSTC_BOOTSTRAP is set.)
- Upgrade cargo-llvm-cov from 0.7.1 to 0.8.7 to fix corrupt profraw files due to LLVM profdata format incompatibility
- Fix clippy use_self lint errors by replacing self-referential type names with Self in DataType, ArrowSchema, ArrowArray, ArrowArrayStream, Literal, FakeSchema, FakeArray, Value, PlanJsonConfig, JoinOrderTree, and Error enum/struct definitions
- Allow clippy::use_self on snafu-derived Error variants that use Arc<Error> (snafu macro generates code with concrete type names, incompatible with Self): daft-io CachedError, daft-parquet RemoteFetchFailed
- Allow clippy::result_large_err in daft-io/src/tos.rs (IO error types are inherently large)
- Rename clippy::only_used_in_recursion allow attributes to clippy::self_only_used_in_recursion (lint renamed in newer clippy)
- Fix ref_as_ptr lint in daft-recordbatch test by using std::ptr::from_ref
- Fix useless_vec lint in daft-io test
- Fix derivable_impls lint on PreviewFormat by using #[derive(Default)]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mikedep333:info_remainder-new-rust
5 hours ago
feat(functions): add video_frames_from_bytes for row-level decoding from a binary column
Adds `daft.functions.video_frames_from_bytes`, a row-level expression that decodes frames straight from a Binary column of encoded video bytes. Useful when the encoded video is already in memory (custom downloader UDFs, bytes streaming through other operators) and you don't want to round-trip through a path. Returns the same per-frame Struct schema as `video_frames`.
Behavior notes:
- Null inputs are mapped to an empty frame list rather than raising an error. This lets a sibling expression branch on the original null column (typically populating an `extract_error` column via `when(col.is_null(), ...)`) without aborting the whole batch when an upstream download UDF signalled failure with `None`.
- The BytesIO buffer wrapping each row's bytes is explicitly closed after decoding. PyAV's `container.__exit__` does not close the underlying file-like object; without an explicit close the raw bytes (often hundreds of MB per row) stay pinned until the next GC pass, which is costly inside long-running Ray actor processes.
- Eager input validation matches `video_frames`: Pillow availability, width/height pairing, positive `sample_interval_seconds`.
Refactor: the per-container iteration loop in `VideoFile.frames` is extracted into a module-level `_iter_frames_from_container` helper that both the file and bytes paths share. The return dtype of the per-frame struct is also hoisted into a `_VIDEO_FRAMES_RETURN_DTYPE` module constant so both Funcs share a single source of truth.
5 new tests (bytes happy path, sample_interval interaction, resize, keyframe filtering, null-input → empty list) and a docs example in `docs/modalities/videos.md`.
TheR1sing3un:feat_video_frames_from_bytes
6 hours ago
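The two behavior notes in the video_frames_from_bytes entry above (null input maps to an empty list; the BytesIO buffer is closed explicitly) can be sketched with the standard library alone. This is a minimal illustration, not Daft's implementation: `frames_from_bytes` and its `decode` parameter are hypothetical names, and `decode` stands in for whatever decoder (PyAV in the commit) reads frames from the file-like object.

```python
import io
from typing import Any, Callable, Iterable, List, Optional


def frames_from_bytes(
    data: Optional[bytes],
    decode: Callable[[io.BytesIO], Iterable[Any]],  # hypothetical decoder hook
) -> List[Any]:
    """Decode one row of encoded bytes into a list of frames."""
    # A null row yields an empty list instead of raising, so the caller
    # can still branch on the original null column.
    if data is None:
        return []
    buf = io.BytesIO(data)
    try:
        # Materialize every frame before the buffer is closed, since the
        # decoder may read from it lazily.
        return list(decode(buf))
    finally:
        # Close explicitly rather than waiting for garbage collection,
        # so the raw bytes are released as soon as decoding finishes.
        buf.close()
```

Materializing with `list(...)` inside the `try` block matters here: returning a lazy generator would let the `finally` clause close the buffer before any frame had been read.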
feat: nearest asof joins (#6953)
---
Nearest ASOF Join
Adds strategy="nearest" to join_asof, which matches each left row to the right row with the minimum absolute difference in the on-key. Ties prefer the larger (later/forward) value.
---
Changes: Native execution
**Probe**
Previously, we assigned each right row to exactly one left row and relied on a single directional fill to propagate matches. Nearest can't do this: when two left rows are equidistant from a right row, assigning the right row to only the nearest means the other never gets to compare that candidate. The fix is search_bucket_nearest_range: for each right row it returns a Range<usize> covering both the floor (last left ≤ right) and ceil (first left ≥ right). Every right row is offered to every position in the range via update_nearest_match, which keeps the closer candidate.
**Finalize**
After per-worker probe states are merged, nearest_fill resolves unmatched left rows by running both a forward and a backward fill on copies of global_best, then picking the closer candidate from the two directions using is_nearer.
**is_nearer**
For each comparison, dispatches on the Arrow DataType once, downcasts to the concrete PrimitiveArray<T>, then extracts three plain Rust scalars (candidate A, candidate B, pivot) and computes |a - pivot| vs |b - pivot|. We chose this approach because type-matching and computing distances as plain Rust scalars was faster than going through Arrow compute kernels or Daft Series/array primitives.
---
Changes: Distributed execution
We refactored the carryover computation into compute_carryovers(descending: bool), which runs a top_n(limit=1) pass over the right table:
- descending=true picks the per-partition max and propagates it left→right, giving each partition the closest right row from behind;
- descending=false picks the per-partition min and propagates right→left, giving each partition the closest right row from ahead.
For Nearest, both passes run concurrently via tokio::try_join!. Each partition's local join task then receives its own right data plus one boundary row from each direction (two carryovers), and the native nearest join handles the rest.
main
6 hours ago
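The matching rule described in the nearest asof join entry above (minimum absolute difference, ties going to the larger value) can be illustrated with a small reference sketch. It is not the probe/finalize algorithm from the PR: `nearest_match` and `nearest_join` are hypothetical names, the right keys are assumed to be sorted and numeric, and each left key is resolved independently with a binary search over its floor and ceil candidates.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence


def nearest_match(left_key: float, right_keys: Sequence[float]) -> Optional[int]:
    """Return the index of the nearest key in sorted right_keys, or None if empty."""
    if not right_keys:
        return None
    # Position of the ceil candidate: the first right key >= left_key.
    pos = bisect_left(right_keys, left_key)
    if pos == 0:
        return 0  # no floor candidate exists
    if pos == len(right_keys):
        return len(right_keys) - 1  # no ceil candidate exists
    floor_i, ceil_i = pos - 1, pos
    d_floor = left_key - right_keys[floor_i]
    d_ceil = right_keys[ceil_i] - left_key
    # Using <= sends ties to the ceil candidate, i.e. the larger value.
    return ceil_i if d_ceil <= d_floor else floor_i


def nearest_join(left_keys: Sequence[float], right_keys: Sequence[float]) -> List[Optional[int]]:
    """For each left key, the index of its nearest right key."""
    return [nearest_match(k, right_keys) for k in left_keys]
```

With right keys `[1, 5, 9]`, a left key of 3 is equidistant from 1 and 5, so the sketch picks 5, matching the tie rule stated in the commit message.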
fix: update code for nightly-2026-01-08 toolchain
Bump rust nightly toolchain from nightly-2025-09-03 to nightly-2026-01-08. This picks up the ArrayChunks::into_remainder() return type change (rust-lang/rust#149127) while staying before the cargo --timings=html removal (rust-lang/cargo#16420), which CI still relies on.
Changes to accommodate the new nightly:
- Fix into_remainder() call in daft-minhash to match new return type (no longer wrapped in Option) (This enables distributions to build with Rust 1.93 stable and newer when RUSTC_BOOTSTRAP is set.)
- Upgrade cargo-llvm-cov from 0.7.1 to 0.8.7 to fix corrupt profraw files due to LLVM profdata format incompatibility
- Fix clippy use_self lint errors by replacing self-referential type names with Self in DataType, ArrowSchema, ArrowArray, ArrowArrayStream, Literal, FakeSchema, FakeArray, Value, PlanJsonConfig, and Error enum/struct definitions
- Allow clippy::use_self on snafu-derived Error variants that use Arc<Error> (snafu macro generates code with concrete type names, incompatible with Self): daft-io CachedError, daft-parquet RemoteFetchFailed
- Allow clippy::result_large_err in daft-io/src/tos.rs (IO error types are inherently large)
- Allow clippy::self_only_used_in_recursion on HuggingFace request method (lint renamed from only_used_in_recursion in newer clippy)
- Fix useless_vec lint in daft-io test
- Fix derivable_impls lint on PreviewFormat by using #[derive(Default)]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
mikedep333:info_remainder-new-rust
7 hours ago

Latest Branches

CodSpeed Performance Gauge
0%
perf(window): reduce unnecessary copies of data in finalize() step of window functions to reduce memory usage (#7006)
2 hours ago
3118705
euan/optimize-window-fn
CodSpeed Performance Gauge
0%
CodSpeed Performance Gauge
-1%
6 days ago
848e8d3
mikedep333:info_remainder-new-rust
© 2026 CodSpeed Technology