Eventual-Inc
Daft
Blog
Docs
Changelog
Performance History
Latest Results
fixes
chris/event-log
47 minutes ago
Merge remote-tracking branch 'origin/main' into rchowell/sources-1-of-N
rchowell/sources-1-of-N
3 hours ago
fix(parquet): add MIN_RG_BYTES gate to async paths, fix stale comments

Add MIN_RG_BYTES_FOR_COL_PARALLELISM check to both async column-parallel paths (read_parquet_single_arrowrs, stream_parquet_single_arrowrs) to match the sync path behavior. Fix stale comments that referenced wrong thresholds and replace remaining magic number 3 with constant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
desmond/intra-rg-col-parallelism-v2
3 hours ago
feat(expressions): support count(mode='all') without expr (#6358)

## Changes Made

This PR adds support for calling `daft.functions.count(mode="all")` without passing an expression, enabling free-standing row-count aggregations similar to SQL `COUNT(*)`.

### What changed

- Updated `daft.functions.count` in `daft/functions/agg.py` to accept an optional `expr` argument.
- Added explicit behavior for `expr=None`:
  - `mode="all"` is allowed and maps to `count(*)` semantics via `col("*")`.
  - `mode="valid"` or `mode="null"` now raises a clear `ValueError` when no expression is provided.
- Preserved backward compatibility for all existing `count(expr, mode=...)` and `count(lit(...))` usages.

### Tests added

In `tests/dataframe/test_aggregations.py`:

- Global aggregation with `count(mode="all")` and no expression.
- Grouped aggregation with `count(mode="all")` and no expression.
- Empty DataFrame behavior (`0` result).
- Error path validation for no-expression calls with non-`all` modes.

### Validation run

- `DAFT_RUNNER=native make test EXTRA_ARGS="-v tests/dataframe/test_aggregations.py"`
- `DAFT_RUNNER=native make test EXTRA_ARGS="-v tests/sql/test_sql.py tests/sql/test_exprs.py"`

## Related Issues

Closes #5526

Co-authored-by: Lord of Abyss <pancx@chinatelecom.cn>
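The argument rule described in this PR can be modelled without Daft installed. The following is a minimal standalone sketch, not the code in `daft/functions/agg.py`: the helper name `resolve_count_target`, its string-based column representation, and the default `mode="valid"` are all assumptions made for illustration.

```python
from typing import Optional, Tuple

VALID_MODES = ("all", "valid", "null")


def resolve_count_target(expr: Optional[str] = None, mode: str = "valid") -> Tuple[str, str]:
    """Hypothetical helper: return the (column, mode) pair a count would use.

    Columns are plain strings here; the real function works on Daft
    expressions. The default mode is an assumption.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown count mode: {mode!r}")
    if expr is None:
        # Without an expression only mode="all" is meaningful: it counts rows,
        # like SQL COUNT(*), so it maps to the wildcard column.
        if mode != "all":
            raise ValueError(
                f"count(mode={mode!r}) requires an expression; "
                "only mode='all' may be called without one"
            )
        return ("*", "all")
    # Existing count(expr, mode=...) calls pass through unchanged.
    return (expr, mode)
```

Calling `resolve_count_target(mode="all")` yields the wildcard target, while `resolve_count_target(mode="null")` raises `ValueError`, mirroring the two `expr=None` cases listed in the PR description.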
main
3 hours ago
feat: kafka bounded datasource (#5970)

## Changes Made

Added a bounded Kafka batch read API via `daft.read_kafka`, supporting `start` / `end` bounds expressed as:

- `"earliest"` / `"latest"`
- timestamp_ms (int), datetime, and ISO-8601 strings
- per-partition offset maps (`{partition: offset}` for single-topic; `{topic: {partition: offset}}` for multi-topic)

## Related Issues

Closes #4603

---------

Co-authored-by: wangzheyan <wangzheyan@bytedance.com>
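To make the accepted bound shapes concrete, here is a standalone sketch that classifies a bound value into one of the forms listed above. It does not call `daft.read_kafka` and is not how Daft implements it: the function name `classify_bound`, the returned tags, and the choice to treat naive datetimes as UTC are all assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Tuple

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _to_ms(dt: datetime) -> int:
    # Assumption: naive datetimes are interpreted as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division avoids float rounding in the millisecond value.
    return (dt - _EPOCH) // timedelta(milliseconds=1)


def classify_bound(bound: Any) -> Tuple[str, Any]:
    """Hypothetical helper: tag a start/end bound with the form it takes."""
    if isinstance(bound, bool):
        raise TypeError("bool is not a valid bound")
    if isinstance(bound, str):
        if bound in ("earliest", "latest"):
            return ("symbolic", bound)
        # Any other string is parsed as an ISO-8601 timestamp.
        return ("timestamp_ms", _to_ms(datetime.fromisoformat(bound)))
    if isinstance(bound, datetime):
        return ("timestamp_ms", _to_ms(bound))
    if isinstance(bound, int):
        return ("timestamp_ms", bound)
    if isinstance(bound, dict) and bound:
        # {topic: {partition: offset}} for multi-topic reads,
        # {partition: offset} for single-topic reads.
        if all(isinstance(v, dict) for v in bound.values()):
            return ("topic_offsets", bound)
        return ("partition_offsets", bound)
    raise TypeError(f"unsupported bound: {bound!r}")
```

For instance, `classify_bound({0: 10, 1: 20})` is tagged as a single-topic offset map, while `classify_bound({"events": {0: 5}})` is tagged as a multi-topic one (the topic name `events` is made up).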
main
4 hours ago
feat(udf): support ray_options and resource overrides in UDF v2 (#5982)

## Changes Made

This PR adds support for passing `ray_options` and overriding resource requirements in Daft UDF v2 (`@daft.func` and `@daft.cls`).

Key changes:

- Update `row_wise_udf` and `batch_udf` bindings to accept `ray_options` and `cpus`.
- Add `override_options` and `with_concurrency` methods to `Func` class in `daft/udf/udf_v2.py` for dynamic configuration.
- Propagate these options from Python to the Rust Logical Plan.
- Update type stubs (`.pyi`) to match new Rust signatures.
main
5 hours ago
docs for read_kafka
everySympathy:kafka-bounded-read
6 hours ago
feat: google text embedder
Farzan-Hashmi:farzan/google-text-embedder
10 hours ago
Active Branches
feat(subscribers): add JSONL event log subscriber
#6420
last run
47 minutes ago
CodSpeed Performance Gauge
0%
feat: define a single DataSource trait matching our python interface.
#6427
last run
3 hours ago
CodSpeed Performance Gauge
0%
perf(parquet): add intra-row-group column parallelism to arrow-rs reader
#6423
last run
3 hours ago
CodSpeed Performance Gauge
0%
© 2026 CodSpeed Technology