docs(skills): add Claude Code-compatible guides for distributed scaling, UDF tuning, and docs navigation
This commit adds a new top-level `skills/` directory with Claude Code-style guidance docs intended for AI coding assistants and Daft users.
Included documents:
- `skills/README.md`: index and usage guidance for the skills folder
- `skills/distributed-scaling.md`: recipes for converting single-node workflows to distributed execution, focusing on Ray, repartitioning, and batching
- `skills/udf-tuning.md`: overview and tuning advice for legacy `@daft.udf` and new `@daft.func` / `@daft.cls` APIs, including resource utilization strategies
- `skills/docs-navigation.md`: tips for navigating Daft docs (docs.daft.ai and the repo `docs/` tree) from terminal- and agent-driven workflows
The docs favor concise, terminal-friendly examples and include links back to the relevant sections of the official Daft documentation.
Co-Authored-By: Aime <aime@bytedance.com>
Change-Id: I3a984acbf46792b1b9ed1b06fbdb240e47114d37
fix: Allow `is_in` to accept sets, tuples, and other iterables
Previously, `is_in` only accepted lists. Now it accepts any iterable
(sets, tuples, frozensets, ranges, generators, dict_keys, etc.)
and converts it to a list before processing.
Strings and bytes are explicitly excluded since they are iterable but
should not be treated as a sequence of characters/bytes.
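The conversion rule described above can be sketched roughly as follows.
This is a minimal illustration, not the code in this change: the helper
name `normalize_is_in_values` is hypothetical, and passing strings,
bytes, and non-iterables through unchanged is an assumption about how
the excluded cases are handled.

```python
from collections.abc import Iterable
from typing import Any


def normalize_is_in_values(values: Any) -> Any:
    """Hypothetical helper (not Daft's actual code): prepare an `is_in` argument."""
    # Strings and bytes are iterable, but must not be split into
    # characters/bytes, so they are passed through unchanged (assumption).
    if isinstance(values, (str, bytes)):
        return values
    # Lists are already in the expected form.
    if isinstance(values, list):
        return values
    # Any other iterable (set, tuple, frozenset, range, generator,
    # dict_keys, ...) is materialized into a list.
    if isinstance(values, Iterable):
        return list(values)
    # Non-iterables are left for the caller to handle.
    return values
```

Checking `str`/`bytes` before the generic `Iterable` test matters here,
because both are iterable and would otherwise be expanded element by
element.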
Fixes #6107
Slack thread: https://eventualgroup.slack.com/archives/C07T76NL6TY/p1769900272580499?thread_ts=1769890147.586419&cid=C07T76NL6TY
https://claude.ai/code/session_014EPq2BXs5xxLoTjmAEbTym
ci: Add pyarrow to wheel build test dependencies
Multiple test files in tests/dataframe import pyarrow, but it was not
installed alongside the other test dependencies (ray, pytest, pandas,
pytz, numpy). This caused test failures in the nightly S3 publish
workflow.
Slack thread: https://eventualgroup.slack.com/archives/C04S6C067EU/p1769935300945999?thread_ts=1769928453.659629&cid=C04S6C067EU
https://claude.ai/code/session_01AQAqWAAYWJqsNFmyuQmjDB