feat: Add BFloat16 data type support backed by Float32 physical storage
This commit introduces native support for the BFloat16 data type in Daft, addressing performance bottlenecks associated with using Python objects for ML tensors.
Key features:
1. **Logical BFloat16 Type**: Adds `DataType.bfloat16()` which behaves logically as a 16-bit brain floating point number.
2. **Float32 Physical Storage**: Uses 32-bit floats for underlying storage. This ensures zero precision loss (BF16 is a bit-truncated FP32, so widening to FP32 is exact) while leveraging existing vectorized Float32 kernels for high performance; see the round-trip sketch after this list.
3. **Seamless Interop**:
- Supports ingestion from `torch.bfloat16` tensors and `ml_dtypes.bfloat16` NumPy arrays (see the usage sketch after this list).
- `to_pylist()` reconstructs `torch.bfloat16` tensors (if torch is available) or returns float32 NumPy arrays, preserving type fidelity.
- Integrates with `jaxtyping` for type inference.
4. **Arrow Compatibility**: Implements an Arrow extension type (`daft.bfloat16`) for serialization and interoperability; a sketch of the extension-type mechanism follows this list.
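The zero-precision-loss claim in item 2 follows from BF16 being a bit-truncated FP32: widening to float32 is exact, and truncating back recovers the original value. A minimal round-trip sketch using `numpy` and `ml_dtypes` (independent of Daft) illustrates this:

```python
import numpy as np
import ml_dtypes

# bfloat16 keeps float32's 8-bit exponent and truncates the mantissa to 7 bits,
# so every bfloat16 value is exactly representable as a float32.
bf16 = np.array([0.5, 1.25, -3.0, 1e30], dtype=ml_dtypes.bfloat16)

widened = bf16.astype(np.float32)           # exact, no rounding occurs
round_tripped = widened.astype(ml_dtypes.bfloat16)

assert np.array_equal(round_tripped, bf16)  # lossless round trip
```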
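A usage sketch of the Python-facing API described above. `DataType.bfloat16()` is the type added by this commit; the ingestion route below (building the column from float32 values, then casting) and the column name are illustrative, and the direct `torch.bfloat16` ingestion path may use a different entrypoint:

```python
import daft
import torch

# A bfloat16 tensor produced by a typical ML workload.
weights = torch.tensor([0.5, 1.25, -3.0], dtype=torch.bfloat16)

# Illustrative ingestion path: widen to float32 for construction, then cast the
# column to the new logical type.
df = daft.from_pydict({"weights": weights.to(torch.float32).tolist()})
df = df.with_column("weights", daft.col("weights").cast(daft.DataType.bfloat16()))

print(df.schema())     # "weights" reported with the BFloat16 logical type
print(df.to_pylist())  # reconstructs torch.bfloat16 values when torch is installed
```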
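For item 4, the extension-type mechanism itself is standard PyArrow. The sketch below shows what a `daft.bfloat16` extension type could look like; it is not necessarily the commit's actual implementation, and the Float32 storage type is an assumption that mirrors the logical design above:

```python
import pyarrow as pa

class BFloat16ExtensionType(pa.ExtensionType):
    """Hypothetical sketch of an extension type named "daft.bfloat16"."""

    def __init__(self):
        # Storage type is an assumption; an implementation could equally store
        # raw 2-byte values. Float32 mirrors the physical storage described above.
        super().__init__(pa.float32(), "daft.bfloat16")

    def __arrow_ext_serialize__(self) -> bytes:
        return b""  # no parameters to serialize

    @classmethod
    def __arrow_ext_deserialize__(cls, storage_type, serialized):
        return cls()

# Registering by name lets the type survive IPC/Parquet round trips.
pa.register_extension_type(BFloat16ExtensionType())
```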
This implementation eliminates the overhead of `DataType.Python` for BF16 data, substantially reducing memory usage and improving processing speed for ML workloads.