Eventual-Inc / Daft

WIP: feat: Add BFloat16 data type support backed by Float32 physical storage

#5887 (Closed)
Comparing `huleilei:feature/bf16-unified` (5f4047e) with `main` (050d6e9)
Overall performance change: 0%
Untouched: 24
Ignored: 4

Benchmarks

Passed

| Benchmark | File | Change | Base | Head |
|---|---|---|---|---|
| `test_tpch_sql[1-in-memory-1]` | `tests/benchmarks/test_local_tpch.py` | +1% | 402 ms | 397 ms |
| `test_count[100 Small Files]` | `tests/benchmarks/test_interactive_reads.py` | +1% | 57.6 ms | 57.1 ms |
| `test_tpch[1-in-memory-10]` | `tests/benchmarks/test_local_tpch.py` | 0% | 198.3 ms | 197.6 ms |
| `test_tpch_sql[1-in-memory-10]` | `tests/benchmarks/test_local_tpch.py` | 0% | 189.7 ms | 189.1 ms |
| `test_tpch[1-in-memory-7]` | `tests/benchmarks/test_local_tpch.py` | 0% | 126 ms | 125.7 ms |
| `test_tpch[1-in-memory-2]` | `tests/benchmarks/test_local_tpch.py` | 0% | 57 ms | 56.9 ms |
| `test_tpch_sql[1-in-memory-5]` | `tests/benchmarks/test_local_tpch.py` | 0% | 118.5 ms | 118.5 ms |
| `test_tpch_sql[1-in-memory-7]` | `tests/benchmarks/test_local_tpch.py` | 0% | 118 ms | 118 ms |
| `test_tpch[1-in-memory-6]` | `tests/benchmarks/test_local_tpch.py` | 0% | 28.5 ms | 28.5 ms |
| `test_tpch_sql[1-in-memory-4]` | `tests/benchmarks/test_local_tpch.py` | 0% | 88.9 ms | 89 ms |
| `test_tpch_sql[1-in-memory-6]` | `tests/benchmarks/test_local_tpch.py` | 0% | 29.1 ms | 29.2 ms |
| `test_tpch[1-in-memory-5]` | `tests/benchmarks/test_local_tpch.py` | 0% | 129.9 ms | 130.2 ms |
| `test_tpch[1-in-memory-8]` | `tests/benchmarks/test_local_tpch.py` | 0% | 147.7 ms | 148.1 ms |
| `test_tpch_sql[1-in-memory-3]` | `tests/benchmarks/test_local_tpch.py` | 0% | 116.4 ms | 116.7 ms |
| `test_explain[100 Small Files]` | `tests/benchmarks/test_interactive_reads.py` | 0% | 12.2 ms | 12.2 ms |
| `test_tpch[1-in-memory-9]` | `tests/benchmarks/test_local_tpch.py` | 0% | 272.2 ms | 273.1 ms |
| `test_tpch[1-in-memory-1]` | `tests/benchmarks/test_local_tpch.py` | 0% | 400.5 ms | 401.9 ms |
| `test_tpch[1-in-memory-3]` | `tests/benchmarks/test_local_tpch.py` | 0% | 123.7 ms | 124.3 ms |
| `test_tpch_sql[1-in-memory-8]` | `tests/benchmarks/test_local_tpch.py` | -1% | 134.3 ms | 135.1 ms |
| `test_tpch[1-in-memory-4]` | `tests/benchmarks/test_local_tpch.py` | -1% | 88 ms | 88.9 ms |
| `test_tpch_sql[1-in-memory-2]` | `tests/benchmarks/test_local_tpch.py` | -1% | 164.9 ms | 166.5 ms |
| `test_tpch_sql[1-in-memory-9]` | `tests/benchmarks/test_local_tpch.py` | -1% | 262.6 ms | 265.8 ms |
| `test_iter_rows_first_row[1 Small File]` | `tests/benchmarks/test_interactive_reads.py` | -2% | 34.8 ms | 35.7 ms |
| `test_show[1 Small File]` | `tests/benchmarks/test_interactive_reads.py` | -4% | 12.1 ms | 12.6 ms |

Ignored

| Benchmark | File | Change | Base | Head |
|---|---|---|---|---|
| `test_count[1 Small File]` | `tests/benchmarks/test_interactive_reads.py` | +1% | 3.5 ms | 3.5 ms |
| `test_explain[1 Small File]` | `tests/benchmarks/test_interactive_reads.py` | +2% | 2.1 ms | 2.1 ms |
| `test_show[100 Small Files]` | `tests/benchmarks/test_interactive_reads.py` | +10% | 24.8 ms | 22.5 ms |
| `test_iter_rows_first_row[100 Small Files]` | `tests/benchmarks/test_interactive_reads.py` | -16% | 128.9 ms | 154.1 ms |
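
The percentages in this report appear consistent with a speedup ratio of the form base / head - 1, if the first timing of each pair is read as the base (`main`) and the second as the head (the PR branch). That reading is an inference from the numbers, not something the report states. The helper name `relative_change` is hypothetical. A minimal sketch:

```python
def relative_change(base_ms: float, head_ms: float) -> float:
    """Return (base / head - 1) as a percentage.

    Positive values mean the head run was faster than the base run.
    Assumption: this matches how the report derives its percentages.
    """
    return (base_ms / head_ms - 1.0) * 100.0


# Two of the ignored benchmarks listed above (timings in ms).
print(round(relative_change(24.8, 22.5)))    # 10
print(round(relative_change(128.9, 154.1)))  # -16
```

The displayed timings are rounded, so recomputing a percentage from them can differ by a point from the figure shown.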

Commits

Base: `main` (050d6e9)

5f4047e (-0.42%), 3 days ago, by huleilei

feat: Add BFloat16 data type support backed by Float32 physical storage

This commit introduces native support for the BFloat16 data type in Daft, addressing performance bottlenecks associated with using Python objects for ML tensors.

Key features:

1. **Logical BFloat16 Type**: Adds `DataType.bfloat16()` which behaves logically as a 16-bit brain floating point number.
2. **Float32 Physical Storage**: Uses 32-bit floats for underlying storage. This ensures zero precision loss (as BF16 is a truncated FP32) while leveraging existing vectorized Float32 kernels for high performance.
3. **Seamless Interop**:
   - Supports ingestion from `torch.bfloat16` tensors and `ml_dtypes.bfloat16` numpy arrays.
   - `to_pylist()` reconstructs `torch.bfloat16` tensors (if torch is available) or returns float32 numpy arrays, preserving type fidelity.
   - Integrates with `jaxtyping` for type inference.
4. **Arrow Compatibility**: Implements Arrow Extension Type (`daft.bfloat16`) for serialization and interoperability.

This implementation eliminates the overhead of `DataType.Python` for BF16 data, significantly improving memory usage and processing speed for ML workloads.
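
The commit message's claim that BF16 is a truncated FP32 can be checked at the bit level: a BF16 value is the top 16 bits of an IEEE 754 float32 (sign, 8 exponent bits, 7 mantissa bits), so widening to float32 is exact. The sketch below uses only the Python standard library and is not Daft's implementation. All function names are hypothetical, and narrowing here truncates, whereas real converters typically round to nearest even.

```python
import struct


def f32_to_bits(x: float) -> int:
    """Reinterpret a value as its 32-bit IEEE 754 float32 bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_to_f32(bf16_bits: int) -> float:
    """Widen BF16 to float32: the 16 bits become the high half, low half is zero."""
    return bits_to_f32((bf16_bits & 0xFFFF) << 16)


def f32_to_bf16_truncate(x: float) -> int:
    """Narrow float32 to BF16 by dropping the low 16 mantissa bits (no rounding)."""
    return f32_to_bits(x) >> 16


# Widening then narrowing returns the original BF16 bits (NaNs not covered here).
for bits in (0x3F80, 0x4049, 0xC2F7, 0x0001, 0x7F80):
    assert f32_to_bf16_truncate(bf16_to_f32(bits)) == bits

# The other direction is lossy: float32 pi keeps only 8 significant bits.
print(bf16_to_f32(f32_to_bf16_truncate(3.14159265)))  # 3.140625
```

Storing BF16 values in float32 slots therefore costs twice the memory of a packed 16-bit layout, but every stored value is represented exactly.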