Eventual-Inc
Daft

feat: Split *all* Parquet ScanTasks by default

#3454
Comparing jay/split-all-files (bba2bed) with main (063de4d)
CodSpeed Performance Gauge: -32%
Improvements: 0
Regressions: 1
Untouched: 26
New: 0
Dropped: 0
Ignored: 1

Benchmarks

Failed

| Benchmark | Change | Base | Head |
| --- | --- | --- | --- |
| tests/benchmarks/test_interactive_reads.py::test_show[100 Small Files] (Regression) | -32% | 15.8 ms | 23.3 ms |
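
The -32% figure for the regressed benchmark can be reproduced from its two timings. The sketch below assumes the first timing (15.8 ms) is the base (main) and the second (23.3 ms) is the head (jay/split-all-files), and that the change is computed as (base - head) / head. Both the column order and the formula are assumptions inferred from the reported numbers, not something the report states; `relative_change` is a hypothetical helper name.

```python
def relative_change(base_ms: float, head_ms: float) -> float:
    """Relative change of head vs. base as a fraction.

    Assumed formula: (base - head) / head. A negative value means
    the head run is slower than the base run.
    """
    return (base_ms - head_ms) / head_ms


# Timings reported for test_show[100 Small Files]
# (assumed order: base first, head second).
change = relative_change(15.8, 23.3)
print(f"{change:+.0%}")  # prints -32%
```

Under the same assumptions, the timings for test_iter_rows_first_row[100 Small Files] (304.4 ms and 327.3 ms) give roughly -7%, which matches the value listed for that benchmark.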

Passed

| Benchmark | Change | Base | Head |
| --- | --- | --- | --- |
| tests/benchmarks/test_interactive_reads.py::test_show[1 Small File] | +2% | 11.6 ms | 11.5 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-8] | +1% | 185.1 ms | 183.5 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-4] | +1% | 158.5 ms | 157.3 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-10] | +1% | 245.3 ms | 243.4 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-1] | +1% | 493.3 ms | 490.4 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-2] | +1% | 114.3 ms | 113.7 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-7] | +1% | 153 ms | 152.1 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-2] | +1% | 240.4 ms | 239.1 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-4] | 0% | 155.3 ms | 154.7 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-1] | 0% | 466.1 ms | 465.1 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-6] | 0% | 29.1 ms | 29 ms |
| tests/benchmarks/test_interactive_reads.py::test_count[100 Small Files] | 0% | 72.7 ms | 72.6 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-5] | 0% | 335.4 ms | 334.8 ms |
| tests/benchmarks/test_interactive_reads.py::test_iter_rows_first_row[1 Small File] | 0% | 101.2 ms | 101.1 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-6] | 0% | 29.8 ms | 29.8 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-3] | 0% | 153.8 ms | 154 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-5] | 0% | 263.4 ms | 263.9 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-3] | -1% | 151.9 ms | 152.9 ms |
| tests/benchmarks/test_interactive_reads.py::test_explain[100 Small Files] | -1% | 6 ms | 6 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-10] | -1% | 231.7 ms | 234.5 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-7] | -2% | 1.2 s | 1.2 s |
| tests/benchmarks/test_local_tpch.py::test_tpch[1-in-memory-native-9] | -2% | 485.5 ms | 493.2 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-9] | -2% | 516.7 ms | 527.3 ms |
| tests/benchmarks/test_local_tpch.py::test_tpch_sql[1-in-memory-native-8] | -2% | 197.4 ms | 202.2 ms |
| tests/benchmarks/test_interactive_reads.py::test_count[1 Small File] | -4% | 3.5 ms | 3.7 ms |
| tests/benchmarks/test_interactive_reads.py::test_iter_rows_first_row[100 Small Files] | -7% | 304.4 ms | 327.3 ms |

Ignored

| Benchmark | Change | Base | Head |
| --- | --- | --- | --- |
| tests/benchmarks/test_interactive_reads.py::test_explain[1 Small File] (Ignored) | -3% | 1.8 ms | 1.8 ms |

Commits

Base: main (063de4d)
-32%
Perform split on all files
Refactor into accumulator struct
Rename
Further simplification of accumulator logic
Cleanup into separate accumulator and accumulator context
Account for potentially null TableMetadata
Refactor into Iterator
Refactor into state machine
Convert Parquet file iterator to state machine as well
small cleanup
Reorganization into a separate module
Cleanup to extend this easier for using catalog information
Perform 16 Parquet metadata fetches in parallel
perf: reduce calls to ScanTask::estimate_in_memory_size
Adds unit test
Adds more unit tests
Add feature flag DAFT_ENABLE_AGGRESSIVE_SCANTASK_SPLITTING
Add a benchmarking script
Trigger data materialization in benchmark
Refactors to ParquetFileSplitter to not use state machine
Big refactor to split into multiple files and iterators
Add better docs
Refactor splitter code
nit naming
Refactor Fetchable
reordering for readability
Simplify State logic for FetchParquetMetadataByWindows
impl IntoIterator for SplittableScanTaskRef by propagating the config ref
docstrings
Removed advance_state for more explicit naming
Remove trait
bba2bed
1 month ago