pola-rs
polars

perf: Integer fast path Parquet dict encoding

#18030 (Merged)
Comparing coastalwhite:perf-parquet-write-dictionary (2294531) with main (d5265d3)
CodSpeed performance gauge: -1%
Improvements: 0
Regressions: 0
Untouched: 37
New: 0
Dropped: 0
Ignored: 1

Benchmarks

Passed

Change  Base      Head      Benchmark
+15%    7.4 ms    6.5 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q22
+10%    2.1 ms    1.9 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q6
+9%     6.1 ms    5.6 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q3
+6%     9.5 ms    9 ms      py-polars/tests/benchmark/test_tpch.py::test_tpch_q7
+5%     2.4 ms    2.3 ms    py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q1
+2%     4.7 ms    4.6 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q4
+2%     4.5 ms    4.4 ms    py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q2
+2%     3.7 ms    3.6 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q2
+2%     27.1 ms   26.6 ms   py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q9
+1%     33.6 ms   33.1 ms   py-polars/tests/benchmark/test_io.py::test_write_read_scan_large_csv
+1%     6.3 ms    6.3 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q20
+1%     6.2 ms    6.2 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q12
+1%     2.8 ms    2.8 ms    py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q8
0%      7.3 ms    7.3 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q19
0%      216.9 ms  216.8 ms  py-polars/tests/benchmark/test_tpch.py::test_tpch_q21
0%      310.7 µs  310.7 µs  py-polars/tests/benchmark/interop/test_numpy.py::test_to_numpy_series_with_nulls
0%      274.8 µs  275.2 µs  py-polars/tests/benchmark/interop/test_numpy.py::test_to_numpy_series_chunked
0%      127.1 µs  127.4 µs  py-polars/tests/benchmark/interop/test_numpy.py::test_to_numpy_series_zero_copy
0%      25.1 ms   25.2 ms   py-polars/tests/benchmark/test_tpch.py::test_tpch_q9
-1%     6.7 ms    6.7 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q17
-1%     6.2 ms    6.3 ms    py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q10
-1%     13.1 ms   13.3 ms   py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q6
-3%     9.9 ms    10.2 ms   py-polars/tests/benchmark/test_tpch.py::test_tpch_q13
-3%     1.8 ms    1.9 ms    py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q4
-3%     719.9 µs  744.2 µs  py-polars/tests/benchmark/test_filter.py::test_filter1
-3%     5.4 ms    5.6 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q8
-3%     7.3 ms    7.6 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q16
-5%     5 ms      5.3 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q5
-6%     2.1 ms    2.3 ms    py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q7
-7%     16 ms     17.2 ms   py-polars/tests/benchmark/test_tpch.py::test_tpch_q1
-7%     2.5 ms    2.7 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q15
-8%     6.1 ms    6.6 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q10
-8%     2 ms      2.2 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q14
-8%     11.3 ms   12.3 ms   py-polars/tests/benchmark/test_tpch.py::test_tpch_q18
-9%     2.3 ms    2.5 ms    py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q3
-11%    3.8 ms    4.3 ms    py-polars/tests/benchmark/test_tpch.py::test_tpch_q11
-13%    1.8 ms    2 ms      py-polars/tests/benchmark/test_group_by.py::test_groupby_h2oai_q5

Ignored

Change  Base      Head      Benchmark
+2%     1.1 ms    1.1 ms    py-polars/tests/benchmark/test_filter.py::test_filter2

Commits

Base: main (d5265d3)

-1%  perf: integer fast path Parquet dict encoding
This adds a fast path for the slowest part of Parquet writing, which is attempting the dictionary encoding. Normally this involves a lot of hashing, which is quite slow. With this PR, the writer first looks at the minimum and maximum values for integers. If these are sufficiently close together, we create the dictionary using those values.
8262bb1, 5 months ago, by coastalwhite

-1%  fix: integer overflow
2294531, 5 months ago, by coastalwhite
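
The fast path described in the first commit message can be illustrated with a small Python sketch. This is not the code from the PR: the function name range_dictionary_encode, the max_span threshold, and the choice to keep only values that actually occur in the dictionary are all assumptions made for illustration. It shows the general idea of replacing hashing with a min/max scan and direct array indexing when the integer range is narrow.

```python
from typing import List, Optional, Sequence, Tuple


def range_dictionary_encode(
    values: Sequence[int],
    max_span: int = 1 << 16,  # hypothetical threshold, not taken from the PR
) -> Optional[Tuple[List[int], List[int]]]:
    """Build (dictionary, indices) without hashing when max - min is small.

    Returns None when the fast path does not apply, so a caller could fall
    back to a hash-based dictionary builder.
    """
    if not values:
        return None

    lo, hi = min(values), max(values)
    # Python ints are arbitrary precision. In a fixed-width integer type this
    # subtraction could overflow and would need a wider type or checked math.
    span = hi - lo
    if span >= max_span:
        return None

    # Mark which offsets from `lo` occur, using a plain array instead of a hash set.
    seen = [False] * (span + 1)
    for v in values:
        seen[v - lo] = True

    # Assign dictionary keys to the present values in ascending order.
    dictionary: List[int] = []
    key_for_offset = [-1] * (span + 1)
    for offset, present in enumerate(seen):
        if present:
            key_for_offset[offset] = len(dictionary)
            dictionary.append(lo + offset)

    # Each value maps to its dictionary key by direct indexing.
    indices = [key_for_offset[v - lo] for v in values]
    return dictionary, indices
```

Calling range_dictionary_encode on a column whose values span a narrow range yields a dictionary and an index list that decode back to the input via dictionary[i]. A column whose range exceeds max_span returns None, which stands in for falling back to the slower hashing path.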