pola-rs/polars

feat: Conserve Parquet `SortingColumns` for ints

#19251 · Merged
Comparing coastalwhite:feat/pq-conserve-sortingcolumns (9696e5b) with main (21dc469)
Overall performance change: -1%
Improvements: 0 · Regressions: 0 · Untouched: 41 · New: 0 · Dropped: 0 · Ignored: 1

Benchmarks

Passed

Timings: Base = main (21dc469), Head = 9696e5b. Benchmark paths are relative to py-polars/tests/.

Change  Base       Head       Benchmark
+10%    2.6 ms     2.4 ms     benchmark/test_group_by.py::test_groupby_h2oai_q3
+10%    4.4 ms     4 ms       benchmark/test_pdsh.py::test_pdsh_q11
+3%     7.3 ms     7.1 ms     benchmark/test_pdsh.py::test_pdsh_q22
+3%     3.2 ms     3.1 ms     benchmark/test_group_by.py::test_groupby_h2oai_q8
+2%     7.4 ms     7.3 ms     benchmark/test_pdsh.py::test_pdsh_q16
+2%     16.8 ms    16.5 ms    benchmark/test_pdsh.py::test_pdsh_q1
+2%     9.9 ms     9.8 ms     benchmark/test_pdsh.py::test_pdsh_q13
+1%     355.6 ms   350.7 ms   unit/functions/range/test_datetime_range.py::test_datetime_range_fast_slow_paths
+1%     11.8 ms    11.6 ms    benchmark/test_pdsh.py::test_pdsh_q18
+1%     6.8 ms     6.8 ms     benchmark/test_pdsh.py::test_pdsh_q17
+1%     12.6 ms    12.5 ms    benchmark/test_group_by.py::test_groupby_h2oai_q6
+1%     5.5 ms     5.4 ms     benchmark/test_pdsh.py::test_pdsh_q8
+1%     6.3 ms     6.3 ms     benchmark/test_group_by.py::test_groupby_h2oai_q10
0%      213.3 ms   212.8 ms   benchmark/test_pdsh.py::test_pdsh_q21
0%      168.7 ms   168.3 ms   benchmark/test_join_where.py::test_strict_inequalities
0%      2.5 ms     2.5 ms     benchmark/test_pdsh.py::test_pdsh_q15
0%      1.9 ms     1.9 ms     benchmark/test_pdsh.py::test_pdsh_q6
0%      7.4 ms     7.4 ms     benchmark/test_pdsh.py::test_pdsh_q19
0%      2.1 ms     2.1 ms     benchmark/test_pdsh.py::test_pdsh_q14
0%      434.8 µs   434.8 µs   benchmark/interop/test_numpy.py::test_to_numpy_series_with_nulls
0%      27.5 ms    27.6 ms    benchmark/test_group_by.py::test_groupby_h2oai_q9
0%      123.1 µs   123.2 µs   benchmark/interop/test_numpy.py::test_to_numpy_series_zero_copy
0%      2.2 ms     2.2 ms     benchmark/test_group_by.py::test_groupby_h2oai_q4
0%      269.5 µs   269.7 µs   benchmark/interop/test_numpy.py::test_to_numpy_series_chunked
0%      174.5 ms   174.9 ms   benchmark/test_join_where.py::test_non_strict_inequalities
0%      6.2 ms     6.2 ms     benchmark/test_pdsh.py::test_pdsh_q12
0%      80.7 ms    81 ms      benchmark/test_join_where.py::test_single_inequality
0%      6.2 ms     6.2 ms     benchmark/test_pdsh.py::test_pdsh_q20
-1%     4.5 ms     4.5 ms     benchmark/test_group_by.py::test_groupby_h2oai_q2
-2%     37.3 ms    38.2 ms    benchmark/test_io.py::test_write_read_scan_large_csv
-3%     2.1 ms     2.2 ms     benchmark/test_group_by.py::test_groupby_h2oai_q5
-3%     6.3 ms     6.5 ms     benchmark/test_pdsh.py::test_pdsh_q10
-3%     25.1 ms    25.9 ms    benchmark/test_pdsh.py::test_pdsh_q9
-3%     4.6 ms     4.8 ms     benchmark/test_pdsh.py::test_pdsh_q5
-4%     9.5 ms     9.9 ms     benchmark/test_pdsh.py::test_pdsh_q7
-4%     2.1 ms     2.2 ms     benchmark/test_group_by.py::test_groupby_h2oai_q7
-5%     3.7 ms     3.9 ms     benchmark/test_pdsh.py::test_pdsh_q2
-8%     2.2 ms     2.3 ms     benchmark/test_group_by.py::test_groupby_h2oai_q1
-9%     5.8 ms     6.4 ms     benchmark/test_pdsh.py::test_pdsh_q3
-10%    740.3 µs   819.6 µs   benchmark/test_filter.py::test_filter1
-10%    4.5 ms     5 ms       benchmark/test_pdsh.py::test_pdsh_q4

Ignored

Change  Base       Head       Benchmark
0%      1.1 ms     1.1 ms     benchmark/test_filter.py::test_filter2
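The listing does not say how each percentage is computed. Assuming the first timing in each entry is the base (main) and the second the head (this PR's branch), the rows appear consistent with the change being the speedup ratio base / head minus one, so a positive value would mean the head ran faster. A minimal sketch under that assumption; `gauge_percent` is a hypothetical helper, not a CodSpeed API:

```python
def gauge_percent(base: float, head: float) -> int:
    """Rounded % change; positive when head is faster (assumed convention)."""
    if base <= 0 or head <= 0:
        raise ValueError("timings must be positive")
    # Speedup ratio minus one, expressed as a whole percentage.
    return round((base / head - 1) * 100)


# Timings copied from the listing (same unit within each row).
print(gauge_percent(4.4, 4.0))      # test_pdsh_q11 → 10
print(gauge_percent(740.3, 819.6))  # test_filter1  → -10
print(gauge_percent(4.5, 5.0))      # test_pdsh_q4  → -10
```

Because the listed timings are rounded for display, recomputing from them can drift from the shown value by a few points; for example, test_groupby_h2oai_q3 (2.6 ms vs 2.4 ms) recomputes to 8 rather than the listed +10%.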

Commits

Base: main (21dc469)

-1% · 3cd0c62 · 5 months ago by coastalwhite
feat: Conserve Parquet `SortingColumns` for ints

This PR makes it so that `SortingColumns` can be used to preserve the sorted flag when reading into Polars. Currently, this is only enabled for integers, because other types might require additional considerations. However, enabling this feature for other types is now trivial.

```python
import io

import polars as pl
import pyarrow.parquet as pq

f = io.BytesIO()

df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5, None],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0, None],
    "c": range(6),
})

# Declare "a" (index 0) and "b" (index 1) as sorted ascending, nulls last.
pq.write_table(
    df.to_arrow(),
    f,
    sorting_columns=[
        pq.SortingColumn(0, False, False),
        pq.SortingColumn(1, False, False),
    ],
)

f.seek(0)
# `_to_metadata` is an underscore-prefixed (internal) Polars method; here it
# reports each column's `sorted_asc` flag after reading.
df = pl.read_parquet(f)._to_metadata(stats='sorted_asc')
print(df)
```

Before:

```console
shape: (3, 2)
┌─────────────┬────────────┐
│ column_name ┆ sorted_asc │
│ ---         ┆ ---        │
│ str         ┆ bool       │
╞═════════════╪════════════╡
│ a           ┆ false      │
│ b           ┆ false      │
│ c           ┆ false      │
└─────────────┴────────────┘
```

After:

```console
shape: (3, 2)
┌─────────────┬────────────┐
│ column_name ┆ sorted_asc │
│ ---         ┆ ---        │
│ str         ┆ bool       │
╞═════════════╪════════════╡
│ a           ┆ true       │
│ b           ┆ false      │
│ c           ┆ false      │
└─────────────┴────────────┘
```

0% · 9696e5b · 5 months ago by coastalwhite
pyfmt
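To make the integer-only rule described in the `feat` commit concrete, here is a minimal pure-Python sketch of the idea: each Parquet `SortingColumn` entry (column index, descending, nulls first) marks its column as sorted, but only when that column has an integer dtype. This is not the Polars implementation; `SortingColumnInfo`, `INTEGER_DTYPES`, and `sorted_flags` are hypothetical names, the exact set of integer dtypes is an assumption, and the sketch ignores `nulls_first` for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-in for one Parquet SortingColumn entry; the field names
# mirror pyarrow.parquet.SortingColumn, but this is not that class.
@dataclass(frozen=True)
class SortingColumnInfo:
    column_index: int
    descending: bool = False
    nulls_first: bool = False  # ignored by this sketch


# Assumption: the integer dtypes for which the sorted flag is conserved.
INTEGER_DTYPES = frozenset(
    {"Int8", "Int16", "Int32", "Int64", "UInt8", "UInt16", "UInt32", "UInt64"}
)


def sorted_flags(
    dtypes: list[str], sorting_columns: list[SortingColumnInfo]
) -> list[str | None]:
    """Return "asc", "desc", or None for each column, in column order."""
    flags: list[str | None] = [None] * len(dtypes)
    for sc in sorting_columns:
        # Skip entries that point outside the schema.
        if not 0 <= sc.column_index < len(dtypes):
            continue
        # Only integer columns keep the flag; other dtypes stay unflagged.
        if dtypes[sc.column_index] in INTEGER_DTYPES:
            flags[sc.column_index] = "desc" if sc.descending else "asc"
    return flags


# Dtypes of columns a, b, c in the example; a and b are declared sorted.
print(sorted_flags(
    ["Int64", "Float64", "Int64"],
    [SortingColumnInfo(0), SortingColumnInfo(1)],
))  # → ['asc', None, None]
```

With the dtypes from the example (`a`: Int64, `b`: Float64, `c`: Int64), only `a` is flagged, which matches the "After" output: `b` is declared sorted but is a float column, and `c` is never declared.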