pola-rs/polars

feat: Conserve Parquet `SortingColumns` for ints

#19251 (Merged)
Comparing coastalwhite:feat/pq-conserve-sortingcolumns (9696e5b) with main (21dc469)
Overall performance change: -1% (Untouched: 41, Ignored: 1)

Benchmarks

Passed

| Benchmark | File | Change | Base | Head |
| --- | --- | --- | --- | --- |
| test_groupby_h2oai_q3 | py-polars/tests/benchmark/test_group_by.py | +10% | 2.6 ms | 2.4 ms |
| test_pdsh_q11 | py-polars/tests/benchmark/test_pdsh.py | +10% | 4.4 ms | 4 ms |
| test_pdsh_q22 | py-polars/tests/benchmark/test_pdsh.py | +3% | 7.3 ms | 7.1 ms |
| test_groupby_h2oai_q8 | py-polars/tests/benchmark/test_group_by.py | +3% | 3.2 ms | 3.1 ms |
| test_pdsh_q16 | py-polars/tests/benchmark/test_pdsh.py | +2% | 7.4 ms | 7.3 ms |
| test_pdsh_q1 | py-polars/tests/benchmark/test_pdsh.py | +2% | 16.8 ms | 16.5 ms |
| test_pdsh_q13 | py-polars/tests/benchmark/test_pdsh.py | +2% | 9.9 ms | 9.8 ms |
| test_datetime_range_fast_slow_paths | py-polars/tests/unit/functions/range/test_datetime_range.py | +1% | 355.6 ms | 350.7 ms |
| test_pdsh_q18 | py-polars/tests/benchmark/test_pdsh.py | +1% | 11.8 ms | 11.6 ms |
| test_pdsh_q17 | py-polars/tests/benchmark/test_pdsh.py | +1% | 6.8 ms | 6.8 ms |
| test_groupby_h2oai_q6 | py-polars/tests/benchmark/test_group_by.py | +1% | 12.6 ms | 12.5 ms |
| test_pdsh_q8 | py-polars/tests/benchmark/test_pdsh.py | +1% | 5.5 ms | 5.4 ms |
| test_groupby_h2oai_q10 | py-polars/tests/benchmark/test_group_by.py | +1% | 6.3 ms | 6.3 ms |
| test_pdsh_q21 | py-polars/tests/benchmark/test_pdsh.py | 0% | 213.3 ms | 212.8 ms |
| test_strict_inequalities | py-polars/tests/benchmark/test_join_where.py | 0% | 168.7 ms | 168.3 ms |
| test_pdsh_q15 | py-polars/tests/benchmark/test_pdsh.py | 0% | 2.5 ms | 2.5 ms |
| test_pdsh_q6 | py-polars/tests/benchmark/test_pdsh.py | 0% | 1.9 ms | 1.9 ms |
| test_pdsh_q19 | py-polars/tests/benchmark/test_pdsh.py | 0% | 7.4 ms | 7.4 ms |
| test_pdsh_q14 | py-polars/tests/benchmark/test_pdsh.py | 0% | 2.1 ms | 2.1 ms |
| test_to_numpy_series_with_nulls | py-polars/tests/benchmark/interop/test_numpy.py | 0% | 434.8 µs | 434.8 µs |
| test_groupby_h2oai_q9 | py-polars/tests/benchmark/test_group_by.py | 0% | 27.5 ms | 27.6 ms |
| test_to_numpy_series_zero_copy | py-polars/tests/benchmark/interop/test_numpy.py | 0% | 123.1 µs | 123.2 µs |
| test_groupby_h2oai_q4 | py-polars/tests/benchmark/test_group_by.py | 0% | 2.2 ms | 2.2 ms |
| test_to_numpy_series_chunked | py-polars/tests/benchmark/interop/test_numpy.py | 0% | 269.5 µs | 269.7 µs |
| test_non_strict_inequalities | py-polars/tests/benchmark/test_join_where.py | 0% | 174.5 ms | 174.9 ms |
| test_pdsh_q12 | py-polars/tests/benchmark/test_pdsh.py | 0% | 6.2 ms | 6.2 ms |
| test_single_inequality | py-polars/tests/benchmark/test_join_where.py | 0% | 80.7 ms | 81 ms |
| test_pdsh_q20 | py-polars/tests/benchmark/test_pdsh.py | 0% | 6.2 ms | 6.2 ms |
| test_groupby_h2oai_q2 | py-polars/tests/benchmark/test_group_by.py | -1% | 4.5 ms | 4.5 ms |
| test_write_read_scan_large_csv | py-polars/tests/benchmark/test_io.py | -2% | 37.3 ms | 38.2 ms |
| test_groupby_h2oai_q5 | py-polars/tests/benchmark/test_group_by.py | -3% | 2.1 ms | 2.2 ms |
| test_pdsh_q10 | py-polars/tests/benchmark/test_pdsh.py | -3% | 6.3 ms | 6.5 ms |
| test_pdsh_q9 | py-polars/tests/benchmark/test_pdsh.py | -3% | 25.1 ms | 25.9 ms |
| test_pdsh_q5 | py-polars/tests/benchmark/test_pdsh.py | -3% | 4.6 ms | 4.8 ms |
| test_pdsh_q7 | py-polars/tests/benchmark/test_pdsh.py | -4% | 9.5 ms | 9.9 ms |
| test_groupby_h2oai_q7 | py-polars/tests/benchmark/test_group_by.py | -4% | 2.1 ms | 2.2 ms |
| test_pdsh_q2 | py-polars/tests/benchmark/test_pdsh.py | -5% | 3.7 ms | 3.9 ms |
| test_groupby_h2oai_q1 | py-polars/tests/benchmark/test_group_by.py | -8% | 2.2 ms | 2.3 ms |
| test_pdsh_q3 | py-polars/tests/benchmark/test_pdsh.py | -9% | 5.8 ms | 6.4 ms |
| test_filter1 | py-polars/tests/benchmark/test_filter.py | -10% | 740.3 µs | 819.6 µs |
| test_pdsh_q4 | py-polars/tests/benchmark/test_pdsh.py | -10% | 4.5 ms | 5 ms |

Ignored

| Benchmark | File | Change | Base | Head |
| --- | --- | --- | --- | --- |
| test_filter2 | py-polars/tests/benchmark/test_filter.py | 0% | 1.1 ms | 1.1 ms |

Commits

Base: main (21dc469)
-0.59%
feat: Conserve Parquet `SortingColumns` for ints

This PR makes it so that `SortingColumns` can be used to preserve the sorted flag when reading into Polars. Currently, this is only enabled for integers, as other types might require additional considerations. Enabling this feature for other types is trivial now, however.

```python
import io

import polars as pl
import pyarrow.parquet as pq

f = io.BytesIO()

df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5, None],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0, None],
    "c": range(6),
})

# Write with PyArrow so that the Parquet SortingColumns metadata is set.
# SortingColumn arguments: column_index, descending, nulls_first.
pq.write_table(
    df.to_arrow(),
    f,
    sorting_columns=[
        pq.SortingColumn(0, False, False),
        pq.SortingColumn(1, False, False),
    ],
)

f.seek(0)
# Read back with Polars and inspect the sorted flag of each column.
df = pl.read_parquet(f)._to_metadata(stats='sorted_asc')
```

Before:

```console
shape: (3, 2)
┌─────────────┬────────────┐
│ column_name ┆ sorted_asc │
│ ---         ┆ ---        │
│ str         ┆ bool       │
╞═════════════╪════════════╡
│ a           ┆ false      │
│ b           ┆ false      │
│ c           ┆ false      │
└─────────────┴────────────┘
```

After:

```console
shape: (3, 2)
┌─────────────┬────────────┐
│ column_name ┆ sorted_asc │
│ ---         ┆ ---        │
│ str         ┆ bool       │
╞═════════════╪════════════╡
│ a           ┆ true       │
│ b           ┆ false      │
│ c           ┆ false      │
└─────────────┴────────────┘
```
3cd0c62
1 year ago
by coastalwhite
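The rule described in the commit message above (sorted flags are only conserved for integer columns) can be modelled in a few lines of plain Python. This is a rough sketch, not the Polars implementation: `SortingColumn`, `INTEGER_DTYPES`, and `sorted_flags` are hypothetical names introduced here, dtypes are represented as plain strings, and the choice to honour only the leading sorting column is an assumption (secondary sort keys are only ordered within ties of the leading key, so they are not globally sorted).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortingColumn:
    """Hypothetical stand-in for one Parquet SortingColumn entry."""
    column_index: int
    descending: bool
    nulls_first: bool


# Assumption: dtypes are modelled as short strings for illustration only.
INTEGER_DTYPES = {"i8", "i16", "i32", "i64", "u8", "u16", "u32", "u64"}


def sorted_flags(dtypes: list[str], sorting_columns: list[SortingColumn]) -> dict[int, str]:
    """Return a sorted flag per column index under the assumptions above."""
    flags = {i: "not_sorted" for i in range(len(dtypes))}
    if not sorting_columns:
        return flags
    # Assumption: only the leading sorting column implies a global order.
    lead = sorting_columns[0]
    # Per the commit message, only integer columns are enabled for now.
    if dtypes[lead.column_index] in INTEGER_DTYPES:
        flags[lead.column_index] = "descending" if lead.descending else "ascending"
    return flags


# Mirrors the example in the commit message: columns a (int), b (float), c (int).
flags = sorted_flags(
    ["i64", "f64", "i64"],
    [SortingColumn(0, False, False), SortingColumn(1, False, False)],
)
print(flags)  # {0: 'ascending', 1: 'not_sorted', 2: 'not_sorted'}
```

Under these assumptions the result matches the "After" table: only column `a` is reported as sorted ascending, while the float column `b` and the unlisted column `c` keep the flag unset.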
-0.2%
pyfmt
9696e5b
1 year ago
by coastalwhite