Commits
feat: Conserve Parquet `SortingColumns` for ints
This PR uses the Parquet `SortingColumns` metadata to preserve the sorted
flag when reading into Polars. Currently, this is only enabled for integers, as
other types might require additional considerations. With this in place,
however, enabling it for other types is trivial.
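The rule described above can be sketched in plain Python. This is only an illustrative model, not the PR's implementation: `Sortedness`, `SortingColumn`, `INTEGER_DTYPES`, and `sorted_flag` are hypothetical names, and the sketch assumes every listed integer column keeps its flag, ignoring sort-key precedence, nulls, and multiple row groups.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Sortedness(Enum):
    NOT_SORTED = "not_sorted"
    ASCENDING = "ascending"
    DESCENDING = "descending"


@dataclass(frozen=True)
class SortingColumn:
    # Hypothetical stand-in for one Parquet SortingColumn entry.
    column_idx: int
    descending: bool
    nulls_first: bool


# Assumption: only these dtypes are trusted to carry the sorted flag.
INTEGER_DTYPES = frozenset({
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
})


def sorted_flag(
    column_idx: int,
    dtype: str,
    sorting_columns: Sequence[SortingColumn],
) -> Sortedness:
    """Return the sorted flag a reader could set for one column."""
    for sc in sorting_columns:
        if sc.column_idx != column_idx:
            continue
        # Non-integer columns are left unflagged for now.
        if dtype not in INTEGER_DTYPES:
            return Sortedness.NOT_SORTED
        return Sortedness.DESCENDING if sc.descending else Sortedness.ASCENDING
    # Columns without a SortingColumn entry are never flagged.
    return Sortedness.NOT_SORTED
```

Under these assumptions, an `Int64` column listed as ascending keeps its flag, while a listed `Float64` column and an unlisted column do not, which matches the before/after output in this description.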
```python
import io

import polars as pl
import pyarrow.parquet as pq

f = io.BytesIO()

df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5, None],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0, None],
    "c": range(6),
})

# Write with pyarrow, declaring columns 0 ("a") and 1 ("b") as sorted
# ascending with nulls last.
pq.write_table(
    df.to_arrow(),
    f,
    sorting_columns=[
        pq.SortingColumn(0, False, False),
        pq.SortingColumn(1, False, False),
    ],
)

f.seek(0)
# Read back with Polars and inspect the sorted flag of each column.
df = pl.read_parquet(f)._to_metadata(stats='sorted_asc')
```
Before:
```console
shape: (3, 2)
┌─────────────┬────────────┐
│ column_name ┆ sorted_asc │
│ ---         ┆ ---        │
│ str         ┆ bool       │
╞═════════════╪════════════╡
│ a           ┆ false      │
│ b           ┆ false      │
│ c           ┆ false      │
└─────────────┴────────────┘
```
After:
```console
shape: (3, 2)
┌─────────────┬────────────┐
│ column_name ┆ sorted_asc │
│ ---         ┆ ---        │
│ str         ┆ bool       │
╞═════════════╪════════════╡
│ a           ┆ true       │
│ b           ┆ false      │
│ c           ┆ false      │
└─────────────┴────────────┘
```
5 months ago by coastalwhite

pyfmt
5 months ago by coastalwhite