vortex-data
vortex
Blog
Docs
Changelog
Latest Results
Updated Variant array and the new VariantGet expression (#7877)

## Summary

This PR includes two big changes as Variant moves closer to readiness:

1. The canonical VariantArray can now potentially hold the `shredded` child of a variant array.
2. A `VariantGet` expression that can extract data out of variant arrays, either in a typed way or as a more opaque `Variant`.

For reviewers, some relevant context might be:

1. The [VariantGet](https://github.com/vortex-data/rfcs/pull/58) RFC: this RFC takes into account some lessons I've learned working on this, and reflects my updated view of this problem.
2. The original [Variant type](https://vortex-data.github.io/rfcs/rfc/0015.html) RFC.

I think the Parquet spec is also a pretty good read and a very heavy influence on this work: [`Shredding`](https://parquet.apache.org/docs/file-format/types/variantshredding/) and the [`Variant type`](https://parquet.apache.org/docs/file-format/types/variantencoding/).

---------

Signed-off-by: "Adam Gutglick" <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
develop
11 minutes ago
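The shredding model referenced above comes from the Parquet Variant shredding spec: values that match the shredded type are stored in a `typed_value` child, and everything else stays in the untyped `value` child. The pure-Python sketch below illustrates how a `VariantGet`-style lookup can read the shredded child directly when it is populated, fall back to the residual otherwise, and return either an opaque value or a typed one. It is not Vortex's API: `ShreddedObjectColumn`, `variant_get`, the use of dicts instead of binary-encoded variants, and the "mismatched type becomes None" policy are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ShreddedObjectColumn:
    """Hypothetical variant column whose field `shredded_key` is shredded.

    typed_value[i]: the field's value when it matched the shredded type, else None.
    residual[i]: the rest of row i as an untyped object (a dict here; real
    formats keep binary-encoded Variant bytes). A field value that did not
    match the shredded type stays in the residual.
    """

    shredded_key: str
    typed_value: list[Optional[int]]
    residual: list[dict[str, Any]]


def variant_get(col: ShreddedObjectColumn, key: str, as_type: Optional[type] = None) -> list[Any]:
    """Hypothetical VariantGet-style extraction of one field from every row.

    as_type=None returns opaque values of any type. Otherwise values whose type
    is not exactly `as_type` become None (an assumed policy; a real engine
    might cast or raise instead).
    """
    out: list[Any] = []
    for typed, rest in zip(col.typed_value, col.residual):
        if key == col.shredded_key and typed is not None:
            v = typed  # fast path: read the shredded child, no decoding needed
        else:
            v = rest.get(key)  # slow path: look the field up in the residual
        if as_type is not None and type(v) is not as_type:
            v = None
        out.append(v)
    return out


col = ShreddedObjectColumn(
    shredded_key="id",
    typed_value=[1, None, 3],
    residual=[{"name": "a"}, {"id": "x", "name": "b"}, {}],
)
print(variant_get(col, "id"))       # [1, 'x', 3]
print(variant_get(col, "id", int))  # [1, None, 3]
print(variant_get(col, "name"))     # ['a', 'b', None]
```

The split between a fast typed path and a slow residual path is the main payoff of keeping the shredded child in the canonical array: typed reads of a shredded field never have to decode the residual.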
Switch python runtime to CurrentThreadRuntime (#7896)

We want to unify the language bindings so that they behave the same way when interacting with Vortex. This PR brings the Python bindings in line with C and Java by using CurrentThreadRuntime by default.

Vortex uses a shared runtime underneath the Python API. When no background threads are configured, the Python thread drives the work on the scan. This means multiple Python threads can make progress independently, as long as each thread owns the reader it is consuming:

```python
from concurrent.futures import ThreadPoolExecutor

import pyarrow.compute as pc
import vortex as vx


def sum_column(path: str, column: str) -> int | float:
    # Each call opens its own reader, so the thread running this function drives its scan.
    reader = vx.open(path).to_arrow([column], batch_size=64_000)
    total = 0
    for batch in reader:
        value = pc.sum(batch.column(column)).as_py()
        if value is not None:  # pc.sum returns null for an all-null batch
            total += value
    return total


columns = ["tip_amount", "fare_amount", "total_amount"]
with ThreadPoolExecutor(max_workers=len(columns)) as threads:
    totals = list(threads.map(lambda column: sum_column("example.vortex", column), columns))
```

Alternatively, users who want Vortex to work in the background, independently of user-level Python threads, can set the worker count to the desired value:

```python
import vortex as vx
import vortex.runtime as vxrt

previous_workers = vxrt.worker_count()
vxrt.set_worker_threads_to_available_parallelism()
try:
    reader = vx.open("example.vortex").to_arrow(batch_size=64_000)
    table = reader.read_all()
finally:
    # Restore the previous worker configuration.
    vxrt.set_worker_threads(previous_workers)
```

These examples are added to the docs.

---------

Signed-off-by: Robert Kruszewski <github@robertk.io>
develop
21 minutes ago
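The "Python thread drives the work" behaviour described in the runtime change can be illustrated without Vortex at all. In this analogy, `LazyReader` is a hypothetical stand-in for a reader whose batches are produced only when the consumer iterates, so with no background workers the decode work runs on whichever thread is consuming. It mirrors the described semantics only and says nothing about how Vortex's runtime is actually implemented.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator


class LazyReader:
    """Hypothetical reader: each batch is 'decoded' on the thread that iterates it."""

    def __init__(self, name: str, num_batches: int) -> None:
        self.name = name
        self.num_batches = num_batches
        self.decoded_on: list[int] = []  # ids of the threads that did the decode work

    def __iter__(self) -> Iterator[list[int]]:
        for i in range(self.num_batches):
            # No background workers: the work runs inline, on the consumer's thread.
            self.decoded_on.append(threading.get_ident())
            yield [i] * 4


def consume(reader: LazyReader) -> tuple[int, int]:
    """Sum every batch; return (total, id of the consuming thread)."""
    total = sum(sum(batch) for batch in reader)
    return total, threading.get_ident()


# Each thread owns its own reader, so the readers progress independently.
readers = [LazyReader(name, num_batches=3) for name in ("a", "b", "c")]
with ThreadPoolExecutor(max_workers=len(readers)) as pool:
    results = list(pool.map(consume, readers))

for reader, (total, consumer) in zip(readers, results):
    # Every batch was decoded on the same thread that consumed the reader.
    assert set(reader.decoded_on) == {consumer}
    print(reader.name, total)
```

Sharing one reader across threads would break this property, which is why the commit stresses that each thread should own the reader it consumes.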
bug fix Signed-off-by: Adam Gutglick <adam@spiraldb.com>
adamg/yet-another-variant-array
30 minutes ago
threads Signed-off-by: Robert Kruszewski <github@robertk.io>
rk/pythonchanges
30 minutes ago
Add stats rewrite session API (#7930)

Part of #7707. Base: `develop`

## Summary

- Add stats rewrite session state under `vortex-array/src/stats/session.rs`.
- Add the first crate-private `StatsRewriteRule` trait, rewrite context, and session registry.
- Use `Expression::{falsify,satisfy}` as the only public entrypoints for stats-backed proof rewrites.
- Register `StatsRewriteSession` in the default Vortex session.
- Document the one-public-entrypoint rule in `STYLE.md`.

## Checks

- `cargo test -p vortex-array stats::rewrite`
- `./scripts/public-api.sh`
- `cargo clippy --all-targets --all-features`

Signed-off-by: Nicholas Gates <nick@nickgates.com>
develop
40 minutes ago
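The stats rewrite commit names `falsify` and `satisfy` as the only public entrypoints for stats-backed proofs. One common reading of that idea, assumed here rather than taken from the PR, is zone-map style pruning: a chunk's min/max can prove a predicate false for every row (skip the chunk) or true for every row. The sketch below models a small rule registry behind two public functions; `Stats`, `Compare`, `RULES`, `falsify`, and `satisfy` are hypothetical Python names echoing the PR's vocabulary, not Vortex's Rust API, and nulls are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Stats:
    """Hypothetical per-chunk statistics for one (null-free) column."""
    min: int
    max: int


@dataclass(frozen=True)
class Compare:
    """Hypothetical predicate: `column <op> literal`."""
    op: str
    literal: int


# A rule returns True if the stats prove the predicate, False if they disprove it, None if unknown.
Rule = Callable[[Compare, Stats], Optional[bool]]


def _gt(p: Compare, s: Stats) -> Optional[bool]:
    if s.min > p.literal:
        return True   # every row exceeds the literal
    if s.max <= p.literal:
        return False  # no row exceeds the literal
    return None


def _eq(p: Compare, s: Stats) -> Optional[bool]:
    if s.min == s.max == p.literal:
        return True   # the chunk is constant and equal to the literal
    if p.literal < s.min or p.literal > s.max:
        return False  # the literal lies outside [min, max]
    return None


# Hypothetical registry mapping operators to rules (the "session" in this sketch).
RULES: dict[str, Rule] = {">": _gt, "==": _eq}


def _evaluate(pred: Compare, stats: Stats) -> Optional[bool]:
    rule = RULES.get(pred.op)
    return rule(pred, stats) if rule else None  # unregistered operators prove nothing


def falsify(pred: Compare, stats: Stats) -> bool:
    """True only if the stats prove no row can match (the chunk can be skipped)."""
    return _evaluate(pred, stats) is False


def satisfy(pred: Compare, stats: Stats) -> bool:
    """True only if the stats prove every row matches."""
    return _evaluate(pred, stats) is True


chunk = Stats(min=10, max=20)
print(falsify(Compare(">", 25), chunk))  # True: max 20 <= 25
print(satisfy(Compare(">", 5), chunk))   # True: min 10 > 5
print(falsify(Compare(">", 15), chunk))  # False: undecided, must scan
print(falsify(Compare("<", 0), chunk))   # False: no rule registered for "<"
```

Keeping the rules private and exposing only the two entrypoints means callers never depend on which rules exist; adding a rule to the registry can only turn an "unknown" into a proof.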
Use aggregate functions for stats dtype Signed-off-by: Adam Gutglick <adam@spiraldb.com>
adamg/use-aggregate-for-stat-type
43 minutes ago
Add a nicer progress bar and file-based filter (#7942)

## Summary

Adds a nicer progress bar (with color, and it stays after a file is finished, so it's easier to see what actually ran) and, more importantly, the ability to filter tests by file name!

Also renamed `setup.slt` to `setup.slt.no` because it's not actually a test.

---------

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
develop
49 minutes ago
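The file-name filter and the `setup.slt` to `setup.slt.no` rename work together: a runner that discovers tests by the `.slt` extension no longer picks up the setup file. This minimal sketch assumes such a runner globs for `*.slt` and keeps files whose name contains a user-supplied substring; `discover_tests` and its substring semantics are hypothetical, since the commit shows neither the runner's code nor its command-line interface.

```python
import tempfile
from pathlib import Path
from typing import Optional


def discover_tests(root: Path, name_filter: Optional[str] = None) -> list[Path]:
    """Hypothetical discovery: every *.slt file under root, optionally filtered by name."""
    # rglob("*.slt") only matches names ending in ".slt", so "setup.slt.no" is skipped.
    files = sorted(root.rglob("*.slt"))
    if name_filter is not None:
        files = [f for f in files if name_filter in f.name]
    return files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("setup.slt.no", "select.slt", "variant_get.slt", "joins.slt"):
        (root / name).write_text("")
    print([f.name for f in discover_tests(root)])
    # ['joins.slt', 'select.slt', 'variant_get.slt']
    print([f.name for f in discover_tests(root, "variant")])
    # ['variant_get.slt']
```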
u Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>
ji/fmt-fix
50 minutes ago
Latest Branches
CodSpeed Performance Gauge
+17%
Updated Variant array and the new VariantGet expression
#7877
31 minutes ago
9b8a508
adamg/yet-another-variant-array
CodSpeed Performance Gauge
0%
Switch python runtime to CurrentThreadRuntime
#7896
32 minutes ago
fa7640f
rk/pythonchanges
CodSpeed Performance Gauge
-17%
Use aggregate functions for stats dtype
#7944
45 minutes ago
95d691d
adamg/use-aggregate-for-stat-type
© 2026 CodSpeed Technology