vortex-data
vortex

Performance History

Latest Results

Updated Variant array and the new VariantGet expression (#7877)

## Summary

This PR includes two big changes as Variant moves closer to readiness.

1. Potentially holding the `shredded` child of a variant array in the canonical VariantArray
2. A `VariantGet` expression that can extract data out of variant arrays, either in a typed way or as a more opaque `Variant`.

For reviewers, some relevant context might be:

1. The [VariantGet](https://github.com/vortex-data/rfcs/pull/58) RFC: this RFC takes some lessons I've learned working on this into account and reflects my updated view of this problem.
2. The original [Variant type](https://vortex-data.github.io/rfcs/rfc/0015.html) RFC

I think the Parquet spec is also a pretty good read and a very heavy influence on this work - [`Shredding`](https://parquet.apache.org/docs/file-format/types/variantshredding/) and the [`Variant type`](https://parquet.apache.org/docs/file-format/types/variantencoding/).

---------

Signed-off-by: "Adam Gutglick" <adam@spiraldb.com>
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
develop
11 minutes ago
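To illustrate the "typed or opaque" distinction in the VariantGet summary above, here is a minimal pure-Python sketch. It is not the Vortex API: `toy_variant_get`, its `path` argument, and its `as_type` argument are hypothetical names invented for this illustration, and plain dicts and lists stand in for variant values. The assumption is only that a typed get yields a null when the value at the path does not match the requested type, while an untyped get returns whatever is there.

```python
from typing import Any, Optional, Sequence, Union

# A path step is either an object key or a list index (hypothetical model).
PathStep = Union[str, int]


def toy_variant_get(
    value: Any,
    path: Sequence[PathStep],
    as_type: Optional[type] = None,
) -> Any:
    """Hypothetical helper, not part of Vortex.

    Walks `path` through nested dicts and lists. Returns None when the path
    is missing. With `as_type` set, returns the value only if it has that
    type (typed get); otherwise returns the raw value (opaque get).
    """
    current = value
    for step in path:
        if isinstance(step, str) and isinstance(current, dict) and step in current:
            current = current[step]
        elif isinstance(step, int) and isinstance(current, list) and 0 <= step < len(current):
            current = current[step]
        else:
            # Missing key, out-of-range index, or wrong container kind.
            return None
    if as_type is None:
        # Opaque get: hand back the nested value unchanged.
        return current
    # Typed get: a mismatch becomes None rather than raising.
    return current if isinstance(current, as_type) else None


row = {"user": {"id": 7, "tags": ["a", "b"]}}
user_id = toy_variant_get(row, ["user", "id"], int)
tags = toy_variant_get(row, ["user", "tags"])
```

Returning None on a type mismatch, rather than raising, is a design choice made for this sketch only; the RFC linked above is the place to check the actual semantics.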
Switch python runtime to CurrentThreadRuntime (#7896)

We want to unify the language bindings to have the same behaviour when interacting with Vortex. This PR brings the Python bindings in line with C and Java in using CurrentThreadRuntime by default.

Vortex uses a shared runtime underneath the Python API. When no background threads are configured, the Python thread drives the work on the scan. This means multiple Python threads can make progress independently as long as each thread owns the reader it is consuming.

```python
from concurrent.futures import ThreadPoolExecutor

import pyarrow.compute as pc
import vortex as vx


def sum_column(path: str, column: str) -> int | float:
    reader = vx.open(path).to_arrow([column], batch_size=64_000)
    total = 0
    for batch in reader:
        value = pc.sum(batch.column(column)).as_py()
        if value is not None:
            total += value
    return total


columns = ["tip_amount", "fare_amount", "total_amount"]
with ThreadPoolExecutor(max_workers=len(columns)) as threads:
    totals = list(threads.map(lambda column: sum_column("example.vortex", column), columns))
```

Alternatively, users who want Vortex to work in the background, independently of user-level Python threads, can configure the worker count to the desired value.

```python
import vortex as vx
import vortex.runtime as vxrt

previous_workers = vxrt.worker_count()
vxrt.set_worker_threads_to_available_parallelism()
try:
    reader = vx.open("example.vortex").to_arrow(batch_size=64_000)
    table = reader.read_all()
finally:
    vxrt.set_worker_threads(previous_workers)
```

These examples are added to the docs.

---------

Signed-off-by: Robert Kruszewski <github@robertk.io>
develop
21 minutes ago
bug fix

Signed-off-by: Adam Gutglick <adam@spiraldb.com>
adamg/yet-another-variant-array
30 minutes ago

Latest Branches

CodSpeed Performance Gauge
+17%
Updated Variant array and the new VariantGet expression (#7877)
31 minutes ago
9b8a508
adamg/yet-another-variant-array
CodSpeed Performance Gauge
0%
32 minutes ago
fa7640f
rk/pythonchanges
CodSpeed Performance Gauge
-17%
Use aggregate functions for stats dtype (#7944)
45 minutes ago
95d691d
adamg/use-aggregate-for-stat-type
© 2026 CodSpeed Technology