Vortex: An extensible, state of the art columnar file format

https://news.ycombinator.com/rss Hits: 14
Summary

๐ŸŒช๏ธ Vortex ๐Ÿซถ Join the community on Slack! | ๐Ÿ“š Documentation | ๐Ÿ“Š Performance Benchmarks Overview Vortex is a next-generation columnar file format and toolkit designed for high-performance data processing. It is the fastest and most extensible format for building data systems backed by object storage. It provides: โšก๏ธ Blazing Fast Performance 100x faster random access reads (vs. modern Apache Parquet) 10-20x faster scans 5x faster writes Similar compression ratios Efficient support for wide tables with zero-copy/zero-parse metadata ๐Ÿ”ง Extensible Architecture Modeled after Apache DataFusion's extensible approach Pluggable encoding system, type system, compression strategy, & layout strategy Zero-copy compatibility with Apache Arrow ๐Ÿ—ณ๏ธ Open Source, Neutral Governance A Linux Foundation (LF AI & Data) Project Apache-2.0 Licensed โ†”๏ธ Integrations Arrow, DataFusion, DuckDB, Spark, Pandas, Polars, & more Apache Iceberg (coming soon) ๐ŸŸข Development Status: Library APIs may change from version to version, but we now consider the file format stable . From release 0.36.0, all future releases of Vortex should maintain backwards compatibility of the file format (i.e., be able to read files written by any earlier version >= 0.36.0). 
Key Features

Core Capabilities
- ✨ Logical Types - Clean separation between logical schema and physical layout
- 🔄 Zero-Copy Arrow Integration - Seamless conversion to/from Apache Arrow arrays
- 🧩 Extensible Encodings - Pluggable physical layouts with built-in optimizations
- 📦 Cascading Compression - Support for nested encoding schemes
- 🚀 High-Performance Computing - Optimized compute kernels for encoded data
- 📊 Rich Statistics - Lazy-loaded summary statistics for optimization

Technical Architecture

Logical vs Physical...
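
To make the logical/physical split, cascading compression, and compute on encoded data a little more concrete, here is a minimal, self-contained sketch in plain Python. It does not use the Vortex library or reflect its actual API or encodings; the class names `RunLength` and `DictRle` and the method `count_equal` are hypothetical and exist only for illustration. The idea shown is that one logical array of strings can be stored physically as a dictionary whose integer codes are in turn run-length encoded, and that a simple query can be answered from the encoded form.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RunLength:
    """Hypothetical physical layout: a list of (value, run_length) pairs."""
    runs: List[Tuple[int, int]]

    @classmethod
    def encode(cls, values: List[int]) -> "RunLength":
        runs: List[Tuple[int, int]] = []
        for v in values:
            if runs and runs[-1][0] == v:
                # Extend the current run.
                runs[-1] = (v, runs[-1][1] + 1)
            else:
                # Start a new run.
                runs.append((v, 1))
        return cls(runs)

    def decode(self) -> List[int]:
        out: List[int] = []
        for v, n in self.runs:
            out.extend([v] * n)
        return out


@dataclass
class DictRle:
    """Hypothetical nested encoding: dictionary codes that are run-length encoded."""
    dictionary: List[str]
    codes: RunLength

    @classmethod
    def encode(cls, values: List[str]) -> "DictRle":
        dictionary: List[str] = []
        index = {}
        codes: List[int] = []
        for v in values:
            if v not in index:
                index[v] = len(dictionary)
                dictionary.append(v)
            codes.append(index[v])
        # The inner encoding is applied to the output of the outer one.
        return cls(dictionary, RunLength.encode(codes))

    def decode(self) -> List[str]:
        return [self.dictionary[c] for c in self.codes.decode()]

    def count_equal(self, needle: str) -> int:
        """Count matching rows from the runs alone, without decoding the array."""
        if needle not in self.dictionary:
            return 0
        code = self.dictionary.index(needle)
        return sum(n for c, n in self.codes.runs if c == code)


# One logical array, stored in a nested physical form.
logical = ["a", "a", "a", "b", "b", "a", "c", "c", "c", "c"]
encoded = DictRle.encode(logical)
```

In this sketch, `encoded.decode()` returns the original logical values, while `encoded.count_equal("a")` only walks the four stored runs instead of the ten rows. A real format would choose encodings per column from the data and would operate on typed buffers, not Python lists.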

First seen: 2025-11-19 23:01

Last seen: 2025-11-20 12:04