Proposal: Introduce Vortex Columnar Format Support in GraphAr #887

SYaoJun · 2026-02-28T04:25:12Z


Background

Currently, several emerging columnar file formats—such as Vortex, Lance, F3, BtrBlocks, Nimble, and Parquet variants—demonstrate strong performance advantages in specific scenarios.

I wonder whether supporting these formats in GraphAr could significantly reduce storage overhead and improve query performance at scale.

Benefits

Introducing the Vortex columnar format can improve storage efficiency and query performance through better compression and vectorized execution.
It enables more flexible column-level encoding strategies, which can better align with analytical graph workloads.
Vortex is designed to be GPU-friendly, particularly in AI and analytics scenarios.

Effects of Modifications

Storage layer implementation and format adapters
All language bindings would need to be adapted.

enum class FileType : int32_t { CSV = 0, PARQUET = 1, ORC = 2, JSON = 3 };
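As a rough sketch of what this change might look like, the snippet below extends the enum above with a hypothetical `VORTEX = 4` entry and adds a small helper that maps each value to a name. The value `4`, the helper `FileTypeToString`, and the returned strings are assumptions made for illustration; they are not taken from the GraphAr codebase.

```cpp
#include <cstdint>
#include <string>

// The first four values are copied from the proposal.
// VORTEX = 4 is a hypothetical addition, not an existing GraphAr value.
enum class FileType : int32_t { CSV = 0, PARQUET = 1, ORC = 2, JSON = 3, VORTEX = 4 };

// Hypothetical helper: maps a FileType to a lowercase format name.
inline std::string FileTypeToString(FileType type) {
  switch (type) {
    case FileType::CSV:
      return "csv";
    case FileType::PARQUET:
      return "parquet";
    case FileType::ORC:
      return "orc";
    case FileType::JSON:
      return "json";
    case FileType::VORTEX:
      return "vortex";
  }
  return "unknown";  // Reached only for values outside the enum.
}
```

Every language binding that mirrors this enum would need the same new entry, which is the cross-language cost noted above.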

Evidence from DuckDB

Vortex has already been integrated into DuckDB, where it demonstrates substantial performance improvements on analytical workloads such as TPC-H. Reported results show significant gains in scan efficiency and query execution time compared to traditional columnar formats. Details are in this blog.

What do others think about this idea?

SemyonSinchenko · 2026-02-28T20:01:30Z

Collaborator

Hi @SYaoJun !

My only concern is that it may become very hard for clients to support the standard if we have too many different underlying formats. Are we considering something like an "extension" mechanism? As I remember, a lot of formats are supported in tools like DuckDB or Apache Spark through an extensions API, instead of being added to the core.

I mean, for example, the official Java SDK of Vortex, vortex-jni, is a 136.9 MB JAR file... Having 5-7 such connectors would lead to ~1 GB of dependencies alone.
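The size estimate can be checked with a small calculation. This sketch assumes every connector is about as large as vortex-jni, which is a simplification for illustration; the constant and function names are hypothetical. Sizes are kept in tenths of a megabyte so the arithmetic stays exact.

```cpp
#include <cstdint>

// Size of vortex-jni as quoted in the comment: 136.9 MB, stored as tenths of a MB.
constexpr int64_t kVortexJniTenthsMb = 1369;

// Assumption: each connector is roughly the same size as vortex-jni.
constexpr int64_t TotalTenthsMb(int64_t connector_count) {
  return connector_count * kVortexJniTenthsMb;
}

// 5 connectors: 684.5 MB; 7 connectors: 958.3 MB.
static_assert(TotalTenthsMb(5) == 6845, "5 connectors should total 684.5 MB");
static_assert(TotalTenthsMb(7) == 9583, "7 connectors should total 958.3 MB");
```

Under that assumption the range is roughly 0.7 GB to just under 1 GB, which is consistent with the ~1 GB figure for the upper end.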

What do you think about any of these options?

An extensions API
GraphAr "contrib"
Maybe something else?
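One possible shape for an extensions API is a registry that the core queries by format name, so that optional formats live outside the core library. The sketch below is only an illustration of that idea: `FormatReader`, `FormatRegistry`, and `VortexReaderStub` are hypothetical names, nothing here exists in GraphAr today, and the stub makes no real Vortex calls.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface: one implementation per file format.
class FormatReader {
 public:
  virtual ~FormatReader() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical registry: the core stays format-agnostic and
// extensions register a factory under a format name.
class FormatRegistry {
 public:
  using Factory = std::function<std::unique_ptr<FormatReader>()>;

  void Register(const std::string& name, Factory factory) {
    factories_[name] = std::move(factory);
  }

  bool Has(const std::string& name) const { return factories_.count(name) > 0; }

  // Throws if no extension has registered the requested format.
  std::unique_ptr<FormatReader> Create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) {
      throw std::runtime_error("format not registered: " + name);
    }
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// Stand-in for a reader that an optional extension would ship.
class VortexReaderStub : public FormatReader {
 public:
  std::string Name() const override { return "vortex"; }
};
```

With this layout, a client that never registers the Vortex factory never needs the Vortex dependency, which addresses the dependency-size concern at the cost of an extra indirection in the core.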


SYaoJun · 2026-03-02T16:00:15Z

Author

Hi @SemyonSinchenko,

Thank you so much for your insightful feedback. Your perspectives are truly thought-provoking.

Actually, I'm a bit confused by your comment that it would be "hard for clients to support the standard if we have too many of different underlying formats". Let me clarify my understanding: GAR provides high-level APIs (e.g., for vertices and edges) that are completely transparent to users. Additionally, we already support the CSV, JSON, ORC, and Parquet formats. For the new Vortex format, we could follow the ORC implementation approach and use macros to separate it from the original code logic.

You suggested adopting an "extension" mechanism, which I fully agree is a sound and appropriate approach. However, our current codebase does not yet support extension plugins, and I'm also unsure how to implement this mechanism effectively.

To move forward, I used AI to generate a minimal viable product (MVP) of Vortex support under the cpp/ directory: SYaoJun@78a32ba. Would you mind taking a look at it? This implementation aligns nearly perfectly with my initial vision for Vortex.

Thank you again for your guidance!

Best regards,
Yaojun
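The macro-based separation mentioned in this comment could look roughly like the sketch below. It is not taken from the linked commit: `GRAPHAR_WITH_VORTEX` is a hypothetical build flag, and both helper functions are made up for illustration. The idea is that Vortex-specific code compiles only when the flag is defined, so default builds are unaffected.

```cpp
#include <string>

// GRAPHAR_WITH_VORTEX is a hypothetical flag that the build system would
// define only when the Vortex dependency is available.
inline bool IsVortexEnabled() {
#ifdef GRAPHAR_WITH_VORTEX
  return true;
#else
  return false;
#endif
}

// Hypothetical helper: reports whether this build includes Vortex support.
inline std::string VortexSupportStatus() {
  if (IsVortexEnabled()) {
    return "vortex: enabled";
  }
  return "vortex: disabled at build time";
}
```

This keeps the existing code paths untouched, but every reference implementation still has to know about the format, which is the trade-off against an extension mechanism.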


SemyonSinchenko · 2026-03-02T16:05:47Z

Collaborator

"Let me clarify my understanding: GAR provides high-level APIs (e.g., for vertices and edges) that are completely transparent to users."

@SYaoJun To be honest, I see it slightly differently. To me, GAR is a standard and a set of specifications for storing data, and reference implementations are secondary. What is inside the GAR repository is just one reference implementation. For example, Apache Parquet is a standard, and there are multiple implementations besides the reference one from Apache. If we add too many supported formats as part of the standard, I'm afraid it will become very hard for anyone to support the entire GAR standard.

1 reply

SYaoJun Mar 2, 2026
Author

@SemyonSinchenko Thanks for clarifying—I understand your concern. I’m not yet fully familiar with the practical usage of GAR, so I’ll keep an eye on Vortex. If it proves viable, we can proceed to the next step.
