Hi @SYaoJun! My only concern is that it may become very hard for clients to support the standard if we have too many different underlying formats. Are we considering something like an "extension" mechanism? As I remember, many formats in tools like DuckDB or Apache Spark are supported through an extensions API instead of being added to the core. For example, the official Java SDK for Vortex, vortex-jni, is a 136.9 MB JAR file. Having 5-7 such connectors would lead to roughly 1 GB of dependencies alone. What do you think about any of: an extensions API, GraphAr "contrib", or maybe something else?
Hi @SemyonSinchenko,
Thank you so much for your insightful feedback. Your perspectives are truly thought-provoking. Actually, I’m a bit confused about your comment: "hard for clients to support the standard if we have too many of different underlying formats". Let me clarify my understanding: GAR provides high-level APIs (e.g., for vertices and edges) that are completely transparent to users. Additionally, we already support CSV, JSON, ORC, and Parquet formats. For the new Vortex format, we could follow the ORC implementation approach and use macros to separate the original code logic.
You suggested adopting an "extension" mechanism, which I fully agree is a sound and appropriate approach. However, our current codebase does not yet support extension plugins, and I’m also unsure about how to implement this mechanism effectively.
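For discussion, one rough shape such a mechanism could take is a small registry that maps a format name to a factory, so that a format like Vortex can be registered from outside the core. This is only a sketch under assumptions: `FormatReader`, `FormatRegistry`, and `VortexReader` are hypothetical names that do not exist in the GraphAr codebase, and a real design would also need to cover writers, schemas, and error handling.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical interface (not part of GraphAr): each format plugin implements it.
class FormatReader {
 public:
  virtual ~FormatReader() = default;
  virtual std::string Name() const = 0;
};

// Hypothetical registry that maps a format name to a factory function.
class FormatRegistry {
 public:
  using Factory = std::function<std::unique_ptr<FormatReader>()>;

  // Single process-wide registry instance.
  static FormatRegistry& Instance() {
    static FormatRegistry registry;
    return registry;
  }

  // Returns false if a factory is already registered under this name.
  bool Register(const std::string& name, Factory factory) {
    return factories_.emplace(name, std::move(factory)).second;
  }

  bool Has(const std::string& name) const {
    return factories_.count(name) > 0;
  }

  // Creates a reader for the given format, or throws if it is unknown.
  std::unique_ptr<FormatReader> Create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) {
      throw std::runtime_error("format not registered: " + name);
    }
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// Example plugin that would live outside the core library.
class VortexReader : public FormatReader {
 public:
  std::string Name() const override { return "vortex"; }
};
```

With something like this, the core would only depend on the interface, and a heavy dependency such as a Vortex binding would be pulled in only by users who link the corresponding plugin.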
To move forward, I used AI to generate a minimal viable product (MVP) of Vortex under the cpp/ directory: SYaoJun@78a32ba. Would you mind taking a look at it? This implementation aligns nearly perfectly with my initial vision for Vortex.
Thank you again for your guidance!
Best regards,
Yaojun
At 2026-03-01 04:01:52, "Sem" ***@***.***> wrote:
Hi @SYaoJun!
My only concern is that it may become very hard for clients to support the standard if we have too many different underlying formats. Are we considering something like an "extension" mechanism? As I remember, many formats in tools like DuckDB or Apache Spark are supported through an extensions API instead of being added to the core.
For example, the official Java SDK for Vortex, vortex-jni, is a 136.9 MB JAR file. Having 5-7 such connectors would lead to roughly 1 GB of dependencies alone.
What do you think about any of:
extensions API
GraphAr "contrib"
Maybe something else?
@SYaoJun To be honest, I see it slightly differently. To me, GAR is a standard and a set of specifications for storing data, and reference implementations are secondary. What is inside the GAR repository is just one reference implementation. For example, Apache Parquet is a standard, and there are multiple implementations besides the reference one from Apache. If we add too many supported formats to the standard, I'm afraid it will become very hard for anyone to support the entire GAR standard.
Background
Currently, several emerging columnar file formats—such as Vortex, Lance, F3, BtrBlocks, Nimble, and Parquet variants—demonstrate strong performance advantages in specific scenarios.
I wonder whether supporting these formats in GraphAr could significantly reduce storage overhead and improve query performance at scale.
Benefits
Effects of Modifications
Evidence from DuckDB
Vortex has already been integrated into DuckDB, where it is reported to deliver substantial performance improvements on analytical workloads such as TPC-H, with gains in scan efficiency and query execution time compared to traditional columnar formats. Details are in this blog.
What do others think about this idea?