[WIP] feat(c++): support reading certain set of properties #697

Open: wants to merge 9 commits into main

yangxk1
Contributor

@yangxk1 yangxk1 commented May 29, 2025

Reason for this PR

This PR addresses the feature request in #397 to support reading a specific set of properties.

What changes are included in this PR?

I have modified VertexPropertyArrowChunkReader::Make so that when a single property_name is provided, it no longer loads every property in the corresponding propertyGroup; it reads only the internal ID and the specified property column.
Additionally, a std::vector<std::string> of property_names can be passed to read a specific set of properties, provided they all belong to the same propertyGroup.

Essentially, these property_names are added to FilterOptions.columns, so the column selection (a projection, not a row predicate) is pushed down to the read.
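The column selection described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in this PR: `PropertyGroup`, `FindGroup`, `BuildColumns`, and the `_internal_id` column name are all hypothetical stand-ins. It only shows the rule that the requested names must belong to one property group and that the internal ID column is always included.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a property group: just a list of property names.
struct PropertyGroup {
  std::vector<std::string> properties;
};

// Placeholder name for the internal ID column (an assumption for this sketch).
inline const std::string kInternalIdColumn = "_internal_id";

// Returns the index of the group that contains `name`, or nullopt if none does.
std::optional<std::size_t> FindGroup(const std::vector<PropertyGroup>& groups,
                                     const std::string& name) {
  for (std::size_t i = 0; i < groups.size(); ++i) {
    const auto& props = groups[i].properties;
    if (std::find(props.begin(), props.end(), name) != props.end()) {
      return i;
    }
  }
  return std::nullopt;
}

// Builds the column list to read: the internal ID first, then the requested
// properties. Returns nullopt if `names` is empty, if any name is unknown,
// or if the names span more than one property group.
std::optional<std::vector<std::string>> BuildColumns(
    const std::vector<PropertyGroup>& groups,
    const std::vector<std::string>& names) {
  if (names.empty()) return std::nullopt;
  const auto first = FindGroup(groups, names.front());
  if (!first) return std::nullopt;
  std::vector<std::string> columns{kInternalIdColumn};
  for (const auto& name : names) {
    if (FindGroup(groups, name) != first) return std::nullopt;
    columns.push_back(name);
  }
  return columns;
}
```

In this sketch, requesting `firstName` and `gender` from a group that also holds `lastName` would yield three columns (the ID plus the two properties), while mixing names from two different groups is rejected.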

Are these changes tested?

Yes, I have added:

  • Examples
  • Unit tests
  • Benchmarks

The benchmark results show performance improvements, especially with large chunk sizes.

  • ReadChunkSelectAllColumnsIn*: read using a property group (3 properties + internal ID)

  • ReadChunkSelectOneColumnIn*: read only the internal ID and a single property

  • ReadChunkSelectTwoColumnIn*: read only the internal ID and 2 properties

  • *FirstGraph: read person vertex in ldbc_sample (chunk_size: 100)

  • *SecondGraph: read organisation vertex in ldbc (chunk_size: 4096)

[image: benchmark results]

Are there any user-facing changes?

Yes, there are breaking changes:

Previously, users could pass a property_name directly to read all data from its propertyGroup.
Now, a property_name reads only that property (plus the internal ID); to read an entire propertyGroup, users must pass the propertyGroup explicitly.
This change improves clarity and ensures consistency when filtering properties.
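To make the behavior change concrete, here is a toy model of which columns end up being read before and after this PR. It does not use the GraphAr API: `ToyGroup`, `ColumnsBefore`, `ColumnsAfter`, and the `_internal_id` column name are hypothetical names chosen for illustration only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of a property group; not GraphAr's PropertyGroup class.
struct ToyGroup {
  std::vector<std::string> properties;
};

// Placeholder for the internal ID column name (assumed for illustration).
inline const std::string kIdColumn = "_internal_id";

// Old behavior: a single property name selected its whole property group.
std::vector<std::string> ColumnsBefore(const ToyGroup& group,
                                       const std::string& /*property_name*/) {
  std::vector<std::string> cols{kIdColumn};
  cols.insert(cols.end(), group.properties.begin(), group.properties.end());
  return cols;
}

// New behavior: a single property name selects only the internal ID and
// that one property (if the group actually contains it).
std::vector<std::string> ColumnsAfter(const ToyGroup& group,
                                      const std::string& property_name) {
  std::vector<std::string> cols{kIdColumn};
  const auto& props = group.properties;
  if (std::find(props.begin(), props.end(), property_name) != props.end()) {
    cols.push_back(property_name);
  }
  return cols;
}

// Migration path: pass the group itself to keep reading every property.
std::vector<std::string> ColumnsAfter(const ToyGroup& group) {
  std::vector<std::string> cols{kIdColumn};
  cols.insert(cols.end(), group.properties.begin(), group.properties.end());
  return cols;
}
```

Under this model, callers that relied on the old single-name call to fetch the whole group would switch to the group overload to get the same set of columns.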

@yangxk1
Contributor Author

yangxk1 commented May 29, 2025

In addition, I’ve written a demo showing how to use the reader to read specific properties. The implementation can be found at: https://github.com/yangxk1/incubator-graphar/tree/test-select-columns-by-reader

The demo shows that the reader offers significant performance gains over the current scanner-based approach.
However, the reader does not support predicate pushdown.

*V2 uses the reader.
[image: demo results]

@yangxk1
Contributor Author

yangxk1 commented May 29, 2025

Note that the scanner is an Arrow-level (in-memory) operation, while the reader is a Parquet-level (file-level) operation.

@yangxk1 yangxk1 changed the title [WIP] support reading certain set of properties [WIP] feat(c++): support reading certain set of properties May 29, 2025
@yangxk1
Contributor Author

yangxk1 commented Jun 4, 2025

Looking forward to your suggestions. @lixueclaire @acezen
