Skip to content

GH-50223: [C++][Compute] Support string_view/binary_view keys in the hash-aggregate Grouper#50224

Open
fangchenli wants to merge 2 commits into
apache:mainfrom
fangchenli:gh-string-view-grouper
Open

GH-50223: [C++][Compute] Support string_view/binary_view keys in the hash-aggregate Grouper#50224
fangchenli wants to merge 2 commits into
apache:mainfrom
fangchenli:gh-string-view-grouper

Conversation

@fangchenli

@fangchenli fangchenli commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Rationale for this change

Grouper rejected string_view and binary_view group keys with NotImplemented: Keys of type string_view, so Table.group_by on a view-typed key failed.

What changes are included in this PR?

  • Add BinaryViewKeyEncoder in row_encoder_internal.{h,cc}. It encodes view keys using the existing variable-length row format and decodes them into a fresh view array of the original type.
  • GrouperImpl::Make dispatches view keys to the new encoder.
  • GrouperFastImpl::CanUse rejects view keys, so they fall back to GrouperImpl.

Are these changes tested?

Yes, unit tests added.

Are there any user-facing changes?

Yes. Table.group_by now accepts string_view and binary_view group keys

@github-actions

Copy link
Copy Markdown

Thanks for opening a pull request!

If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@fangchenli fangchenli changed the title [C++][Compute] Support string_view/binary_view keys in the hash-aggregate Grouper GH-50223: [C++][Compute] Support string_view/binary_view keys in the hash-aggregate Grouper Jun 18, 2026
@github-actions

Copy link
Copy Markdown

⚠️ GitHub issue #50223 has been automatically assigned in GitHub to PR creator.

@fangchenli fangchenli marked this pull request as ready for review June 22, 2026 06:18
@fangchenli fangchenli requested a review from pitrou as a code owner June 22, 2026 06:18
Copilot AI review requested due to automatic review settings June 22, 2026 06:18

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends the C++ hash-aggregate Grouper to accept string_view / binary_view group keys (fixing Table.group_by failures on view-typed keys) by adding a dedicated key encoder and routing view types through the non-fast GrouperImpl path.

Changes:

  • Add BinaryViewKeyEncoder to encode/decode view keys using the existing variable-length row format.
  • Update RowEncoder and GrouperImpl to dispatch *_view key types to the new encoder, while ensuring GrouperFastImpl rejects view keys.
  • Add unit tests (row encoder + grouper) and update benchmarks to generate view-typed inputs appropriately.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
cpp/src/arrow/compute/row/row_encoder_internal.h Declares BinaryViewKeyEncoder for view key encoding/decoding.
cpp/src/arrow/compute/row/row_encoder_internal.cc Implements BinaryViewKeyEncoder and dispatches view types in RowEncoder::Init.
cpp/src/arrow/compute/row/row_encoder_internal_test.cc Adds round-trip tests for array/scalar view key encoding.
cpp/src/arrow/compute/row/grouper.cc Routes view keys to BinaryViewKeyEncoder; makes fast path reject view keys.
cpp/src/arrow/compute/row/grouper_test.cc Adds functional coverage for view keys (consume/lookup/uniques + randomized tests).
cpp/src/arrow/compute/row/grouper_benchmark.cc Generates view inputs via cast to benchmark the fallback path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants