Skip to content

GH-39603: [R] Error: Cannot convert Dictionary Array of type dictionary<values=large_string, indices=uint32, ordered=0> to R#49710

Merged
thisisnic merged 7 commits into
apache:mainfrom
thisisnic:gh-39603_dictionary-array
Jun 10, 2026
Merged

GH-39603: [R] Error: Cannot convert Dictionary Array of type dictionary<values=large_string, indices=uint32, ordered=0> to R#49710
thisisnic merged 7 commits into
apache:mainfrom
thisisnic:gh-39603_dictionary-array

Conversation

@thisisnic

@thisisnic thisisnic commented Apr 11, 2026

Copy link
Copy Markdown
Member

Rationale for this change

Dictionary Arrays with specific type indices caused error

What changes are included in this PR?

Allows those type indices and add check to ensure no overflow

Are these changes tested?

Yes

Are there any user-facing changes?

No

AI Usage

I used AI to create the changes but manually reviewed myself afterwards

@github-actions

Copy link
Copy Markdown

⚠️ GitHub issue #39603 has been automatically assigned in GitHub to PR creator.

@thisisnic thisisnic marked this pull request as ready for review April 30, 2026 14:29
@thisisnic thisisnic requested a review from jonkeane as a code owner April 30, 2026 14:29
@thisisnic thisisnic marked this pull request as draft April 30, 2026 14:34
@thisisnic thisisnic marked this pull request as ready for review May 28, 2026 09:53
Copilot AI review requested due to automatic review settings May 28, 2026 09:53

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes #39603 by extending the non-ALTREP Converter_Dictionary path in the R bindings to support UINT32, INT64, and UINT64 dictionary index types, in addition to adding a safety check for dictionaries larger than R's 32-bit factor code range.

Changes:

  • Allow UINT32/INT64/UINT64 index types in Converter_Dictionary's constructor and Ingest_some_nulls dispatch.
  • Add a runtime check that errors when the dictionary length exceeds std::numeric_limits<int>::max() (R factor code limit).
  • Add a new test that round-trips a factor through Table$create() with uint32/int64/uint64 dictionary index types.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
r/src/array_to_vector.cpp Adds UINT32/INT64/UINT64 cases to dictionary index handling and adds a dictionary-length overflow guard.
r/tests/testthat/test-Table.R Adds a test round-tripping a factor with wider dictionary index types back to R.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread r/tests/testthat/test-Table.R Outdated
Comment on lines +377 to +387
tab_uint32 <- Table$create(fact, schema = schema(fct = dictionary(uint32(), utf8())))
expect_equal(tab_uint32$schema, schema(fct = dictionary(uint32(), utf8())))
expect_equal_data_frame(tab_uint32, fact)

tab_int64 <- Table$create(fact, schema = schema(fct = dictionary(int64(), utf8())))
expect_equal(tab_int64$schema, schema(fct = dictionary(int64(), utf8())))
expect_equal_data_frame(tab_int64, fact)

tab_uint64 <- Table$create(fact, schema = schema(fct = dictionary(uint64(), utf8())))
expect_equal(tab_uint64$schema, schema(fct = dictionary(uint64(), utf8())))
expect_equal_data_frame(tab_uint64, fact)

@jonkeane jonkeane left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor suggestion for test name, but if CI is happy I'm happy!

Comment thread r/tests/testthat/test-Table.R Outdated
}
})

test_that("Table converts dictionary arrays with wider index types back to R", {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with wider index types

I finally got what this was saying, but it took a few clicks. Maybe we say something about uint64 / uint32 here?

@github-actions github-actions Bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Jun 3, 2026
Comment thread r/tests/testthat/test-Table.R Outdated
Copilot AI review requested due to automatic review settings June 8, 2026 10:48
@github-actions github-actions Bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Jun 8, 2026

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Comment thread r/tests/testthat/test-Table.R Outdated
Comment on lines +374 to +378
test_that("Table converts dictionary arrays with uint32/int64/uint64 index types back to R", {
fact <- example_data[, "fct"]

tab_uint32 <- Table$create(fact, schema = schema(fct = dictionary(uint32(), utf8())))
expect_equal(tab_uint32$schema, schema(fct = dictionary(uint32(), utf8())))
@github-actions github-actions Bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 8, 2026
Copilot AI review requested due to automatic review settings June 10, 2026 11:28

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

@thisisnic thisisnic merged commit b3ba226 into apache:main Jun 10, 2026
54 checks passed
@thisisnic thisisnic removed the awaiting change review Awaiting change review label Jun 10, 2026
@conbench-apache-arrow

Copy link
Copy Markdown

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit b3ba226.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 43 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants