fix: prevent duplicate alias collision with user-provided __datafusion_extracted names #20432
Open
adriangb wants to merge 6 commits into apache:main
adriangb added a commit to pydantic/datafusion that referenced this pull request Feb 20, 2026
…n_extracted names (apache#20432)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
cetra3 reviewed Feb 20, 2026
    && let Some(id_str) = alias
        .name
        .strip_prefix(EXTRACTED_EXPR_PREFIX)
        .and_then(|s| s.strip_prefix('_'))
Contributor
Couldn't this be `.and_then(|id_str| id_str.parse().ok())`?
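A minimal sketch of the chained form suggested in this review comment, using only the standard library. The helper name `extracted_alias_id` is hypothetical, and the value given to `EXTRACTED_EXPR_PREFIX` is an assumption based on the reserved prefix named in this PR; neither is taken from the DataFusion source.

```rust
/// Assumed value of the reserved prefix (the real constant lives in DataFusion).
const EXTRACTED_EXPR_PREFIX: &str = "__datafusion_extracted";

/// Hypothetical helper: returns the numeric suffix of a name shaped like
/// `__datafusion_extracted_<N>`, or `None` for any other name.
fn extracted_alias_id(name: &str) -> Option<usize> {
    name.strip_prefix(EXTRACTED_EXPR_PREFIX)
        // Require the `_` separator between the prefix and the id.
        .and_then(|rest| rest.strip_prefix('_'))
        // Fold the parse into the chain; non-numeric suffixes become `None`.
        .and_then(|id_str| id_str.parse().ok())
}
```

Folding the parse into the `Option` chain means the caller needs a single `if let Some(id)` rather than a nested check on the parsed value.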
…n_extracted names (apache#20430)

When a user query contains an explicit alias using the reserved `__datafusion_extracted` prefix, the optimizer's AliasGenerator could generate the same alias name, causing a "Schema contains duplicate unqualified field name" error.

Fix by scanning each plan node's expressions for pre-existing `__datafusion_extracted_N` aliases during the TopDown traversal in ExtractLeafExpressions, advancing the generator counter past them before any extraction occurs.

Closes apache#20430

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
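The approach described in this commit message can be illustrated with a toy counter. This is a sketch under stated assumptions, not the PR's implementation: `ToyAliasGenerator`, `advance_to`, `next_alias`, and `seed_from_existing` are hypothetical names, and the aliases are passed in as plain strings rather than collected from plan expressions.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Assumed reserved prefix, as named in this PR.
const PREFIX: &str = "__datafusion_extracted";

/// Hypothetical stand-in for an alias generator (not DataFusion's type).
struct ToyAliasGenerator {
    next_id: AtomicUsize,
}

impl ToyAliasGenerator {
    fn new() -> Self {
        Self { next_id: AtomicUsize::new(1) }
    }

    /// Raise the counter so the next generated id is at least `min_id`.
    /// The counter never moves backwards.
    fn advance_to(&self, min_id: usize) {
        self.next_id.fetch_max(min_id, Ordering::Relaxed);
    }

    /// Produce the next alias, e.g. `__datafusion_extracted_3`.
    fn next_alias(&self, prefix: &str) -> String {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        format!("{prefix}_{id}")
    }
}

/// Scan existing alias names and move the counter past any reserved ids,
/// so later generated aliases cannot repeat a user-provided one.
fn seed_from_existing(generator: &ToyAliasGenerator, aliases: &[&str]) {
    for name in aliases {
        let id = name
            .strip_prefix(PREFIX)
            .and_then(|rest| rest.strip_prefix('_'))
            .and_then(|id_str| id_str.parse::<usize>().ok());
        if let Some(id) = id {
            generator.advance_to(id.saturating_add(1));
        }
    }
}
```

Seeding before any alias is generated is what matters here: with a user alias `__datafusion_extracted_2` already in the plan, the toy generator hands out `_3` and `_4` next instead of reusing `_2`.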
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Traverse into subqueries when extracting leaf expressions and when advancing the alias generator past existing extracted aliases. Also collapse nested if-let to satisfy clippy::collapsible_if.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2b26a30 to c6774b6
getChan approved these changes Feb 20, 2026
    FROM t
    WHERE COALESCE(get_field(s, 'f1'), get_field(s, 'f2')) = 1;
    ----
    1
Contributor
Would it make sense to also add an EXPLAIN assertion here? It could guard the alias allocation behavior directly (e.g. user-provided __datafusion_extracted_2 remains stable while optimizer-generated aliases move to the next IDs), so future optimizer refactors don’t regress silently.
assisted by codex
Adds an EXPLAIN query to verify the user-provided `__datafusion_extracted_2` alias is preserved while optimizer-generated aliases skip to `_3` and `_4`, guarding against silent regressions in alias allocation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- `AliasGenerator` could produce alias names that collide with `__datafusion_extracted_N` aliases, causing a "Schema contains duplicate unqualified field name" error
- … `AliasGenerator` instances) you'll hit this.
- `AliasGenerator::update_min_id()` to advance the counter past existing aliases
- `ExtractLeafExpressions` traversal to seed the generator before any extraction occurs

Closes #20430
Test plan
- `test_user_provided_extracted_alias_no_collision` in `extract_leaf_expressions`
- `projection_pushdown.slt` with explicit `__datafusion_extracted_2` alias

🤖 Generated with Claude Code