fix: prevent duplicate alias collision with user-provided __datafusion_extracted names#20432

Open
adriangb wants to merge 6 commits into apache:main from pydantic:fix-expr-bug

@adriangb adriangb commented Feb 19, 2026

Summary

  • Fixes a bug where the optimizer's AliasGenerator could produce alias names that collide with existing __datafusion_extracted_N aliases, causing a "Schema contains duplicate unqualified field name" error
  • I don't expect users to create these aliases themselves, but running the optimizer twice (with different AliasGenerator instances) triggers the collision.
  • Adds AliasGenerator::update_min_id() to advance the counter past existing aliases
  • Scans each plan node's expressions during ExtractLeafExpressions traversal to seed the generator before any extraction occurs
  • Switches the rule to control its own traversal, which also means the config-based short circuit more clearly skips the entire rule.
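
The counter-advancing idea in the summary can be sketched as follows. This is a minimal illustration, not DataFusion's code: `SketchAliasGenerator` is a hypothetical stand-in, and both the starting id of 1 and the exact semantics of `update_min_id` (the next id is strictly greater than the one passed in) are assumptions made for the example.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the optimizer's alias generator.
/// It only illustrates the idea described in this PR.
#[derive(Debug)]
pub struct SketchAliasGenerator {
    next_id: AtomicUsize,
}

impl SketchAliasGenerator {
    /// Assumption: ids start at 1.
    pub fn new() -> Self {
        Self {
            next_id: AtomicUsize::new(1),
        }
    }

    /// Make sure every id handed out from now on is greater than `seen_id`.
    /// `fetch_max` never moves the counter backwards.
    pub fn update_min_id(&self, seen_id: usize) {
        self.next_id.fetch_max(seen_id + 1, Ordering::Relaxed);
    }

    /// Return `{prefix}_{id}` and advance the counter by one.
    pub fn next(&self, prefix: &str) -> String {
        let id = self.next_id.fetch_add(1, Ordering::Relaxed);
        format!("{prefix}_{id}")
    }
}
```

With this sketch, calling `update_min_id(2)` on a fresh generator after seeing an existing `__datafusion_extracted_2` alias makes the next generated names end in `_3` and `_4`, so no generated name can repeat the existing one.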

Closes #20430

Test plan

  • Unit test: test_user_provided_extracted_alias_no_collision in extract_leaf_expressions
  • SLT regression test in projection_pushdown.slt with explicit __datafusion_extracted_2 alias

🤖 Generated with Claude Code

@github-actions github-actions bot added the optimizer (Optimizer rules), sqllogictest (SQL Logic Tests (.slt)), and common (Related to common crate) labels Feb 19, 2026
@adriangb adriangb requested a review from alamb February 19, 2026 14:20
adriangb added a commit to pydantic/datafusion that referenced this pull request Feb 20, 2026
…n_extracted names (apache#20432)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
&& let Some(id_str) = alias
    .name
    .strip_prefix(EXTRACTED_EXPR_PREFIX)
    .and_then(|s| s.strip_prefix('_'))

Couldn't this be .and_then(|id_str| id_str.parse().ok())?
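
For reference, the chained form the reviewer seems to be suggesting might look like the sketch below. The snippet under review is cut off in this thread, so the helper name `extracted_alias_id` is hypothetical, and the value of `EXTRACTED_EXPR_PREFIX` is assumed from the alias names quoted in this PR.

```rust
/// Assumed value of the reserved prefix, inferred from the alias names
/// quoted in this PR; the real constant lives in DataFusion.
pub const EXTRACTED_EXPR_PREFIX: &str = "__datafusion_extracted";

/// Hypothetical helper: return `N` when `name` has the form
/// `__datafusion_extracted_N`, and `None` for any other name.
pub fn extracted_alias_id(name: &str) -> Option<usize> {
    name.strip_prefix(EXTRACTED_EXPR_PREFIX)
        // Require the `_` separator between the prefix and the id.
        .and_then(|rest| rest.strip_prefix('_'))
        // Parse the remainder as a number; non-numeric suffixes yield None.
        .and_then(|id_str| id_str.parse().ok())
}
```

Folding the parse into the same `Option` chain keeps the whole check in one expression, so a non-numeric suffix simply yields `None` without a nested `if let`.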

adriangb and others added 5 commits February 20, 2026 16:48
…n_extracted names (apache#20430)

When a user query contains an explicit alias using the reserved
`__datafusion_extracted` prefix, the optimizer's AliasGenerator could
generate the same alias name, causing a "Schema contains duplicate
unqualified field name" error.

Fix by scanning each plan node's expressions for pre-existing
`__datafusion_extracted_N` aliases during the TopDown traversal in
ExtractLeafExpressions, advancing the generator counter past them
before any extraction occurs.

Closes apache#20430

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Traverse into subqueries when extracting leaf expressions and when
advancing the alias generator past existing extracted aliases. Also
collapse nested if-let to satisfy clippy::collapsible_if.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
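
To illustrate why the scan has to descend into subqueries, here is a toy sketch. `ToyPlan` is a hypothetical, heavily simplified stand-in for a logical plan, and the function runs as a separate pre-pass, whereas the commits above describe scanning each node during the rule's own traversal.

```rust
/// Hypothetical, heavily simplified plan node used only for illustration.
pub struct ToyPlan {
    /// Alias names attached to this node's expressions.
    pub aliases: Vec<String>,
    /// Ordinary child plans.
    pub inputs: Vec<ToyPlan>,
    /// Plans embedded inside this node's expressions (subqueries).
    pub subqueries: Vec<ToyPlan>,
}

/// Parse `N` out of a name of the form `__datafusion_extracted_N`.
fn extracted_id(name: &str) -> Option<usize> {
    name.strip_prefix("__datafusion_extracted_")?.parse().ok()
}

/// Largest existing extracted-alias id anywhere in the plan, including
/// subqueries, or `None` if the plan contains no such alias.
pub fn max_extracted_id(plan: &ToyPlan) -> Option<usize> {
    // Largest id among this node's own aliases.
    let own = plan.aliases.iter().filter_map(|a| extracted_id(a)).max();
    plan.inputs
        .iter()
        .chain(plan.subqueries.iter())
        // Recurse into children and subqueries alike.
        .filter_map(max_extracted_id)
        .chain(own)
        .max()
}
```

If the recursion skipped `subqueries`, an alias that appears only inside a subquery would never be seen, and a generator seeded from the result could still hand out a colliding name.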
FROM t
WHERE COALESCE(get_field(s, 'f1'), get_field(s, 'f2')) = 1;
----
1

@getChan getChan Feb 20, 2026

Would it make sense to also add an EXPLAIN assertion here? It could guard the alias allocation behavior directly (e.g. user-provided __datafusion_extracted_2 remains stable while optimizer-generated aliases move to the next IDs), so future optimizer refactors don’t regress silently.

assisted by codex

Adds an EXPLAIN query to verify the user-provided __datafusion_extracted_2
alias is preserved while optimizer-generated aliases skip to _3 and _4,
guarding against silent regressions in alias allocation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Optimizer rule error for ExtractLeafExpressions
