-
Notifications
You must be signed in to change notification settings - Fork 394
Fix: primary_key not removed with apply_hints(primary_key="") #3361
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
docs | cb94884 | Commit Preview URL Branch Preview URL |
Nov 21 2025, 12:43 PM |
b5a7d10 to
6c3f35a
Compare
6c3f35a to
f40733b
Compare
f40733b to
cb94884
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can handle everything in diff_table if we mark certain hints as compound and implement missing behaviors. Here's how hints are evolved:
- we have resource definition (columns, primary keys), apply hints, and hints manipulation directly in code. this always happens from scratch in each pipeline run. user can do here whatever user wants. no changes are needed
- extract starts: here we must deal with schema from previous run or with imported schema
- this happens just before extract:
source.schema.update_schema(source_schema): we use newly computed source schema to update pipeline schema (table by table) - tables diffs are calculated: during that process a diff is created which is a partial table with new and modified columns
- the diff is applied to table in pipeline schema and columns are merged with merge_columns.
- table updates happen a few times more (dynamic tables in the extract, schema evolution in the normalize) but the logic is the same
the root of the problem is in diff_table: it just blindly merges column hints additively. we must handle compound hints here. we assume that partial table has complete information on each hint. so for example:
- if there a column that sets a primary key, we must reset those keys on all columns of pipeline schema table (table_a) and add them to diff.
- if there a column without primary key but in the table_a the key is set we know that primary key hint is modified and we need to remove it from table_a fully
tldr ;> a block replaces existing block, even if new block is empty (but intersection of column of old blocks with columns in the diff cannot be empty)
a few notes:
- preserve the column order (merge_columns does it, it will be sufficient)
merge_columnswon't needcolumns_partial- this should be a flag when generating diff (see docstrings)- we are implementing
ifcolumns_partialis False, hints likeprimary_keyandmerge_keyare dropped fromcolumns_aand replaced fromcolumns_b`` merge_columnmerge default is always True. you can remove that flag and remove all "false" usage (let's see what happens)
this is not that complicated overall. and the hints merge code is in better shape than I remembered...
|
|
||
|
|
||
| TColumnPropMergeType = Literal[ | ||
| "replacable", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just mark certain hints as compound
This PR attempts to solve the issue where setting
primary_keyto an empty string inapply_hintsdoes not actually remove it from schema.First attempted to be resolved in #3280.
Resolves #3210