-
Notifications
You must be signed in to change notification settings - Fork 394
fix/3190 Fixed the persistence issue of boundary timestamp after removing it #3321
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
docs | 7892550 | Commit Preview URL Branch Preview URL |
Nov 19 2025, 09:00 AM |
29292e6 to
7e708f3
Compare
3e6dc17 to
760fe85
Compare
rudolfix
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls check the review comments. schema hints are additive: schema fragments may come from many places: import schema, explicit code, resource hints, apply hints etc. the fact that hint is not present is not enough to remove it from the schema.
That's why user needs to pass None here. If it does not work we have a real problem though...
dlt/extract/hints.py
Outdated
| if merge_strategy == "scd2": | ||
| md_dict = cast(TScd2StrategyDict, md_dict) | ||
| if "boundary_timestamp" in md_dict: | ||
| boundary = md_dict.get("boundary_timestamp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the root problem here IMO is that user is not correctly resetting the boundary value (or it does not work). we already do what you implemented below when we generate SQL:
boundary_ts = ensure_pendulum_datetime_utc(
root_table.get("x-boundary-timestamp", current_load_package()["state"]["created_at"]) # type: ignore[arg-type]
)
and IMO the problem was user was skipping this hint on a next run instead of setting boundary to None. Let's do an experiment first. see below
tests/load/pipeline/test_scd2.py
Outdated
| } | ||
| ) | ||
| with mock.patch( | ||
| "dlt.common.storages.load_package.precise_time", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do an experiment here. please stash changes in hints.py so we work with the original code and modify the test to pass "boundary_timestamp": None
this test should pass (we drop the hint, code in sql_jobs.py sets the package timestamp). if the test passes we just need to amend our docs in scd2 on how to reset the boundary.
…p using the current load package's creation time when none is provided. Modified corresponding test to ensure correct behavior in the pipeline.
8706139 to
0adf780
Compare
…ests for new behavior, including resetting to current load time.
0adf780 to
7892550
Compare
rudolfix
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could have a little cleaner code. docs and tests are LGTM - thanks for that!
please take the sqlglot patch commit so common tests are passing. we can't merge without them
| f"could not parse `{ts}` value `{wd[ts]}`" # type: ignore[literal-required] | ||
| ) | ||
|
|
||
| art = wd.get("active_record_timestamp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previous loop was IMO more elegant. too much duplicated code here IMO
| boundary_ts = ensure_pendulum_datetime_utc( | ||
| root_table.get("x-boundary-timestamp", current_load_package()["state"]["created_at"]) # type: ignore[arg-type] | ||
| ) | ||
| created_at = current_load_package()["state"]["created_at"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code is correct. but you do not need to take "created_at" from the package unconditionally. do it in ternary operator below else current_load_package()["state"]["created_at"]
same thing in sql_jobs
Description
Related Issues
Fixes #3190