`docs/website/docs/general-usage/merge-loading.md` (29 additions, 0 deletions)
    ...
```

#### Reset boundary timestamp to the current load time

To stop using a previously set `boundary_timestamp` and revert to the default boundary (the creation time of the current load package), set `boundary_timestamp` to `None`. You can do this either at definition time or, for a single run, with `apply_hints`.

Definition time (always use the current load time):
```py
import dlt


@dlt.resource(
    write_disposition={
        "disposition": "merge",
        "strategy": "scd2",
        "boundary_timestamp": None,  # reset to current load time
    }
)
def dim_customer():
    ...
```

Per-run reset (override just for this run):
```py
# `r` is a dlt resource and `pipeline` a dlt pipeline created earlier
r.apply_hints(
    write_disposition={
        "disposition": "merge",
        "strategy": "scd2",
        "boundary_timestamp": None,  # reset to current load time for this run
    }
)
pipeline.run(r(...))
```

When `boundary_timestamp` is `None` (or omitted), `dlt` uses the load package's creation timestamp as the boundary for both retiring existing versions and creating new versions.
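To make the boundary semantics concrete, here is a plain-Python sketch of the idea. It is not how `dlt` implements SCD2. `Version`, `resolve_boundary` and `apply_scd2` are hypothetical names, the table lives in memory, and each incoming batch is treated as a full snapshot, so a current version missing from the batch is retired as well. In `dlt`'s own table, the `valid_from`/`valid_to` fields below correspond to the `_dlt_valid_from` and `_dlt_valid_to` columns by default.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Version:
    """One row version in a hypothetical, in-memory SCD2 table."""

    customer_id: int
    name: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None marks the current version


def resolve_boundary(
    boundary_timestamp: Optional[datetime], package_created_at: datetime
) -> datetime:
    """Return the explicit boundary, or fall back to the load package creation time."""
    return package_created_at if boundary_timestamp is None else boundary_timestamp


def apply_scd2(table: List[Version], incoming: Dict[int, str], boundary: datetime) -> None:
    """Retire changed or missing current versions and insert new ones, all at `boundary`."""
    current = {v.customer_id: v for v in table if v.valid_to is None}
    # Retire: the old version stops being valid at the boundary
    for customer_id, version in current.items():
        if incoming.get(customer_id) != version.name:
            version.valid_to = boundary
    # Insert: the new version starts being valid at the same boundary
    for customer_id, name in incoming.items():
        old = current.get(customer_id)
        if old is None or old.name != name:
            table.append(Version(customer_id, name, valid_from=boundary))


# Usage: no explicit boundary_timestamp, so the package creation time is used
package_created_at = datetime(2024, 8, 21, 12, 15, tzinfo=timezone.utc)
table = [Version(1, "Alice", valid_from=datetime(2024, 1, 1, tzinfo=timezone.utc))]
boundary = resolve_boundary(None, package_created_at)
apply_scd2(table, {1: "Alicia", 2: "Bob"}, boundary)
# table[0] (Alice) is retired at package_created_at;
# table[1] (Alicia) and table[2] (Bob) are valid from package_created_at
```

Because one shared boundary is used for both steps, consecutive versions of the same record meet exactly, with no gap or overlap between them.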

### Example: Use your own row hash
By default, `dlt` generates a row hash based on all columns provided by the resource and stores it in `_dlt_id`. You can use your own hash instead by specifying `row_version_column_name` in the `write_disposition` dictionary. You might already have a column present in your resource that can naturally serve as a row hash, in which case it's more efficient to use those pre-existing hash values than to generate new artificial ones. This option also allows you to use hashes based on a subset of columns, in case you want to ignore changes in some of the columns. When using your own hash, values for `_dlt_id` are randomly generated.
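If your source has no suitable hash column, you could compute one yourself before yielding rows. The following is a minimal plain-Python sketch, assuming a hypothetical `row_hash` column and a hypothetical helper `compute_row_hash`. SHA-256 over key-sorted JSON is just one reasonable choice, not something `dlt` requires. The column you produce is the one you would then name in `row_version_column_name`.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def compute_row_hash(row: Dict[str, Any], columns: Iterable[str]) -> str:
    """Hash only the given columns, so changes in other columns are ignored."""
    subset = {col: row.get(col) for col in columns}
    # sort_keys makes the serialization independent of column order;
    # default=str covers values such as dates that JSON cannot encode directly
    payload = json.dumps(subset, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


TRACKED_COLUMNS = ["customer_key", "name", "city"]  # `last_seen` is deliberately left out

rows = [
    {"customer_key": 1, "name": "Alice", "city": "Berlin", "last_seen": "2024-08-20"},
    {"customer_key": 1, "name": "Alice", "city": "Berlin", "last_seen": "2024-08-21"},
    {"customer_key": 1, "name": "Alice", "city": "Munich", "last_seen": "2024-08-22"},
]
for row in rows:
    row["row_hash"] = compute_row_hash(row, TRACKED_COLUMNS)

# rows[0] and rows[1] differ only in `last_seen`, so their hashes match;
# rows[2] changes `city`, so its hash differs
```

Because `last_seen` is excluded from the hash, a row whose only change is `last_seen` keeps the same hash and would not be treated as a new version.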