You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
What additional value does this feature bring to the project?
It enables delta/partial loads for effectivity satellites - at least under certain, but frequent conditions.
Is your feature request related to a problem? Please describe.
Currently, there is the big restriction on effectivity satellites that they can only be used when the stage model contains all hashkeys that should remain active.
However, when using an effectivity satellite i.e. on a Link Table, this means that all sources used for the link must provide a full load at any time. For large sources this can be prohibitive.
Performing only a delta load of changed records will not work: leaving unchanged source records out of the stage table will mark the link entries derived from those as "inactive".
Describe the solution you'd like
I'd like to define a "driving (hash-)key" in addition to the tracked_hashkey that relaxes the "full load" requirement somewhat:
When defining a driving hashkey, only those tracked hashkeys would be eligible for being marked as inactive whose driving hashkey actually appears in the source model / stage table.
A typical case would be a raw data vault with two source tables "A" and "B", with "A" holding a foreign key to "B".
In data vault this would result in a link LNK_A_B with hashkey hk_A_B_l between two entities hk_A_h and hk_B_h.
By designating hk_A_h as the "driving hashkey", this solution would allow for partial loads of source table "A" while still properly tracking which references between A and B are active at any given time.
Note that this may also require designating a "reference model" for the partial load that can be used to lookup the driving hashkey (hk_A_h) for each tracked hashkey ( hk_A_B_l) - in the example that would be the link table LNK_A_B itself.
Describe alternatives you've considered
The alternatives I can see are:
Doing a full load of the source table (i.e. A), regardless of the amount of data that needs to be processed
Not using the existing effectivity satellite macro but writing a custom one
Additional context
The text was updated successfully, but these errors were encountered:
What additional value does this feature bring to the project?
It enables delta/partial loads for effectivity satellites - at least under certain, but frequent conditions.
Is your feature request related to a problem? Please describe.
Currently, there is the big restriction on effectivity satellites that they can only be used when the stage model contains all hashkeys that should remain active.
However, when using an effectivity satellite i.e. on a Link Table, this means that all sources used for the link must provide a full load at any time. For large sources this can be prohibitive.
Performing only a delta load of changed records will not work: leaving unchanged source records out of the stage table will mark the link entries derived from those as "inactive".
Describe the solution you'd like
I'd like to define a "driving (hash-)key" in addition to the
tracked_hashkey
that relaxes the "full load" requirement somewhat:When defining a driving hashkey, only those tracked hashkeys would be eligible for being marked as inactive whose driving hashkey actually appears in the source model / stage table.
A typical case would be a raw data vault with two source tables "A" and "B", with "A" holding a foreign key to "B".
In data vault this would result in a link
LNK_A_B
with hashkeyhk_A_B_l
between two entitieshk_A_h
andhk_B_h
.By designating
hk_A_h
as the "driving hashkey", this solution would allow for partial loads of source table "A" while still properly tracking which references between A and B are active at any given time.Note that this may also require designating a "reference model" for the partial load that can be used to lookup the driving hashkey (
hk_A_h
) for each tracked hashkey (hk_A_B_l
) - in the example that would be the link tableLNK_A_B
itself.Describe alternatives you've considered
The alternatives I can see are:
A
), regardless of the amount of data that needs to be processedAdditional context
The text was updated successfully, but these errors were encountered: