Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Allow partial loads for effectivity satellites #327

Open
mried-infomotion opened this issue Feb 21, 2025 · 0 comments
Open

[FEATURE] Allow partial loads for effectivity satellites #327

mried-infomotion opened this issue Feb 21, 2025 · 0 comments
Labels
enhancement New feature or request

Comments

@mried-infomotion
Copy link

What additional value does this feature bring to the project?

It enables delta/partial loads for effectivity satellites - at least under certain, but frequent conditions.

Is your feature request related to a problem? Please describe.

Currently, there is the big restriction on effectivity satellites that they can only be used when the stage model contains all hashkeys that should remain active.

However, when using an effectivity satellite i.e. on a Link Table, this means that all sources used for the link must provide a full load at any time. For large sources this can be prohibitive.

Performing only a delta load of changed records will not work: leaving unchanged source records out of the stage table will mark the link entries derived from those as "inactive".

Describe the solution you'd like

I'd like to define a "driving (hash-)key" in addition to the tracked_hashkey that relaxes the "full load" requirement somewhat:

When defining a driving hashkey, only those tracked hashkeys would be eligible for being marked as inactive whose driving hashkey actually appears in the source model / stage table.

A typical case would be a raw data vault with two source tables "A" and "B", with "A" holding a foreign key to "B".
In data vault this would result in a link LNK_A_B with hashkey hk_A_B_l between two entities hk_A_h and hk_B_h.

By designating hk_A_h as the "driving hashkey", this solution would allow for partial loads of source table "A" while still properly tracking which references between A and B are active at any given time.

Note that this may also require designating a "reference model" for the partial load that can be used to lookup the driving hashkey (hk_A_h) for each tracked hashkey ( hk_A_B_l) - in the example that would be the link table LNK_A_B itself.

Describe alternatives you've considered

The alternatives I can see are:

  • Doing a full load of the source table (i.e. A), regardless of the amount of data that needs to be processed
  • Not using the existing effectivity satellite macro but writing a custom one

Additional context

@mried-infomotion mried-infomotion added the enhancement New feature or request label Feb 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant