
Commit 6417e85

Merge branch 'master' into releases

Alex Higgs committed
2 parents 8bdd40f + 8fe1011, commit 6417e85

21 files changed, 583 additions, 82 deletions

CONTRIBUTING.md (+8 -7)
@@ -1,16 +1,16 @@
 ## We'd love to hear from you

-This dbtvault package is very much a work in progress – we’ll up the version number to 1.0 when we’re satisfied it
-works out in the wild.
+dbtvault is very much a work in progress – we’re constantly adding quality of life improvements and will be adding
+new table types regularly.

 We know that it deserves new features, that the code base can be tidied up and the SQL better tuned.
-Rest assured we’re working on it for future releases – our roadmap contains information on what’s coming.

-If you spot anything you’d like to bring to our attention, have a request for new features,
-have spotted an improvement we could make, or want to tell us about a typo, then please don’t hesitate to let us know
-by submitting an issue using the below guidelines
+Rest assured we’re working on it for future releases – [our roadmap contains information on what’s coming](roadmap.md).
+
+If you spot anything you’d like to bring to our attention, have a request for new features, have spotted an improvement we could make,
+or want to tell us about a typo or bug, then please don’t hesitate to let us know via [Github](https://github.com/Datavault-UK/dbtvault/issues).

-We’d rather know you are making active use of this package than hearing nothing from all of you out there!
+We’d rather know you are making active use of this package than hearing nothing from all of you out there!

 Happy Data Vaulting!

@@ -20,6 +20,7 @@ Happy Data Vaulting!
 We've tested the package rigorously, but if you think you've found a bug please provide the following
 at a minimum (or use the issue templates) so we can fix it as quickly as possible:

+- The version of dbt being used
 - The version of dbtvault being used.
 - Steps to reproduce the issue
 - Any error messages or dbt log files which can give more detail of the problem

README.md (+13 -4)
@@ -1,10 +1,18 @@
+<p align="left">
+<img src="https://user-images.githubusercontent.com/25080503/69713956-6249de80-10fd-11ea-8120-413db42d50ac.png">
+<p> There will be a live demonstration of dbtvault at the next UK Data Vault User Group on Tuesday, December 3, 2019 @ 6pm in LONDON.
+
+<a href="https://www.meetup.com/UK-Data-Vault-User-Group/events/266604902/">Sign up for FREE now! </a>
+</p>
+</p>
+
 <p align="center">
 <img src="https://user-images.githubusercontent.com/25080503/65772647-89525700-e132-11e9-80ff-12ad30a25466.png">
 </p>

 latest [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=latest)](https://dbtvault.readthedocs.io/en/latest/?badge=latest)

-stable [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.3.3-pre)](https://dbtvault.readthedocs.io/en/v0.3.3-pre/?badge=v0.3.3-pre)
+stable [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.4)](https://dbtvault.readthedocs.io/en/v0.4/?badge=v0.4)

 [past docs versions](https://dbtvault.readthedocs.io/en/latest/changelog/)

@@ -35,20 +43,21 @@ Get started quickly with our worked example:

 ## Installation

+Ensure you are using dbt 0.14 (0.15 support will be added soon!)
 Add the following to your ```packages.yml```


 ```yaml
 packages:

   - git: "https://github.com/Datavault-UK/dbtvault"
-    revision: v0.3.3-pre # Latest stable version
+    revision: v0.4 # Latest stable version
 ```

 And run
 ```dbt deps```

-[Read more on package installation](https://docs.getdbt.com/docs/package-management)
+[Read more on package installation](https://docs.getdbt.com/v0.14.0/docs/package-management)

 ## Usage

@@ -77,4 +86,4 @@ before anyone else!
 [View our contribution guidelines](CONTRIBUTING.md)

 ## License
-[Apache 2.0](LICENSE.md)
+[Apache 2.0](LICENSE.md)

dbt_project.yml (+5 -1)
@@ -1,5 +1,5 @@
 name: 'dbtvault'
-version: '0.3.3'
+version: '0.4'

 profile: 'dbtvault'

@@ -13,3 +13,7 @@ target-path: "target"
 clean-targets:
     - "target"
     - "dbt_modules"
+
+models:
+  vars:
+    hash: MD5
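The package-level `hash: MD5` default above pairs with the documented rule that a missing or invalid hashing configuration falls back to MD5. The Python sketch below illustrates only that resolution logic; it assumes the variable arrives as a plain string and that the value `SHA` selects SHA-256, and the names `resolve_hash_algorithm`, `SUPPORTED` and `DEFAULT` are hypothetical. dbtvault itself does its hashing inside its Jinja/SQL macros, not in Python.

```python
# Hypothetical sketch of resolving a `hash` var, falling back to MD5 when
# the value is missing or unrecognised. Not dbtvault's implementation.
import hashlib

# Assumed mapping from dbt var values to hashlib algorithm names.
SUPPORTED = {"MD5": "md5", "SHA": "sha256"}
DEFAULT = "MD5"


def resolve_hash_algorithm(project_vars: dict) -> str:
    """Return the hashlib name for the configured `hash` var, or MD5."""
    value = str(project_vars.get("hash", DEFAULT))
    return SUPPORTED.get(value, SUPPORTED[DEFAULT])


if __name__ == "__main__":
    for cfg in ({}, {"hash": "MD5"}, {"hash": "SHA"}, {"hash": "CRC32"}):
        name = resolve_hash_algorithm(cfg)
        # digest_size is in bytes: 16 for MD5, 32 for SHA-256.
        print(cfg, "->", name, hashlib.new(name).digest_size, "bytes")
```

Silently falling back to MD5 matches the documented behaviour, but in this sketch it also means a misspelt value such as `SHA256` quietly selects MD5; a stricter check is a reasonable alternative design.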

docs/bestpractices.md (+45 -3)
@@ -43,8 +43,9 @@ then there is a chance of a clash: where two different values generate the same

 For this reason, it **should not be** used for cryptographic purposes either.

-In future releases of dbtvault, we will allow you to change the algorithm that is used (e.g. to SHA-256) to reduce the
-chance of a clash (at the expense of more processing and a larger column), or switch off hashing entirely.
+!!! success
+
+    You may now choose between MD5 and SHA-256 in dbtvault, [read below](bestpractices.md#choosing-a-hashing-algorithm-in-dbtvault).

 ### Why do we hash?

@@ -80,4 +81,45 @@ staging tables
 the sorting functionality for primary keys.

 - For **links**, columns must be sorted by the primary key of the hub and arranged alphabetically by the hub name.
-The order must also be the same as each hub.
+The order must also be the same as each hub.
+
+### Choosing a hashing algorithm in dbtvault
+
+With the release of dbtvault 0.4, you may now choose between ```MD5``` and ```SHA-256``` hashing. ```SHA-256``` was added
+to dbtvault as an option for users who wish to reduce the hashing collision rates in larger data sets.
+
+!!! note
+
+    If a hashing algorithm configuration is missing or invalid, dbtvault will use ```MD5``` by default.
+
+Configuring the hashing algorithm that dbtvault uses is simple: add a variable to your
+```dbt_project.yml``` as follows:
+
+```dbt_project.yml```
+```yaml
+
+name: 'my_project'
+version: '1'
+
+profile: 'my_project'
+
+source-paths: ["models"]
+analysis-paths: ["analysis"]
+test-paths: ["tests"]
+data-paths: ["data"]
+macro-paths: ["macros"]
+
+target-path: "target"
+clean-targets:
+    - "target"
+    - "dbt_modules"
+
+models:
+  vars:
+    hash: SHA # or MD5
+```
+
+It is possible to configure a hashing algorithm on a model-by-model basis using the hierarchical structure of the ```yaml``` file.
+However, as per best practice, we recommend you keep the hashing algorithm consistent across all tables.
+
+Read the [dbt documentation](https://docs.getdbt.com/v0.14.0/docs/var) for further information on variable scoping.
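To make the MD5 versus SHA-256 trade-off in this section concrete, the Python sketch below hashes a business key with either algorithm and estimates the chance of at least one collision among `n` distinct keys using the standard birthday approximation p ~ 1 - exp(-n(n-1) / 2^(b+1)) for a b-bit digest. The trim-and-upper-case normalisation and the names `hash_key` and `collision_probability` are illustrative assumptions, not dbtvault's implementation.

```python
# Sketch: hash a business key with MD5 or SHA-256 and estimate the
# birthday-bound collision probability for n distinct keys.
# The trim/upper-case normalisation is an illustrative assumption.
import hashlib
import math


def hash_key(business_key: str, algorithm: str = "md5") -> str:
    """Return the hex digest of a normalised business key."""
    normalised = business_key.strip().upper()
    return hashlib.new(algorithm, normalised.encode("utf-8")).hexdigest()


def collision_probability(n: int, bits: int) -> float:
    """Approximate P(at least one collision) among n keys for a bits-wide hash.

    Uses p ~ 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps tiny values accurate.
    """
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))


if __name__ == "__main__":
    for algorithm, bits in (("md5", 128), ("sha256", 256)):
        digest = hash_key(" 1001 ", algorithm)
        print(f"{algorithm}: {len(digest)} hex chars, "
              f"p(collision, 1e9 keys) ~ {collision_probability(10**9, bits):.2e}")
```

For a billion keys the MD5 estimate is on the order of 1e-21, so SHA-256 mainly buys headroom for very large data sets, at the cost of a digest twice the size (32 bytes instead of 16).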

docs/changelog.md (+27 -1)
@@ -4,6 +4,33 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [v0.4] - 2019-11-27
+[![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.4)](https://dbtvault.readthedocs.io/en/v0.4-pre/?badge=v0.4)
+
+### Added
+
+- Table Macros:
+    - [Transactional Links](macros.md#t_link_template)
+
+### Improved
+
+- Hashing:
+    - You may now choose between ```MD5``` and ```SHA-256``` hashing with a simple yaml configuration
+[Learn how!](bestpractices.md#choosing-a-hashing-algorithm-in-dbtvault)
+
+### Worked example
+
+- Transactional Links
+    - Added a transactional link model using a simulated transaction feed.
+
+### Documentation
+
+- Updated macros, best practices, roadmap, and other pages to account for new features
+- Updated worked example documentation
+- Replaced all dbt documentation links with links to the 0.14 documentation as dbtvault
+is using dbt 0.14 currently (we will be updating to 0.15 soon!)
+- Minor corrections
+
 ## [v0.3.3-pre] - 2019-10-31
 [![Documentation Status](https://readthedocs.org/projects/dbtvault/badge/?version=v0.3.3-pre)](https://dbtvault.readthedocs.io/en/v0.3.3-pre/?badge=v0.3.3-pre)

@@ -131,7 +158,6 @@ the new and improved features.

 ### Added

-
 - Table Macros:
     - [Hub](macros.md#hub_template)
     - [Link](macros.md#link_template)

docs/contributing.md (+2 -1)
@@ -8,7 +8,7 @@ We know that it deserves new features, that the code base can be tidied up and t
 Rest assured we’re working on it for future releases – [our roadmap contains information on what’s coming](roadmap.md).

 If you spot anything you’d like to bring to our attention, have a request for new features, have spotted an improvement we could make,
-or want to tell us about a typo, then please don’t hesitate to let us know via [Github](https://github.com/Datavault-UK/dbtvault/issues).
+or want to tell us about a typo or bug, then please don’t hesitate to let us know via [Github](https://github.com/Datavault-UK/dbtvault/issues).

 We’d rather know you are making active use of this package than hearing nothing from all of you out there!

@@ -20,6 +20,7 @@ Happy Data Vaulting! :smile:
 We've tested the package rigorously, but if you think you've found a bug please provide the following
 at a minimum (or use the issue templates) so we can fix it as quickly as possible:

+- The version of dbt being used
 - The version of dbtvault being used.
 - Steps to reproduce the issue
 - Any error messages or dbt log files which can give more detail of the problem

docs/hubs.md (+3 -3)
@@ -26,7 +26,7 @@ The following header is what we use, but feel free to customise it to your needs

 Hubs are always incremental, as we load and add new records to the existing data set.

-[Read more about incremental models](https://docs.getdbt.com/docs/configuring-incremental-models)
+[Read more about incremental models](https://docs.getdbt.com/v0.14.0/docs/configuring-incremental-models)

 !!! note "Dont worry!"
     The [hub_template](macros.md#hub_template) deals with the Data Vault
@@ -39,10 +39,10 @@ Let's look at the metadata we need to provide to the [hub_template](macros.md#hu
 #### Source table

 The first piece of metadata we need is the source table. This step is easy, as in this example we created the
-new staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us.
+staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us.
 dbt ensures dependencies are honoured when defining the source using a reference in this way.

-[Read more about the ref function](https://docs.getdbt.com/docs/ref)
+[Read more about the ref function](https://docs.getdbt.com/v0.14.0/docs/ref)

 ```hub_customer.sql```
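Because hubs are loaded incrementally, each load only adds business keys the hub has not seen before and never rewrites existing rows. The Python sketch below mimics that insert-only behaviour on in-memory data. `HubRow`, its field names and `load_hub` are hypothetical, and the sketch keeps the first occurrence of a key within a batch, whereas the SQL generated by the hub_template may resolve duplicates differently.

```python
# Hypothetical sketch of an insert-only (incremental) hub load in memory.
# HubRow and load_hub are illustrative names, not part of dbtvault.
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class HubRow:
    customer_pk: str  # hashed business key (hub primary key)
    customer_id: str  # natural business key before hashing
    load_date: date
    source: str


def load_hub(hub: Dict[str, HubRow], staged: Iterable[HubRow]) -> List[HubRow]:
    """Add staged rows whose primary key is not yet in the hub.

    Existing rows are never updated or deleted; only unseen keys are
    appended. Returns the rows inserted by this load.
    """
    inserted: List[HubRow] = []
    for row in staged:
        if row.customer_pk not in hub:  # skip keys already loaded
            hub[row.customer_pk] = row
            inserted.append(row)
    return inserted


if __name__ == "__main__":
    hub: Dict[str, HubRow] = {}
    day1 = [HubRow("PK1", "1001", date(2019, 11, 27), "STG_CUSTOMER")]
    day2 = [HubRow("PK1", "1001", date(2019, 11, 28), "STG_CUSTOMER"),
            HubRow("PK2", "1002", date(2019, 11, 28), "STG_CUSTOMER")]
    print(len(load_hub(hub, day1)), "row(s) inserted on day 1")
    print(len(load_hub(hub, day2)), "row(s) inserted on day 2")
```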

docs/links.md (+1 -1)
@@ -38,7 +38,7 @@ The first piece of metadata we need is the source table. This step is easy, as w
 staging layer ourselves. All we need to do is provide a reference to the model we created, and dbt will do the rest for us.
 dbt ensures dependencies are honoured when defining the source using a reference in this way.

-[Read more about the ref function](https://docs.getdbt.com/docs/ref)
+[Read more about the ref function](https://docs.getdbt.com/v0.14.0/docs/ref)

 ```link_customer_nation.sql```

docs/loading.md (+28 -2)
@@ -29,11 +29,11 @@ This will run all models with the hub tag.
 Links are another fundamental component in a Data Vault.

 Links model an association or link, between two business keys. They commonly hold business transactions or structural
-information.
+information. A link specifically contains the structural information.

 Our links will contain:

-1. A primary key. For links, we take the natural keys (prior to hashing) represented by the foreign key columns below
+1. A primary key. For links, we take the natural keys (prior to hashing) represented by the foreign key columns
 and create a hash on a concatenation of them.
 2. Foreign keys holding the primary key for each hub referenced in the link (2 or more depending on the number of hubs
 referenced)
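The link primary key described above, a hash over the concatenated natural keys, can be sketched in Python as follows. Ordering by hub name follows the best-practice note that link columns are arranged alphabetically by hub name; the `||` delimiter, the trim-and-upper-case step and the function name `link_hash_key` are assumptions for illustration, not dbtvault's macro logic.

```python
# Hypothetical sketch: derive a link primary key by hashing the natural
# keys of the linked hubs. Delimiter and normalisation are assumptions.
import hashlib
from typing import Mapping

DELIMITER = "||"  # assumed separator between natural keys


def link_hash_key(natural_keys: Mapping[str, str], algorithm: str = "md5") -> str:
    """Hash natural keys (keyed by hub name) in alphabetical hub-name order."""
    ordered = [natural_keys[hub].strip().upper() for hub in sorted(natural_keys)]
    payload = DELIMITER.join(ordered)
    return hashlib.new(algorithm, payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Input order does not matter: hubs are sorted by name before hashing.
    print(link_hash_key({"nation": "7", "customer": "1001"}))
    print(link_hash_key({"customer": "1001", "nation": "7"}, algorithm="sha256"))
```

A delimiter is worth having: without one, the key pairs ("12", "3") and ("1", "23") would both concatenate to "123" and hash identically.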
@@ -89,6 +89,32 @@ To compile and load the provided satellite models, run the following command:

 This will run all models with the satellite tag.

+## Transactional Links
+
+Transactional Links are used to model transactions between entities in a Data Vault.
+
+Links model an association or link, between two business keys. They commonly hold business transactions or structural
+information. A transactional link specifically contains the business transactions.
+
+Our transactional links will contain:
+
+1. A primary key. For transactional links, we use the transaction number. If this is not already present in the dataset
+then we create this by concatenating the foreign keys and hashing them.
+2. Foreign keys holding the primary key for each hub referenced in the transactional link (2 or more depending on the number of hubs
+referenced)
+3. A payload. This will be data about the transaction itself e.g. the amount, type, date or non-hashed transaction number.
+4. An ```EFFECTIVE_FROM``` date. This will usually be the date of the transaction.
+5. The load date or load date timestamp.
+6. The source for the record
+
+### Loading transactional links
+
+To compile and load the provided t_link models, run the following command:
+
+```dbt run --models tag:t_link```
+
+This will run all models with the t_link tag.
+
 ## Loading the full system

 Each of the commands above load a particular type of table, however, we may want to do a full system load.
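The primary-key rule for transactional links (use the transaction number, otherwise concatenate and hash the foreign keys) can be sketched in Python. This assumes the key is stored hashed in both cases, which the payload's mention of a "non-hashed transaction number" appears to imply; the `||` delimiter, the normalisation step and the name `t_link_primary_key` are hypothetical.

```python
# Hypothetical sketch of the transactional-link primary key rule:
# hash the transaction number when present, otherwise hash the
# concatenated foreign keys. Delimiter and normalisation are assumptions.
import hashlib
from typing import Optional, Sequence

DELIMITER = "||"  # assumed separator between concatenated values


def t_link_primary_key(transaction_number: Optional[str],
                       foreign_keys: Sequence[str],
                       algorithm: str = "md5") -> str:
    """Return a hashed primary key for a transactional link row."""
    if transaction_number:  # present and non-empty in the source data
        parts = [transaction_number]
    else:
        parts = list(foreign_keys)
    payload = DELIMITER.join(p.strip().upper() for p in parts)
    return hashlib.new(algorithm, payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(t_link_primary_key("TX-0042", ["CUST_PK", "ORDER_PK"]))
    print(t_link_primary_key(None, ["cust_pk", "order_pk"]))
```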
