@@ -17,7 +17,6 @@ We will:
- process the raw staging layer.
- create a Data Vault with hubs, links and satellites using dbtvault and pre-written models.
-
## Pre-requisites
These pre-requisites are separate from those found on the [getting started](walkthrough.md) page and will
@@ -37,4 +36,25 @@ be the only necessary requirements you will need to get started with the example
!!! note
We have provided a complete `requirements.txt` to install with `pip install -r requirements.txt`
- as a quick way of getting your Python environment set up. This file includes dbt and comes with the download in the next section.
+ as a quick way of getting your Python environment set up. This file includes dbt and comes with the download in the
+ next section.
+
+ ## Performance note
+
+ Please be aware that the table structures are simulated from the TPC-H dataset, which is a static view of data.
+
+ Only a subset of the data contains dates, which allows us to simulate daily feeds. The `v_stg_orders` view is
+ filtered by date; unfortunately, the `v_stg_inventory` view cannot be filtered by date, so it ends up being a feed of
+ the entire contents of the view on each cycle.
+
+ This means that inventory-related hubs, links and satellites are fully populated during the initial load cycle, and
+ later cycles insert 0 new records in their left outer joins.
52
+
53
+ As the dataset increases in size, e.g if you run with a larger TPC-H dataset (100, 1000 etc.) then be aware you are
54
+ processing the entire inventory dataset each cycle, which results in unrepresentative load cycle times.
+
+ Unfortunately, this is a limitation of the TPC-H dataset and will not be the case for other datasets. We will look at
+ additional datasets in the future!
+
+ If you are feeling adventurous, you may disable the inventory feed (`raw_inventory` and its child models) to see a more
+ accurate representation of performance.