diff --git a/Dockerfile b/Dockerfile index 818eae4..3083c79 100644 --- a/Dockerfile +++ b/Dockerfile @@ -8,6 +8,7 @@ WORKDIR /usr/src/dbt/dbt_project # Install the dbt Postgres adapter. This step will also install dbt-core RUN pip install --upgrade pip RUN pip install dbt-postgres==1.3.1 +RUN pip install pytz # Install dbt dependencies (as specified in packages.yml file) # Build seeds, models and snapshots (and run tests wherever applicable) diff --git a/README.md b/README.md index 801816a..1db38c9 100644 --- a/README.md +++ b/README.md @@ -8,34 +8,21 @@ The `docker-compose.yml` file consists of two services: that are used to build the data models defined in the example project into a target Postgres database. ## `postgres` service and the Sakila Database -This is an instance of a Postgres database initialised with Sakila database (and thus we are using the -`frantiseks/postgres-sakila` image which is available on Docker Hub). +This is an instance of a Postgres database initialised with Sakila database (and thus we are using the +`frantiseks/postgres-sakila` image which is available on Docker Hub). -The database models a DVD rental store and contains several normalised tables that correspond to films, payments, +The database models a DVD rental store and contains several normalised tables that correspond to films, payments, customers and other entities. -Sakila Database was developed by Mike Hillyer, who used to be a member of the AB documentation team at MySQL. For more -information regarding Sakila Database you can refer to the -[official MySQL documentation](https://dev.mysql.com/doc/sakila/en/sakila-introduction.html). +Sakila Database was developed by Mike Hillyer, who used to be a member of the AB documentation team at MySQL. For more +information regarding Sakila Database you can refer to the +[official MySQL documentation](https://dev.mysql.com/doc/sakila/en/sakila-introduction.html). 
+![Sakila DB](https://www.jooq.org/img/sakila.png) + ## `dbt` service This service is built out of the `Dockerfile` and is responsible for creating dbt seeds, models and snapshots -on `postgres` service. The example dbt project contains seeds, models (staging, intermediate and mart) as well as -snapshots. - -Note that this is a dummy project, meaning that some entities (including aggregations) might not make too much sense -from a business perspective. For example, even though the Sakila database contains the `customer` table already, we -construct another table called `customer_base` that corresponds to a dbt seed, and is loaded form an external -`csv` file. - -Additionally, the models created may not be the perfect examples of what it should be considered as an intermediate or -mart model. In general if you are interested in gaining a deeper understanding of these terms I would encourage you to -read the following articles: -- [Staging vs Intermediate vs Mart models in dbt](https://towardsdatascience.com/staging-intermediate-mart-models-dbt-2a759ecc1db1) -- [How to structure your dbt project and data models](https://towardsdatascience.com/dbt-models-structure-c31c8977b5fc) - -Feel free to add, modify or remove models while cloning or forking the project in order to serve the purpose you -intend to use it for. +on the `postgres` service. 
## Running the dummy dbt project @@ -45,7 +32,7 @@ First, let's build the services defined in our `docker-compose.yml` file: docker-compose build ``` -and now let's run the services so that the dbt models are created in our target Postgres database: +and now let's run the services so that the dbt models are created in our target Postgres database: ```commandline docker-compose up @@ -56,13 +43,13 @@ This will spin up two containers namely `dbt` (out of the `dbt-dummy` image) and Notes: - For development purposes, both containers will remain up and running -- If you would like to end the `dbt` container, feel free to remove the `&& sleep infinity` in `CMD` command of the +- If you would like to end the `dbt` container, feel free to remove the `&& sleep infinity` in the `CMD` command of the `Dockerfile` ### Building additional or modified data models -Once the containers are up and running, you can still make any modifications in the existing dbt project -and re-run any command to serve the purpose of the modifications. +Once the containers are up and running, you can still modify the existing dbt project +and re-run any dbt command to apply your changes. In order to build your data models, you first need to access the container. @@ -76,41 +63,16 @@ Then enter the running container: docker exec -it /bin/bash ``` -And finally: - -```commandline -# Install dbt deps -dbt deps - -# Build seeds -dbt seeds --profiles-dir profiles - -# Build data models -dbt run --profiles-dir profiles - -# Build snapshots -dbt snapshot --profiles-dir profiles - -# Run tests -dbt test --profiles-dir profiles -``` - -Alternatively, you can run everything in just a single command: - -```commandline -dbt build --profiles-dir profiles -``` - ### Querying seeds, models and snapshots on Postgres -In order to query and verify the seeds, models and snapshots created in the dummy dbt project, simply follow the -steps below. 
+In order to query and verify the seeds, models and snapshots created in the dummy dbt project, simply follow the +steps below. Find the container id of the postgres service (`postgres`): ```commandline -docker ps +docker ps ``` -Then run +Then run ```commandline docker exec -t /bin/bash ``` @@ -119,22 +81,3 @@ We will then use `psql`, a terminal-based interface for PostgreSQL that allows u ```commandline psql -U postgres ``` - -Now you can query the tables constructed form the seeds, models and snapshots defined in the dbt project: -```sql --- Query seed tables -SELECT * FROM customer_base; - --- Query staging views -SELECT * FROM stg_payment; - --- Query intermediate views -SELECT * FROM int_customers_per_store; -SELECT * FROM int_revenue_by_date; - --- Query mart tables -SELECT * FROM cumulative_revenue; - --- Query snapshot tables -SELECT * FROM int_stock_balances_daily_grouped_by_day_snapshot; -``` diff --git a/dbt_project/.gitignore b/dbt_project/.gitignore index 49f147c..e951f75 100644 --- a/dbt_project/.gitignore +++ b/dbt_project/.gitignore @@ -2,3 +2,4 @@ target/ dbt_packages/ logs/ +.user.yml diff --git a/dbt_project/dbt_project.yml b/dbt_project/dbt_project.yml index 916732b..76535c8 100644 --- a/dbt_project/dbt_project.yml +++ b/dbt_project/dbt_project.yml @@ -1,17 +1,9 @@ - -# Name your project! Project names should contain only lowercase characters -# and underscores. A good package name should reflect your organization's -# name or the intended use of these models name: 'test_dbt_project' version: '1.0.0' config-version: 2 -# This setting configures which "profile" dbt uses for this project. profile: 'test_profile' -# These configurations specify where dbt should look for different types of files. -# The `model-paths` config, for example, states that models in this project can be -# found in the "models/" directory. You probably won't need to change these! 
model-paths: ["models"] analysis-paths: ["analyses"] test-paths: ["tests"] @@ -19,20 +11,7 @@ seed-paths: ["seeds"] macro-paths: ["macros"] snapshot-paths: ["snapshots"] -target-path: "target" # directory which will store compiled SQL files -clean-targets: # directories to be removed by `dbt clean` +target-path: "target" +clean-targets: - "target" - "dbt_packages" - - -# Configuring models -# Full documentation: https://docs.getdbt.com/docs/configuring-models - -# In this example config, we tell dbt to build all models in the example/ directory -# as tables. These settings can be overridden in the individual model files -# using the `{{ config(...) }}` macro. -#models: -# test_dbt_project: -# # Config indicated by + and applies to all files under models/example/ -# example: -# +materialized: view diff --git a/dbt_project/models/intermediate/_intermediate_models.yml b/dbt_project/models/intermediate/_intermediate_models.yml deleted file mode 100644 index cf2a82a..0000000 --- a/dbt_project/models/intermediate/_intermediate_models.yml +++ /dev/null @@ -1,13 +0,0 @@ -version: 2 - -models: - - name: int_revenue_by_date - - name: int_customers_per_store - columns: - - name: store_id - tests: - - unique - - not_null - - name: total_customers - tests: - - not_null diff --git a/dbt_project/models/intermediate/int_customers_per_store.sql b/dbt_project/models/intermediate/int_customers_per_store.sql deleted file mode 100644 index 2b4d199..0000000 --- a/dbt_project/models/intermediate/int_customers_per_store.sql +++ /dev/null @@ -1,7 +0,0 @@ -SELECT - store_id, - COUNT(*) AS total_customers -FROM - {{ ref('customer_base') }} -GROUP BY - 1 diff --git a/dbt_project/models/intermediate/int_revenue_by_date.sql b/dbt_project/models/intermediate/int_revenue_by_date.sql deleted file mode 100644 index 3d2bf70..0000000 --- a/dbt_project/models/intermediate/int_revenue_by_date.sql +++ /dev/null @@ -1,7 +0,0 @@ -SELECT - DATE(payment_date) AS payment_date, - SUM(amount) AS amount -FROM - 
{{ ref('stg_payment') }} -GROUP BY - 1 \ No newline at end of file diff --git a/dbt_project/models/marts/_mart_models.yml b/dbt_project/models/marts/_mart_models.yml deleted file mode 100644 index d7b9ab9..0000000 --- a/dbt_project/models/marts/_mart_models.yml +++ /dev/null @@ -1,5 +0,0 @@ -version: 2 - -models: - - name: cumulative_revenue - description: "Cumulative revenue from sales" diff --git a/dbt_project/models/marts/cumulative_revenue.sql b/dbt_project/models/marts/cumulative_revenue.sql deleted file mode 100644 index f1c3091..0000000 --- a/dbt_project/models/marts/cumulative_revenue.sql +++ /dev/null @@ -1,14 +0,0 @@ -{{ - config( - materialized='table', - ) -}} - -SELECT - payment_date, - amount, - SUM(amount) OVER (ORDER BY payment_date) -FROM - {{ ref('int_revenue_by_date') }} -ORDER BY - payment_date \ No newline at end of file diff --git a/dbt_project/models/schema.yml b/dbt_project/models/schema.yml new file mode 100644 index 0000000..e6f2ca7 --- /dev/null +++ b/dbt_project/models/schema.yml @@ -0,0 +1,11 @@ +version: 2 + +models: + - name: my_model_name + description: >- + Description of the model. + + columns: + - name: column_name + description: >- + Description of the column. 
diff --git a/dbt_project/models/staging/_staging_models.yml b/dbt_project/models/staging/_staging_models.yml deleted file mode 100644 index 4cab2c3..0000000 --- a/dbt_project/models/staging/_staging_models.yml +++ /dev/null @@ -1,13 +0,0 @@ -version: 2 - -models: - - name: stg_payment - description: "Staging model consisting of payment events" - columns: - - name: payment_id - tests: - - unique - - not_null - - name: customer_id - tests: - - not_null diff --git a/dbt_project/models/staging/_staging_sources.yml b/dbt_project/models/staging/_staging_sources.yml deleted file mode 100644 index de6f8bc..0000000 --- a/dbt_project/models/staging/_staging_sources.yml +++ /dev/null @@ -1,6 +0,0 @@ -version: 2 - -sources: - - name: public - tables: - - name: payment diff --git a/dbt_project/models/staging/stg_payment.sql b/dbt_project/models/staging/stg_payment.sql deleted file mode 100644 index 7e13b9c..0000000 --- a/dbt_project/models/staging/stg_payment.sql +++ /dev/null @@ -1,4 +0,0 @@ -SELECT - * -FROM - {{ source('public', 'payment') }} diff --git a/dbt_project/packages.yml b/dbt_project/packages.yml index a86d7e6..6152b33 100644 --- a/dbt_project/packages.yml +++ b/dbt_project/packages.yml @@ -1,3 +1,3 @@ packages: - package: dbt-labs/dbt_utils - version: 1.0.0 + version: 1.1.1 \ No newline at end of file diff --git a/dbt_project/profiles/.user.yml b/dbt_project/profiles/.user.yml new file mode 100644 index 0000000..8a53e9c --- /dev/null +++ b/dbt_project/profiles/.user.yml @@ -0,0 +1 @@ +id: eb8bacf2-ac6f-4e63-a184-7e6a351e42a3 diff --git a/dbt_project/snapshots/_snapshots.yml b/dbt_project/snapshots/_snapshots.yml deleted file mode 100644 index e188402..0000000 --- a/dbt_project/snapshots/_snapshots.yml +++ /dev/null @@ -1,8 +0,0 @@ -version: 2 - -snapshots: - - name: int_customers_per_store_snapshot - columns: - - name: total_customers - tests: - - not_null diff --git a/dbt_project/snapshots/int_customers_per_store_snapshot.sql 
b/dbt_project/snapshots/int_customers_per_store_snapshot.sql deleted file mode 100644 index 139d81c..0000000 --- a/dbt_project/snapshots/int_customers_per_store_snapshot.sql +++ /dev/null @@ -1,20 +0,0 @@ -{% snapshot int_stock_balances_daily_grouped_by_day_snapshot %} - - {{ - config( - target_schema='public', - strategy='check', - check_cols=['total_customers'], - unique_key='store_id', - invalidate_hard_deletes=True - ) - }} - - SELECT - store_id, - total_customers, - CURRENT_DATE - FROM - {{ ref('int_customers_per_store') }} - -{% endsnapshot %}