- Workflow Orchestration: Mage
- Cloud: GCP
- Data Warehouse: BigQuery
- Containerization: Docker
- Data Transformation: dbt
- Infrastructure as Code (IaC): Terraform
- Visualization: Looker Studio
This project builds a data pipeline, orchestrated by Mage, that processes and visualizes UK cycling data using the technologies listed above.
The primary goal is to apply the knowledge gained from the DataTalks.Club Data Engineering Zoomcamp by constructing an end-to-end data pipeline.
The task is to develop a dashboard with two tiles from the dataset provided by Transport for London (TfL). The project breaks down as follows:
- Pipeline for Data Processing: Create a pipeline to process the dataset and store it in a datalake on Google Cloud Platform (GCP).
- Pipeline for Data Warehousing: Develop a pipeline to move the processed data from the datalake to a data warehouse in BigQuery.
- Data Transformation with dbt: Transform the data within the data warehouse using dbt to prepare it for visualization on the dashboard.
- Dashboard Visualization: Build a dashboard to visualize the transformed data.
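
The four stages above can be sketched locally with the Python standard library. This is only an illustration under stated assumptions, not the project's actual Mage, BigQuery, or dbt code: a temporary folder stands in for the GCP datalake, an in-memory SQLite database stands in for BigQuery, a plain SQL statement stands in for a dbt model, and the sample rows and column names (`rental_id`, `start_station`, `duration_s`) are hypothetical rather than taken from the TfL dataset.

```python
import csv
import sqlite3
from pathlib import Path
from tempfile import TemporaryDirectory

# Hypothetical sample rows; the real TfL dataset's columns may differ.
RAW_CSV = """rental_id,start_station,duration_s
1,Hyde Park Corner,600
2,Waterloo Station,1200
3,Hyde Park Corner,300
"""


def to_data_lake(raw_csv: str, lake_dir: Path) -> Path:
    """Stage 1: land the raw file in the 'datalake' (a local folder here)."""
    path = lake_dir / "rides.csv"
    path.write_text(raw_csv, encoding="utf-8")
    return path


def to_warehouse(lake_file: Path, conn: sqlite3.Connection) -> int:
    """Stage 2: load the file from the lake into a warehouse table."""
    with lake_file.open(newline="", encoding="utf-8") as fh:
        rows = [
            (int(r["rental_id"]), r["start_station"], int(r["duration_s"]))
            for r in csv.DictReader(fh)
        ]
    conn.execute(
        "CREATE TABLE rides (rental_id INTEGER, start_station TEXT, duration_s INTEGER)"
    )
    conn.executemany("INSERT INTO rides VALUES (?, ?, ?)", rows)
    return len(rows)


def transform(conn: sqlite3.Connection) -> None:
    """Stage 3: build an aggregated table (the role dbt plays in the project)."""
    conn.execute(
        """
        CREATE TABLE rides_by_station AS
        SELECT start_station,
               COUNT(*) AS ride_count,
               SUM(duration_s) AS total_duration_s
        FROM rides
        GROUP BY start_station
        """
    )


def dashboard_rows(conn: sqlite3.Connection) -> list:
    """Stage 4: the query a dashboard tile would read from."""
    return conn.execute(
        "SELECT start_station, ride_count, total_duration_s "
        "FROM rides_by_station ORDER BY start_station"
    ).fetchall()


def run_pipeline(raw_csv: str = RAW_CSV) -> list:
    """Run all four stages in order and return the rows for the dashboard."""
    with TemporaryDirectory() as tmp:
        conn = sqlite3.connect(":memory:")
        try:
            lake_file = to_data_lake(raw_csv, Path(tmp))
            to_warehouse(lake_file, conn)
            transform(conn)
            return dashboard_rows(conn)
        finally:
            conn.close()


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

Keeping each stage as a separate function mirrors the way the project splits the work into separate pipelines, so any one stage can be swapped for its cloud counterpart without touching the others.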
The data pipeline is implemented as a batch pipeline: it processes data at regular intervals so that the dashboard reflects up-to-date data.
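
As a rough illustration of what batch processing at regular intervals means, the sketch below splits a date range into fixed-size windows, each of which one batch run would process. The function name `batch_windows` and the seven-day default are assumptions made for this example; the project's actual schedule is configured in Mage and is not specified here.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple


def batch_windows(
    start: date, end: date, days: int = 7
) -> Iterator[Tuple[date, date]]:
    """Yield half-open [window_start, window_end) intervals covering [start, end)."""
    if days <= 0:
        raise ValueError("days must be positive")
    step = timedelta(days=days)
    current = start
    while current < end:
        # The last window is clipped so it never runs past `end`.
        window_end = min(current + step, end)
        yield current, window_end
        current = window_end


if __name__ == "__main__":
    for window_start, window_end in batch_windows(date(2024, 1, 1), date(2024, 1, 20)):
        print(f"process rides from {window_start} up to (not including) {window_end}")
```

Half-open windows avoid processing the boundary day twice when consecutive batches run back to back.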
You can read more about the project in this article: UK Cycling Data Engineering end to end project.