This repository demonstrates the scripts to design and implementation of a dimensional data warehouse OLAP built from a distributed Online Transaction Processing OLTP system in SQL Server (the famous database AdventureWorks2016 by Microsoft and expose proper metrics in PowerBI. The project focuses on transforming the OLTP data into a star schema as a parquet files optimized for analytical queries in Azure Data Lake Storage (ADLS).
git clone https://github.com/AmmarSahyoun/Dimensional-Data-Warehouse.git
- Source: Distributed OLTP system in SQL Server.
- Target: Dimensional data warehouse in Parquet format within Azure Data Lake Storage (ADLS).
- ETL Process:
- Data extraction from SQL Server tables.
- Transformation into a dimensional star schema.
- Data loading into Parquet files within ADLS.
- Data Model:
- Snowflake schema in the OLTP system (visual representation included).
- Star schema in the data warehouse (visual representation included).
- Fact table with composite primary key (orderId, ItemId).
- Reporting: Power BI report showcasing revenue analysis by region, year, month, and product category (visual representation included).
- Copy pipeline: Transfers data from SQL Server tables to Parquet files in ADLS.
- Dimension table generation: TSQL script for creating the date dimension table.
- Star schema definition: TSQL queries defining the dimensional star schema tables.
- Ad-hoc queries: Sample TSQL queries for data exploration within the data warehouse.
- Stored Procedures and Views: Scripts for advanced data analysis and manipulation.
- Data Quality Checks: Scripts for ensuring data integrity and accuracy.
- Improved performance for analytical queries compared to the OLTP system.
- Easier data exploration and discovery through the dimensional model.
- Ability to create reports and dashboards using Power BI.