Welcome to the GitHub repository for my Azure Data Engineering project! This repository contains the code and resources I used to turn the Sakila MySQL sample database into business intelligence reports using Azure cloud services.
In this project, I've tackled the challenge of converting raw CSV data from the Sakila database into meaningful insights. The pipeline covers data ingestion, storage, transformation, and visualization, all within Azure's ecosystem.

Repository contents:

- Data Ingestion Scripts: Scripts used with Azure Data Factory to ingest data from Git raw URLs.
- Data Transformation Notebooks: Azure Databricks notebooks containing Spark code for data transformations.
- Visualization Dashboards: Samples of, or links to, the Power BI dashboards created from the processed data.
- Documentation: Detailed explanations of the processes and code.
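
As a rough illustration of what the ingestion step pulls in, here is a minimal plain-Python sketch that reads a CSV file from a raw URL and parses it into rows. It is a stand-in for the Azure Data Factory copy step, not the pipeline itself; the function names `parse_csv_text` and `fetch_csv_rows` are hypothetical, and the sample rows are made up for the demo.

```python
import csv
import io
import urllib.request


def parse_csv_text(text):
    """Parse CSV text with a header row into a list of dicts (one per row)."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_csv_rows(url, timeout=30):
    """Download a CSV file from a raw URL and parse it.

    Hypothetical helper: in the project itself, Azure Data Factory
    performs the download, not this function.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return parse_csv_text(text)


if __name__ == "__main__":
    # Inline, made-up sample so the sketch runs without network access.
    sample = "customer_id,first_name,last_name\n1,Al,Adams\n2,Bo,Baker\n"
    for row in parse_csv_text(sample):
        print(row["customer_id"], row["first_name"], row["last_name"])
```

Keeping the parsing separate from the download makes it easy to check the CSV handling locally before pointing the real pipeline at the source files.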

Technologies used:

- Azure Data Factory: For data ingestion.
- Azure Data Lake Storage Gen2: For primary data storage.
- Azure Databricks: For data processing and transformation.
- Power BI: For creating insightful visualizations.

Business questions explored:

- Who are our top 5 most valuable customers?
- Which employees have processed the most orders?
- How do sales trends vary across offices over the years?
- What's the total sales figure for each year?
- Which products are selling the least?
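
To make two of these questions concrete, the sketch below answers "top customers" and "total sales per year" with SQL. It rests on several assumptions: that "sales" maps to the Sakila `payment` table and customers to the `customer` table, that an in-memory SQLite database can stand in for the Spark tables used in the notebooks, and that the handful of sample rows (made up here) is enough to show the shape of the queries. `strftime` is SQLite-specific; in Spark SQL the `year()` function would play the same role.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a tiny, made-up slice of Sakila-like data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            first_name  TEXT,
            last_name   TEXT
        );
        CREATE TABLE payment (
            payment_id   INTEGER PRIMARY KEY,
            customer_id  INTEGER,
            amount       REAL,
            payment_date TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?)",
        [(1, "Al", "Adams"), (2, "Bo", "Baker"), (3, "Cy", "Clark")],
    )
    conn.executemany(
        "INSERT INTO payment VALUES (?, ?, ?, ?)",
        [
            (1, 1, 10.0, "2005-05-25 11:30:37"),
            (2, 1, 5.0, "2006-02-14 15:16:03"),
            (3, 2, 20.0, "2005-06-15 09:00:00"),
            (4, 3, 2.5, "2005-07-08 18:45:10"),
        ],
    )
    return conn


def top_customers(conn, limit=5):
    """Return (customer_id, name, total_paid) for the highest-paying customers."""
    query = """
        SELECT c.customer_id,
               c.first_name || ' ' || c.last_name AS name,
               ROUND(SUM(p.amount), 2) AS total_paid
        FROM payment AS p
        JOIN customer AS c ON c.customer_id = p.customer_id
        GROUP BY c.customer_id, c.first_name, c.last_name
        ORDER BY total_paid DESC, c.customer_id
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


def sales_by_year(conn):
    """Return (year, total_sales) pairs ordered by year."""
    query = """
        SELECT strftime('%Y', payment_date) AS sales_year,
               ROUND(SUM(amount), 2) AS total_sales
        FROM payment
        GROUP BY sales_year
        ORDER BY sales_year
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    db = build_sample_db()
    print(top_customers(db, limit=5))
    print(sales_by_year(db))
```

The remaining questions would follow the same group-and-aggregate pattern, each against whichever tables hold the relevant records.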