


sarmadafzalj/AzureDataEngineering


Azure Data Engineering Project: Insightful Analytics with Sakila Database

Welcome to the GitHub repository for my Azure Data Engineering project! This repository contains all the code and resources used to transform the Sakila MySQL database into a powerhouse of business intelligence using Azure's cloud computing capabilities.

Project Overview

In this project, I convert raw CSV data from the Sakila database into meaningful business insights. The pipeline covers data ingestion, storage, transformation, and visualization, all within Azure's ecosystem.


What's Inside:

  • Data Ingestion Scripts: Scripts used with Azure Data Factory to ingest data from Git raw URLs.
  • Data Transformation Notebooks: Azure Databricks notebooks containing Spark code for data transformations.
  • Visualization Dashboards: Samples or links to Power BI dashboards created from the processed data.
  • Documentation: Detailed explanations of the processes and code.
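The ingestion step listed above can be pictured as a list of copy tasks, one per CSV file, each pairing a raw URL with a destination path in the data lake. The sketch below is only an illustration of that idea in plain Python, not the repository's actual Azure Data Factory definition: the base URL, the table list, the `raw` container name, and the `CopyTask` and `build_copy_tasks` names are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical values: the real repository path and table list are not stated in this README.
RAW_BASE_URL = "https://raw.githubusercontent.com/example-owner/example-repo/main/data"
TABLES = ["customer", "payment", "rental"]


@dataclass(frozen=True)
class CopyTask:
    """One source-to-sink copy, similar in spirit to a single copy step in a pipeline."""
    source_url: str  # where the raw CSV is fetched from
    sink_path: str   # where the file would land in the data lake


def build_copy_tasks(base_url, tables, container="raw"):
    """Pair each table's raw CSV URL with a data lake destination path."""
    tasks = []
    for table in tables:
        file_name = f"{quote(table)}.csv"
        tasks.append(
            CopyTask(
                source_url=f"{base_url.rstrip('/')}/{file_name}",
                sink_path=f"{container}/sakila/{file_name}",
            )
        )
    return tasks


tasks = build_copy_tasks(RAW_BASE_URL, TABLES)
for task in tasks:
    print(task.source_url, "->", task.sink_path)
```

Keeping the table list as data means a new file can be ingested by adding one name, rather than by adding another hand-built copy step.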

Tech Stack

  • Azure Data Factory: For data ingestion.
  • Azure Data Lake Storage Gen2: Primary data storage.
  • Azure Databricks: For data processing and transformation.
  • Power BI: For creating insightful visualizations.

Key Questions Answered

  1. Who are our top 5 most valuable customers?
  2. Which employees have processed the most orders?
  3. How do sales trends vary across offices over the years?
  4. What's the total sales figure for each year?
  5. Which products are selling the least?
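Two of the questions above (top customers and yearly sales totals) reduce to a group-and-sum over payment records. The following is a minimal sketch in plain Python, assuming that customer value means total amount paid and that the input has the `customer_id`, `amount`, and `payment_date` columns of Sakila's `payment` table. The sample rows and the function names are illustrative only; they are not taken from the project's notebooks, where the same aggregation would more likely be expressed with Spark's `groupBy` and `agg`.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Illustrative rows only; the column names follow Sakila's payment table.
SAMPLE_PAYMENTS_CSV = """customer_id,amount,payment_date
1,2.99,2005-05-25 11:30:37
1,0.99,2005-05-28 10:35:23
2,4.99,2005-05-27 00:09:24
3,5.99,2006-02-14 15:16:03
2,6.99,2006-02-14 15:16:03
"""


def load_payments(csv_text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def top_customers(payments, n=5):
    """Return the n customers with the highest total payments, largest first."""
    totals = defaultdict(Decimal)
    for row in payments:
        totals[row["customer_id"]] += Decimal(row["amount"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


def sales_by_year(payments):
    """Return total payment amount per calendar year, ordered by year."""
    totals = defaultdict(Decimal)
    for row in payments:
        year = int(row["payment_date"][:4])  # payment_date starts with YYYY
        totals[year] += Decimal(row["amount"])
    return dict(sorted(totals.items()))


payments = load_payments(SAMPLE_PAYMENTS_CSV)
print(top_customers(payments))
print(sales_by_year(payments))
```

`Decimal` is used instead of `float` so that currency sums stay exact.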

Snapshot of ADF pipeline

[Image: ADF pipeline snapshot]

Snapshot of Power BI dashboard

[Image: Power BI dashboard snapshot]

Contribute

Feel free to fork this repository, experiment with the code, and suggest improvements! If you have any questions or feedback, don't hesitate to open an issue or submit a pull request.
