TitoLulu/Data-Modelling-With-Postgres

Data Modelling with Postgres

An ETL project for a fictional company, Sparkify. The ETL pipeline lets the analytics team query which songs users are listening to. The source data lives in two directories: JSON logs of user activity and JSON metadata on the songs in the Sparkify app.

Approach

Created a database schema that matches the analytics team's data requirements, using star schema modelling.

Fact table: songplays - records in the log data associated with song plays

songplay_id start_time user_id level song_id artist_id session_id location user_agent

Dimension tables:

  1. users - app users

    first_name last_name gender level
  2. songs - songs in app database

    song_id title artist_id year duration
  3. artists - artists in app database

    artist_id name location latitude longitude
  4. time - timestamps of records on songplays broken down into specific units

    start_time hour day week month year weekday
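The schema above could be expressed as DDL along the following lines. This is a minimal sketch, not the contents of create_tables.py: the column types, the primary keys, and the `user_id` column on `users` are assumptions (the README lists only column names, and `user_id` is inferred from the fact table), and `create_tables` is a hypothetical helper that accepts any DB-API cursor.

```python
# Sketch of the star schema as CREATE TABLE statements.
# Column types and keys are assumptions; the README only names the columns.
CREATE_TABLE_QUERIES = {
    "songplays": """
        CREATE TABLE IF NOT EXISTS songplays (
            songplay_id SERIAL PRIMARY KEY,
            start_time  TIMESTAMP NOT NULL,
            user_id     INT NOT NULL,
            level       VARCHAR,
            song_id     VARCHAR,
            artist_id   VARCHAR,
            session_id  INT,
            location    VARCHAR,
            user_agent  VARCHAR
        );
    """,
    # user_id is not listed in the README; assumed here as the join key.
    "users": """
        CREATE TABLE IF NOT EXISTS users (
            user_id    INT PRIMARY KEY,
            first_name VARCHAR,
            last_name  VARCHAR,
            gender     VARCHAR,
            level      VARCHAR
        );
    """,
    "songs": """
        CREATE TABLE IF NOT EXISTS songs (
            song_id   VARCHAR PRIMARY KEY,
            title     VARCHAR,
            artist_id VARCHAR,
            year      INT,
            duration  FLOAT
        );
    """,
    "artists": """
        CREATE TABLE IF NOT EXISTS artists (
            artist_id VARCHAR PRIMARY KEY,
            name      VARCHAR,
            location  VARCHAR,
            latitude  FLOAT,
            longitude FLOAT
        );
    """,
    "time": """
        CREATE TABLE IF NOT EXISTS time (
            start_time TIMESTAMP PRIMARY KEY,
            hour       INT,
            day        INT,
            week       INT,
            month      INT,
            year       INT,
            weekday    INT
        );
    """,
}


def create_tables(cur):
    """Run every CREATE TABLE statement on a DB-API cursor (hypothetical helper)."""
    for query in CREATE_TABLE_QUERIES.values():
        cur.execute(query)
```

With a Postgres driver such as psycopg2, `create_tables(conn.cursor())` followed by `conn.commit()` would create all five tables.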

Created a Python ETL pipeline that retrieves, processes and loads records into the various tables:

  1. create_tables.py - creates the fact and dimension tables
  2. etl.py - reads and processes files from song_data and log_data and loads them into the tables
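The retrieve, process and load steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in etl.py: it assumes each file holds one JSON object per line and that log timestamps are milliseconds since the Unix epoch, it uses only the standard library rather than pandas, and the function names and the `%s` placeholder style (as used by psycopg2) are hypothetical choices.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def find_json_files(root):
    """Retrieve: return every *.json file under root, in a stable order."""
    return sorted(Path(root).rglob("*.json"))


def read_json_lines(path):
    """Parse a file with one JSON object per line (assumed layout)."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def time_row(ts_ms):
    """Process: split a millisecond epoch timestamp into the time table columns."""
    t = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    # Order: start_time, hour, day, ISO week, month, year, weekday (Monday = 0)
    return (t, t.hour, t.day, t.isocalendar()[1], t.month, t.year, t.weekday())


# Assumes start_time is the primary key of the time table.
TIME_INSERT = """
    INSERT INTO time (start_time, hour, day, week, month, year, weekday)
    VALUES (%s, %s, %s, %s, %s, %s, %s)
    ON CONFLICT (start_time) DO NOTHING;
"""


def load_time_rows(cur, timestamps_ms):
    """Load: insert one time row per timestamp through a DB-API cursor."""
    for ts_ms in timestamps_ms:
        cur.execute(TIME_INSERT, time_row(ts_ms))
```

The same pattern (find files, parse records, build a tuple per table, insert) would apply to the songs, artists, users and songplays tables.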

References

Relied heavily on the web to understand the concepts:

- Reference #1: Pandas User Guide

- Reference #2: Selecting a specific value from a dataframe

- Reference #3: Dataframe to list
