Game deals analytics is a tool that allows you to download information about video game deals from different online stores.
Once the information has been downloaded, it is transformed according to the business rules and loaded into a Redshift database for later analysis.
The script should extract the data as JSON and parse it into a Python dictionary. The delivery includes the creation of an initial version of the table into which the data will later be loaded.
Create a PySpark job that transforms the data and loads it into a table in Redshift.
Automate the extraction and transformation of data using Airflow.
The API selected for extracting the information is "cheapshark.com"; all the documentation is available here: https://apidocs.cheapshark.com/#b9b738bf-2916-2a13-e40d-d05bccdce2ba
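For reference, a minimal extraction sketch against the deals endpoint described in that documentation could look like the following; the function name, page size, and output handling are illustrative assumptions rather than the project's actual code.

```python
# Illustrative extraction sketch - the real logic lives in scripts/ETL_Game_Deals.py.
import json

import requests

# Deals endpoint as described in the CheapShark documentation linked above.
CHEAPSHARK_DEALS_URL = "https://www.cheapshark.com/api/1.0/deals"


def fetch_deals(page_number=0, page_size=60):
    """Download one page of game deals and return it as a list of Python dictionaries."""
    response = requests.get(
        CHEAPSHARK_DEALS_URL,
        params={"pageNumber": page_number, "pageSize": page_size},
        timeout=30,
    )
    response.raise_for_status()
    # The endpoint answers with a JSON array; .json() parses it into native Python objects.
    return response.json()


if __name__ == "__main__":
    deals = fetch_deals()
    print(f"Downloaded {len(deals)} deals")
    if deals:
        print(json.dumps(deals[0], indent=2))
```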
The files to be deployed are in the project folder; the structure of the folders and files is as follows:

- docker_images/: Contains the Dockerfiles for the Airflow and Spark images.
- docker-compose.yml: Docker Compose file for the Airflow and Spark containers.
- .env: Environment variables file for the Airflow and Spark containers.
- dags/: Contains the DAGs for Airflow.
  - etl_game_deals.py: Main DAG for the ETL process, executed in Airflow to download, transform, and load the data from the API into the Redshift database.
- logs/: Folder with the Airflow logs.
- postgres_data/: Folder with the Postgres data.
- scripts/: Folder with the scripts for the ETL process.
  - postgresql-42.5.2.jar: JAR file for the JDBC connection to Redshift.
  - common.py: Common class for the ETL processes.
  - utils.py: Utility functions for the ETL process.
  - ETL_Game_Deals.py: Script for the ETL process (a simplified sketch follows this list).
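As an illustration of the transform-and-load step handled by ETL_Game_Deals.py, here is a simplified PySpark sketch; the input path, selected columns, and target table name are assumptions, while REDSHIFT_URL and REDSHIFT_SCHEMA come from the .env file described below.

```python
# Simplified PySpark load sketch - illustrative, not the project's actual script.
import os

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl_game_deals").getOrCreate()

# Read the raw JSON extracted from the CheapShark API (the path is an assumption).
raw = spark.read.json("/tmp/raw/game_deals.json")

# Example transformation: keep a few columns and cast prices to numeric types.
deals = (
    raw.select("dealID", "title", "storeID", "salePrice", "normalPrice", "savings")
    .withColumn("salePrice", F.col("salePrice").cast("double"))
    .withColumn("normalPrice", F.col("normalPrice").cast("double"))
    .withColumn("savings", F.col("savings").cast("double"))
)

# Write to Redshift through the PostgreSQL JDBC driver shipped in scripts/.
(
    deals.write.format("jdbc")
    .option("url", os.environ["REDSHIFT_URL"])
    .option("dbtable", f'{os.environ["REDSHIFT_SCHEMA"]}.game_deals')  # table name is an assumption
    .option("driver", "org.postgresql.Driver")
    .mode("append")
    .save()
)
```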
- Clone the repository.
- Move to the project folder.
- Create a .env file with the following environment variables:
REDSHIFT_HOST=...
REDSHIFT_PORT=5439
REDSHIFT_DB=...
REDSHIFT_USER=...
REDSHIFT_SCHEMA=...
REDSHIFT_PASSWORD=...
REDSHIFT_URL="jdbc:postgresql://${REDSHIFT_HOST}:${REDSHIFT_PORT}/${REDSHIFT_DB}?user=${REDSHIFT_USER}&password=${REDSHIFT_PASSWORD}"
DRIVER_PATH=/tmp/drivers/postgresql-42.5.2.jar
- Run the following command line statement to build the images and start the containers.
docker-compose up -d
- Once the containers are running, open the Airflow web interface at http://localhost:8080/.
- In the Admin -> Connections tab, create a new connection with the following data for Redshift (a snippet showing how a task can read this connection back follows the list):
  - Conn Id: redshift_default
  - Conn Type: Amazon Redshift
  - Host: your Redshift host
  - Database: your Redshift database
  - Schema: your Redshift schema
  - User: your Redshift user
  - Password: your Redshift password
  - Port: 5439
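If you need the same credentials inside a task, they can be read back through the connection id registered above; this is only a hypothetical snippet, not necessarily how scripts/common.py does it.

```python
# Sketch: reading the "redshift_default" connection from inside a DAG task.
from airflow.hooks.base import BaseHook

redshift_conn = BaseHook.get_connection("redshift_default")
# Host, credentials, and port come straight from the connection created above;
# the database/schema values are exposed through the same Connection object.
print(redshift_conn.host, redshift_conn.login, redshift_conn.port)
```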
- In the Admin -> Connections tab, create a new connection with the following data for Spark:
  - Conn Id: spark_default
  - Conn Type: Spark
  - Host: spark://spark
  - Port: 7077
  - Extra: {"queue": "default"}
- In the Admin -> Variables tab, create a new variable with the following data:
  - Key: driver_class_path
  - Value: /tmp/drivers/postgresql-42.5.2.jar
- In the Admin -> Variables tab, create a new variable with the following data:
  - Key: spark_scripts_dir
  - Value: /opt/airflow/scripts
- Execute the etl_game_deals DAG.
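To illustrate how the connections and variables above come together, here is a heavily simplified DAG sketch in the spirit of etl_game_deals; the DAG id, task ids, schedule, and the placeholder extraction callable are assumptions, not the project's actual definitions.

```python
# Simplified sketch of a DAG like etl_game_deals - names and schedule are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator


def extract_deals() -> None:
    """Placeholder for the extraction step (see scripts/ETL_Game_Deals.py)."""
    ...


with DAG(
    dag_id="etl_game_deals_sketch",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_deals",
        python_callable=extract_deals,
    )

    transform_and_load = SparkSubmitOperator(
        task_id="transform_and_load",
        conn_id="spark_default",  # Spark connection created above
        application=f'{Variable.get("spark_scripts_dir")}/ETL_Game_Deals.py',
        driver_class_path=Variable.get("driver_class_path"),  # JDBC driver for Redshift
        jars=Variable.get("driver_class_path"),
    )

    extract >> transform_and_load
```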
- Clone the repository.
- Move to the project folder.
- Move to the docker_images folder.
- Run the following command line statements to build the images.
docker build -t airflow:2.1.2 .
docker build -t spark:3.1.2 .
- Update the docker-compose.yml file with the names of the new images.
- Run the following command line statement to start the containers.
docker-compose up -d
The database_scritps folder contains the scripts that create the tables in Redshift into which the information will later be loaded by the transformation scripts.
The database diagram will be as follows:
- Alvaro Garcia - alvarongg