Welcome to the Hours With Experts Labs Repo

This repository will be the central location for the hands-on programming component of the course.

Course Work Overview

The goal of the course is to build an end-to-end data pipeline processing Amazon reviews.

The data pipeline you will construct looks like the one shown below:

Repo Overview

Week 1 - Environment Setup - Configure your environment to begin the programming coursework
Week 2 - Spark SQL - Write a Python Spark application to analyze local Amazon review data
Week 3 - Write to Amazon S3 - Connect the program to Amazon S3 and write data to storage
Week 4 - Kafka + Bronze layer - Read from Kafka instead of the local file, and use Spark Structured Streaming to write the output to Amazon S3, creating the Bronze layer
Week 5 - Silver layer - Transform and enrich data from the Bronze layer, creating the Silver layer
Week 6 - Gold layer - Define a schema for the Silver layer, stream the data from the Silver layer, transform it, and establish the Gold layer
TODO: Week 7 - BI
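Weeks 4 through 6 build the pipeline as Bronze, Silver, and Gold layers. As a rough illustration of what each layer is for, here is a minimal plain-Python sketch. It is not the course's Spark code, and the record format, the field names (review_id, product_id, star_rating), and the sample values are all hypothetical. It assumes the common convention that Bronze keeps raw records as received, Silver parses and validates them, and Gold aggregates them into a reporting table.

```python
from collections import defaultdict

# Bronze (hypothetical sample): raw tab-separated lines kept exactly as received.
bronze = [
    "R1\tP100\t5",
    "R2\tP100\t3",
    "R3\tP200\tnot-a-number",
    "R4\tP200\t4",
]


def to_silver(raw_lines):
    """Parse and type raw lines, dropping records whose rating is not a number."""
    silver_rows = []
    for line in raw_lines:
        review_id, product_id, rating = line.split("\t")
        if not rating.isdigit():
            continue  # skip malformed records instead of failing the whole run
        silver_rows.append(
            {"review_id": review_id, "product_id": product_id, "star_rating": int(rating)}
        )
    return silver_rows


def to_gold(silver_rows):
    """Aggregate cleaned rows into a reporting table: average rating per product."""
    totals = defaultdict(lambda: [0, 0])  # product_id -> [rating sum, review count]
    for row in silver_rows:
        totals[row["product_id"]][0] += row["star_rating"]
        totals[row["product_id"]][1] += 1
    return {pid: total / count for pid, (total, count) in totals.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'P100': 4.0, 'P200': 4.0}
```

In the course itself these steps run as Spark jobs reading from Kafka and writing to Amazon S3; the sketch only shows how responsibility is split between the layers.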

Important Course Resources

Course resources on the Thinkific platform

Continued Learning

Want to continue your learning in Data Engineering? Great! Check out these links:

STL Big Data - Innovation, Data Engineering, Analytics Group: A meetup for users of Big Data services and tools in the Saint Louis area. We are interested in Innovation (new tools, techniques, and services), Data Engineering (architecture and design of data movement systems), and Analytics (converting information into meaning). (with Kit Menke and Matt Harris)
Data Engineering Podcast: This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics you will find here.

Some Previous "STL Big Data - I.D.E.A" Meetups

Apache Iceberg Presentation - August 2023

LakeFS Presentation - June 2023

1904labs/hwe-labs

Folders and files

Latest commit

History

Repository files navigation

Welcome to the Hours With Experts Labs Repo

Course Work Overview

Repo Overview

Important Course Resources

Continued Learning

Some Previous "STL Big Data - I.D.E.A" Meetups

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 6

Languages

Packages