This project automates the extraction, transformation, and loading (ETL) of YouTube video comments in Python. Comments are scraped using the YouTube Data API, saved as a CSV file, and uploaded to an S3 bucket. An Apache Airflow DAG, hosted on an Amazon EC2 instance, orchestrates the entire workflow so that it runs daily without manual intervention.
- Python 3.5+
- Apache Airflow
- AWS account with an S3 bucket
- YouTube Data API key
- Amazon EC2 instance
- Obtained a "Developer Key" to access the YouTube Data API. Google provides detailed documentation for the API (see the client setup sketch below).
- Set up an AWS free tier account and launched an EC2 instance with an appropriate instance type (e.g., t2.micro) and Ubuntu/Debian as the OS.
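  The instance was launched from the AWS console; as a rough scripted alternative, the same launch can be sketched with boto3 (the region, AMI ID, and key pair name below are placeholders):

  ```python
  import boto3

  ec2 = boto3.client("ec2", region_name="us-east-1")

  # Launch a single free-tier instance; ami-xxxxxxxx stands in for an
  # Ubuntu/Debian AMI ID valid in your region.
  response = ec2.run_instances(
      ImageId="ami-xxxxxxxx",
      InstanceType="t2.micro",
      KeyName="my-key-pair",  # existing key pair used for SSH access
      MinCount=1,
      MaxCount=1,
  )
  print(response["Instances"][0]["InstanceId"])
  ```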
- Connected to the EC2 instance and installed all dependencies.
  Commands to install updates and dependencies (the instance runs Ubuntu/Debian, so apt is used; the original yum commands apply to Amazon Linux):
  i. sudo apt update
  ii. sudo apt install python3-pip
  iii. sudo pip3 install apache-airflow
  iv. sudo pip3 install pandas
  v. sudo pip3 install s3fs
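  An optional sanity check that the installs succeeded, run in a Python shell on the instance:

  ```python
  # Confirm the freshly installed packages import cleanly before building the DAG.
  import airflow
  import pandas
  import s3fs

  print(airflow.__version__, pandas.__version__, s3fs.__version__)
  ```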
- Developed code to fetch comments from the YouTube video "Google I/O '24 in under 10 minutes", used pandas to save the data into a CSV file, and wrote it to the S3 bucket (see the sketch below).
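  A condensed sketch of that extract-and-load step, reusing the client from the earlier snippet. The video ID, bucket name, and column names are placeholders for illustration, not the project's exact schema:

  ```python
  import pandas as pd
  from googleapiclient.discovery import build

  API_KEY = "YOUR_DEVELOPER_KEY"
  VIDEO_ID = "VIDEO_ID"  # placeholder for the Google I/O '24 recap video's ID

  youtube = build("youtube", "v3", developerKey=API_KEY)

  comments = []
  request = youtube.commentThreads().list(
      part="snippet", videoId=VIDEO_ID, maxResults=100, textFormat="plainText"
  )
  while request is not None:
      response = request.execute()
      for item in response["items"]:
          top = item["snippet"]["topLevelComment"]["snippet"]
          comments.append(
              {
                  "author": top["authorDisplayName"],
                  "comment": top["textDisplay"],
                  "published_at": top["publishedAt"],
                  "likes": top["likeCount"],
              }
          )
      # list_next() pages through the remaining comment threads.
      request = youtube.commentThreads().list_next(request, response)

  # s3fs lets pandas write straight to S3 via an s3:// URI.
  pd.DataFrame(comments).to_csv(
      "s3://your-bucket-name/youtube_comments.csv", index=False
  )
  ```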
- Created a DAG that runs daily and copied it into the Airflow DAGs directory (a skeleton is sketched below).
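  A skeleton of such a daily DAG for Airflow 2.x; the `dag_id`, start date, and the `youtube_etl` module and `run_youtube_etl` function are assumptions for illustration:

  ```python
  from datetime import datetime

  from airflow import DAG
  from airflow.operators.python import PythonOperator

  # Hypothetical import: the ETL logic from the previous step, packaged as a
  # function in a module placed next to this DAG file.
  from youtube_etl import run_youtube_etl

  with DAG(
      dag_id="youtube_etl_dag",
      start_date=datetime(2024, 1, 1),
      schedule_interval="@daily",  # run once a day
      catchup=False,               # skip backfilling past days
  ) as dag:
      PythonOperator(
          task_id="run_youtube_etl",
          python_callable=run_youtube_etl,
      )
  ```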
- Accessed the Airflow web interface through port 8080 and triggered the DAG.
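  For the interface to be reachable, the Airflow webserver and scheduler must be running on the instance (and port 8080 opened in the EC2 security group). With Airflow 2.x the usual commands are:
  i. airflow webserver -p 8080
  ii. airflow scheduler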
The automated data pipeline successfully scraped the YouTube comments, processed them, and uploaded the results to the S3 bucket.