Skip to content

Latest commit

 

History

History
41 lines (32 loc) · 2.35 KB

README.md

File metadata and controls

41 lines (32 loc) · 2.35 KB

YouTube Comments Data Pipeline

This project involves scraping YouTube video comments using the YouTube Data API in Python, saving the data as a CSV file, and uploading it to an S3 bucket. The entire process is automated using an Apache Airflow DAG, which runs daily on an EC2 instance.

Introduction:

This project aims to automate the extraction, transformation, and loading (ETL) of YouTube comments data. The data is scraped using the YouTube Data API, saved as a CSV file, and uploaded to an S3 bucket. An Apache Airflow DAG orchestrates the entire workflow, ensuring that the process runs daily without manual intervention. The Airflow instance is hosted on an Amazon EC2 instance.

Tech Stack:

Python 3.5+
Apache Airflow
AWS Account with S3 bucket
YouTube Data API key
Amazon EC2 instance

Configuration

  1. Obtained "Developer Key" to access Youtube API. Google have provided detailed documentation on API.

  2. Set up free AWS free tier account and launched an EC2 instance with an appropriate instance type (e.g., t2.micro) and Ubuntu/Debian as the OS.

  3. Connected to EC2 instance and installed all dependancies.

    image

    Commands to install updates:
        i. sudo yum update
        ii. sudo yum install python3-pip
        iii. sudo pip install apache-airflow
        iv. sudo pip install pandas
        v. sudo pip install s3fs

  4. Developed code to fetch comments from youtube video - "Google I/O '24 in under 10 minutes" and used pandas to save the
    data into csv file and finally into s3 bucket.

  5. Created a DAG that runs daily and copied into Airflow directory.

  6. Accessed airflow web interface through 8080 port and triggered the DAG.

Results

The automated data pipeline successfully scraped YouTube comments, processed them, and uploaded to an S3 bucket.

image

Reference

  1. https://www.youtube.com/watch?v=q8q3OFFfY6c&list=PLBJe2dFI4sgvQTNNkI3ETYJgNPR4CBpFd&index=3
  2. https://developers.google.com/youtube/v3/docs/comments
  3. https://www.youtube.com/watch?v=SIm2W9TtzR0