Real-time streaming data analytics using Kafka, Google Cloud Storage (GCS), and BigQuery.

Aayan107/SoundScope

SoundScope: Unveiling the Beat of Data

SoundScope uses Kafka for data collection, GCS for temporary storage, dbt for data transformation, and BigQuery for analysis. The pipeline is orchestrated by Airflow and deployed on GCP with Docker and Terraform.
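
As a rough illustration of what orchestration means for the stages named above, the sketch below orders them by dependency using only the Python standard library. It is not the repository's Airflow DAG: the stage names and the edges between them are assumptions read off the description, and `run_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Hypothetical stage names. Each stage maps to the set of stages it depends on.
# The edges are an assumption inferred from the description, not the real DAG.
PIPELINE = {
    "kafka_ingest": set(),
    "gcs_staging": {"kafka_ingest"},
    "dbt_transform": {"gcs_staging"},
    "bigquery_analysis": {"dbt_transform"},
}


def run_order(stages):
    """Return stage names in an order that respects every dependency."""
    return list(TopologicalSorter(stages).static_order())


if __name__ == "__main__":
    # Print the stages in the order an orchestrator would have to run them.
    print(" -> ".join(run_order(PIPELINE)))
```

An orchestrator such as Airflow adds scheduling, retries, and monitoring on top of this kind of ordering; the sketch only shows the dependency resolution step.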


Architecture

Streamline Architecture

  • Apache Kafka: Acts as the message queue for real-time data ingestion.
  • Apache Spark: Performs stream processing to transform the raw data.
  • Data Lake: Stores processed data for further analysis.
  • Google Cloud Storage (GCS): Temporary storage for intermediate data.
  • dbt (data build tool): Transforms and models the data stored in GCS.
  • BigQuery: Performs data analysis and querying.
  • Looker Studio (formerly Data Studio): Visualizes the analyzed data through interactive dashboards.
  • Airflow: Orchestrates the entire pipeline, ensuring smooth data flow and task management.
  • Docker and Terraform: Used for containerization and infrastructure as code, respectively, to deploy the pipeline on GCP.
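
To make the stream-processing step in the list above more concrete, here is a minimal sketch of a tumbling-window count written in plain Python. It is a stand-in for the kind of aggregation a Spark streaming job might perform, not Spark code and not the project's actual transformation. The event fields (`ts`, `song`), the 60-second window, and both function names are hypothetical.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60  # assumed window size, chosen only for illustration


def window_start(ts, size=WINDOW_SECONDS):
    """Return the start timestamp of the tumbling window that contains ts."""
    return ts - (ts % size)


def count_plays_per_window(events, size=WINDOW_SECONDS):
    """Count events per song inside each fixed, non-overlapping time window.

    Each event is a dict with a numeric 'ts' (seconds) and a 'song' key.
    Both field names are hypothetical.
    """
    windows = defaultdict(Counter)
    for event in events:
        windows[window_start(event["ts"], size)][event["song"]] += 1
    # Sort by window start so the result reads in time order.
    return {start: dict(counts) for start, counts in sorted(windows.items())}


if __name__ == "__main__":
    sample = [
        {"ts": 5, "song": "a"},
        {"ts": 59, "song": "a"},
        {"ts": 60, "song": "b"},
    ]
    print(count_plays_per_window(sample))
```

A real streaming engine also has to deal with late and out-of-order events, which this batch-style sketch ignores.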

Real-time data analytics Dashboard in Looker Studio

Kafka Console

Airflow Console

Author
