SoundScope ingests data with Kafka, processes streams with Spark, stages intermediate data in GCS, transforms and models it with DBT, and analyzes it in BigQuery, with Data Studio for dashboards. Airflow orchestrates the pipeline, which is deployed on GCP using Docker and Terraform.
- Apache Kafka: Serves as the message queue for real-time data ingestion.
- Apache Spark: Performs stream processing to transform the raw data.
- Data Lake: Holds the processed stream output for downstream analysis.
- Google Cloud Storage (GCS): Temporary storage for intermediate data.
- DBT (Data Build Tool): Transforms and models the data once it is loaded from GCS into BigQuery.
- BigQuery: Performs data analysis and querying.
- Data Studio: Visualizes the analyzed data through interactive dashboards.
- Airflow: Orchestrates the entire pipeline, scheduling tasks and managing dependencies between stages.
- Docker and Terraform: Docker containerizes the pipeline components; Terraform provisions the GCP infrastructure as code.