MichaelSalata/compare-my-biometrics

Introduction

This is a data pipeline that takes in Fitbit data, uploads it to a Data Lake (Google Cloud Storage), copies it into a BigQuery database, and runs SQL transformations in BigQuery to create a data schema ready for analysis.

Goals

  • Visualize changes in Fitbit biometrics across time periods
    • WHY: my Fitbit Wellness Report has been broken for months, and I would like to see how activities impact my health
    • Example Use Cases: see the impact of a...
      • fitness routine
      • medication
      • stressful life event
  • Practice building a scalable data pipeline with best practices and fault tolerance
    1. develop a schema based on the data API and analysis needs
    2. incrementally read each technology's documentation and build the pipeline
    3. implement pipeline steps with key metrics in mind
    4. incorporate feedback from the course peer-review process
  • Meet submission deadlines for the DataTalks.club 2025 Course Schedule

Results - Overview

Data Pipeline visualized

  1. OPTIONALLY Download Your Fitbit Data – retrieves biometric data from the Fitbit API and stores it in JSON format.
  2. Flatten JSON Tables and Convert to Parquet – transforms the JSON files into Parquet format for optimized storage, transmission, and processing (a minimal sketch of this step follows this list).
  3. Upload Fitbit Data to a Google Cloud Storage Bucket – transfers the Parquet files to GCS, using the bucket as a Data Lake.
  4. Create BigQuery Heart Rate, Sleep and Profile Tables – creates external BigQuery tables for the user profile, sleep, and heart rate data stored in GCS.
  5. Partition and Transform Data in BigQuery – uses dbt to inject SQL-based transformations into BigQuery that clean, standardize, and prepare the data for analysis.
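
Here is a minimal sketch of step 2, assuming a Fitbit sleep export shaped like {"sleep": [{...}, ...]}; the file paths and field handling are illustrative, not the repo's actual code:

import json

import pandas as pd  # pandas writes Parquet via pyarrow (or fastparquet)


def flatten_sleep_json(json_path: str, parquet_path: str) -> None:
    """Flatten one Fitbit sleep JSON export into a single Parquet file."""
    with open(json_path) as f:
        payload = json.load(f)

    # json_normalize spreads nested keys (e.g. levels.summary.deep.minutes)
    # into flat columns so the Parquet file reads as one table downstream
    df = pd.json_normalize(payload.get("sleep", []))
    df.to_parquet(parquet_path, index=False)


if __name__ == "__main__":
    flatten_sleep_json("sleep-2025-03-16.json", "sleep-2025-03-16.parquet")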

Looker Studio Preview

Technologies Used

  • Python to connect to and download from the Fitbit API and to reformat the downloaded JSON files to Parquet
  • Apache Airflow orchestrates and schedules the download, reformatting, upload, database transfer, and SQL transformation steps (a minimal DAG sketch follows this list)
  • PostgreSQL provides Airflow with a database for workflow metadata about DAGs, tasks, runs, and other elements
  • Google BigQuery processes the data for analytics; table partitioning is done in the dbt staging process
  • dbt (Data Build Tool) injects SQL data transformations into BigQuery; keeping the SQL external allows version control and makes the SQL code easier to maintain
  • Docker encapsulates the pipeline, ensuring portability
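
To illustrate how Airflow ties these pieces together, here is a minimal Airflow 2.x DAG sketch; the task names and callables are placeholders, not the DAGs shipped in this repo:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def flatten_to_parquet(**context):
    pass  # placeholder for the JSON -> Parquet step


def upload_to_gcs(**context):
    pass  # placeholder for the GCS upload step


with DAG(
    dag_id="fitbit_pipeline_sketch",
    start_date=datetime(2024, 11, 21),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    flatten = PythonOperator(task_id="flatten_to_parquet", python_callable=flatten_to_parquet)
    upload = PythonOperator(task_id="upload_to_gcs", python_callable=upload_to_gcs)

    # downstream tasks (external table creation, dbt run) would extend this chain
    flatten >> upload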

Setup and Deploy on Google Cloud

1. Requirements

Terraform, a Google Cloud Platform project, and the Google Cloud CLI

2. Setup a Service Account for a Google Cloud Project

  • create a service account and download a .json key file
    1. GCP Dashboard -> IAM & Admin -> Service accounts -> Create service account
    2. set a name and leave all other fields at their default values -> Create and continue
    3. grant the Viewer role (Basic -> Viewer) -> Continue -> Done
    4. click the 3 dots under Actions -> Manage keys -> Add key -> Create new key -> JSON -> Create
  • Add Cloud Storage & BigQuery permissions to your Service Account
    1. find your service account at IAM Cloud UI
    2. use +Add another role to add these roles
      • Storage Admin
      • Storage Object Admin
      • BigQuery Admin
      • Viewer
    3. Enable the IAM API
    4. Enable the IAM Service Account Credentials API
  • Add Compute VM permissions to your Service Account
    1. find your service account at IAM Cloud UI
    2. use +Add another role to add these roles
      • Compute Instance Admin
      • Compute Network Admin
      • Compute Security Admin
    3. Enable the Compute Engine API
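
Before moving on to Terraform, it can help to confirm that the key and the roles above work. A minimal sketch, assuming google-cloud-storage and google-cloud-bigquery are installed and GOOGLE_APPLICATION_CREDENTIALS points at the downloaded .json key; the project ID is a placeholder:

from google.cloud import bigquery, storage


def check_service_account(project_id: str) -> None:
    # Storage Admin / Storage Object Admin: listing buckets should succeed
    storage_client = storage.Client(project=project_id)
    print("buckets:", [b.name for b in storage_client.list_buckets()])

    # BigQuery Admin: listing datasets should succeed (the list may be empty)
    bq_client = bigquery.Client(project=project_id)
    print("datasets:", [d.dataset_id for d in bq_client.list_datasets()])


if __name__ == "__main__":
    check_service_account("your-gcp-project-id")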

3. Prepare Terraform to Launch The Project

3-a. Clone this Project Locally

git clone https://github.com/MichaelSalata/compare-my-biometrics.git

3-b. Create your SSH key

ssh-keygen -t rsa -b 2048 -C "[email protected]"

3-c. Fill Out terraform/terraform.tfvars

NOTE: pick a unique gcs_bucket_name like projectName-fitbit-bucket

credentials          = "/path/to/service_credentials.json"
project              = "google_project_name"
gcs_bucket_name      = "UNIQUE-google-bucket-name"
ssh_user             = "[email protected]"
public_ssh_key_path  = "~/path/to/id_rsa.pub"
private_ssh_key_path = "~/path/to/id_rsa"

example

credentials          = "/home/michael/.google/credentials/google_credentials.json"
project              = "dtc-de-1287361"
gcs_bucket_name      = "dtc-de-1287361-fb-bucket"
ssh_user             = "michael"
public_ssh_key_path  = "~/.ssh/id_rsa.pub"
private_ssh_key_path = "~/.ssh/id_rsa"

Alternatively, you can run the example DAG (parallel_backfill_fitbit_example_data), which uses my example Fitbit data spanning 11-21-2024 to 3-16-2025.

4. Launch with Terraform

cd ./terraform
terraform init
terraform apply

5. Launch the DAG from Airflow's Webserver

OPTION 1:

  • run bash ./setup_scripts/visit_8080_on_vm.sh

OPTION 2:

  • get your Compute Instance's External IP from your Google VM instances page
  • visit <External IP>:8080
  • choose and run the appropriate DAG
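
As an alternative to the web UI, a DAG run can also be triggered through Airflow's stable REST API. A minimal sketch, assuming basic auth is enabled on the webserver and substituting your VM's External IP and credentials:

import requests

AIRFLOW_URL = "http://EXTERNAL_IP:8080"  # replace with your Compute Instance's External IP
DAG_ID = "parallel_backfill_fitbit_example_data"

resp = requests.post(
    f"{AIRFLOW_URL}/api/v1/dags/{DAG_ID}/dagRuns",
    auth=("airflow", "airflow"),  # assumption: default basic-auth credentials
    json={"conf": {}},
)
resp.raise_for_status()
print(resp.json()["dag_run_id"])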

6. Close Down Resources

  1. Ctrl+C will stop Terraform if it is still running
  2. These commands will destroy the resources your service account provisioned
cd ./terraform
terraform destroy

Special Thanks

Thanks to Alexey, Manuel, and the DataTalks.Club community. Their Data Engineering Course was instrumental in creating this project.

Future Goals

  • get the project hosted in the cloud ✅ 2025-04-07
  • make the pipeline idempotent
  • implement CI/CD
  • handle secure user data with Airflow Secrets
