This project contains Google Cloud Functions that stream data from Firestore to BigQuery in real time. It also includes utility functions for managing tables and table schemas in BigQuery from the command line.
- Real-time synchronization of Firestore changes to a buffer dataset in BigQuery.
- Batched updates to target tables in BigQuery at defined intervals (e.g. every 30 minutes).
- Error handling and logging for BigQuery operations.
localRun.js can be run directly, or with command-line arguments, to invoke local utility functions for manually managing Firestore and BigQuery data.
- A Google Cloud project with Firestore and BigQuery enabled.
- Node.js and npm installed locally.
- The gcloud CLI installed and authenticated.
git clone https://github.com/episphere/stream-Firestore-to-BigQuery.git
cd stream-Firestore-to-BigQuery
npm install
Check the settings.js file for any necessary configuration, such as the target dataset name, the buffer dataset name, the error and warning table names, and the names of the collections to be tracked.
Check the tableSchemas.js file for the schemas of the target tables. Adjust the schemas as needed.
Create buffer tables for an environment (e.g. dev, prod). This step uses the dataset name and table schemas defined in settings.js and tableSchemas.js.
node localRun.js --entry createAllBufferTables --gcloud --env dev
Create target tables for an environment (e.g. dev, prod)
node localRun.js --entry createAllTargetTables --gcloud --env dev
Create error and warning tables for an environment (e.g. dev, prod)
node localRun.js --entry createLogTables --gcloud --env dev
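The three commands above share the same flag shape (--entry, --gcloud, --env). As a rough illustration of how a script could parse such flags and dispatch to a named entry, here is a minimal sketch using Node's built-in util.parseArgs. The entries registry, the parseCli and main helpers, and the placeholder return values are all hypothetical; the real localRun.js may be organized quite differently, and the meaning of --gcloud is not described here, so the sketch only parses it as a boolean.

```javascript
const { parseArgs } = require('node:util');

// Hypothetical registry of entry points. The names mirror the commands
// above, but the bodies are placeholders, not the project's real logic.
const entries = {
  createAllBufferTables: async ({ env }) => `would create buffer tables for ${env}`,
  createAllTargetTables: async ({ env }) => `would create target tables for ${env}`,
  createLogTables: async ({ env }) => `would create log tables for ${env}`,
};

// Parse flags such as: --entry createLogTables --gcloud --env dev
function parseCli(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      entry: { type: 'string' },
      env: { type: 'string', default: 'dev' }, // assumed default
      gcloud: { type: 'boolean', default: false },
    },
  });
  if (!values.entry || !Object.hasOwn(entries, values.entry)) {
    throw new Error(`Unknown or missing --entry: ${values.entry}`);
  }
  return values;
}

// Look up the requested entry and run it with the parsed options.
async function main(argv) {
  const { entry, env, gcloud } = parseCli(argv);
  return entries[entry]({ env, gcloud });
}

// Example call (in a real script: main(process.argv.slice(2))):
// main(['--entry', 'createLogTables', '--gcloud', '--env', 'dev']);
```

Rejecting unknown entry names up front means a typo in --entry fails before any BigQuery call is attempted.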
gcloud functions deploy stream-firestore-updates \
--source=. \
--gen2 \
--runtime=nodejs22 \
--entry-point=streamFirestoreUpdates \
--ingress-settings=internal-only \
--region=us-central1 \
--trigger-location=nam5 \
--trigger-event-filters=type=google.cloud.firestore.document.v1.written \
--trigger-event-filters=database='(default)' \
--memory=1Gi \
--cpu=1 \
--timeout=300s \
--concurrency=80
The stream-firestore-updates function is triggered by Firestore write events. It streams the changes to the buffer dataset (default name firstore_stream_buffer) in BigQuery.
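To make the streaming step more concrete, the sketch below shows one way a written event could be turned into a row for a buffer table. The written event type fires on document creation, update, and deletion, so the row records which of these occurred. The toBufferRow helper, the row fields (collection, document_id, operation, event_time, data), and the example collection name are assumptions for illustration; the actual buffer schema is whatever this project defines, and decoding the event payload into plain objects is left out.

```javascript
// Build a buffer row from an already-decoded Firestore change.
// `before` and `after` stand for the old and new document state as plain
// objects (null when absent). All field names here are hypothetical.
function toBufferRow({ documentPath, before, after, eventTime }) {
  const segments = documentPath.split('/');

  let operation;
  if (!before && after) operation = 'CREATE';
  else if (before && !after) operation = 'DELETE';
  else if (before && after) operation = 'UPDATE';
  else throw new Error('Event has neither old nor new document state');

  return {
    collection: segments[segments.length - 2],
    document_id: segments[segments.length - 1],
    operation,
    event_time: eventTime,
    // Store the new state as a JSON string; deletions carry no data.
    data: after ? JSON.stringify(after) : null,
  };
}

// Example with a made-up collection and document:
const exampleRow = toBufferRow({
  documentPath: 'participants/abc123',
  before: { status: 'new' },
  after: { status: 'verified' },
  eventTime: '2024-01-01T00:00:00Z',
});
```

Keeping every change as an append-only row with its own timestamp leaves the decision about which version wins to the later batch merge.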
gcloud functions deploy sync-batched-updates-to-tables \
--gen2 \
--trigger-http \
--region=us-central1 \
--runtime=nodejs22 \
--source=. \
--entry-point=syncBatchedUpdatesToTables \
--ingress-settings=internal-only
The sync-batched-updates-to-tables function merges the buffered data into the target tables in the target dataset (default name firestore_streram).
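The merge logic itself is not shown here. One common way to fold buffered changes into a target table is a BigQuery MERGE statement that keeps only the latest buffered row per document. The sketch below builds such a statement as a string. The buildMergeSql helper, the column names (document_id, operation, event_time, data), and the dataset and table names in the example are assumptions for illustration, not this project's actual schema or query.

```javascript
// Only allow simple identifiers, since they are interpolated into SQL.
function assertIdentifier(name) {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`Invalid identifier: ${name}`);
  }
  return name;
}

// Build a MERGE statement that applies the newest buffered change per
// document to the target table. Column names are hypothetical.
function buildMergeSql({ bufferDataset, targetDataset, table }) {
  const buffer = `\`${assertIdentifier(bufferDataset)}.${assertIdentifier(table)}\``;
  const target = `\`${assertIdentifier(targetDataset)}.${assertIdentifier(table)}\``;
  return [
    `MERGE ${target} T`,
    'USING (',
    '  SELECT * EXCEPT (rn) FROM (',
    '    SELECT *, ROW_NUMBER() OVER (',
    '      PARTITION BY document_id ORDER BY event_time DESC',
    '    ) AS rn',
    `    FROM ${buffer}`,
    '  ) WHERE rn = 1',
    ') S',
    'ON T.document_id = S.document_id',
    "WHEN MATCHED AND S.operation = 'DELETE' THEN DELETE",
    'WHEN MATCHED THEN UPDATE SET data = S.data, event_time = S.event_time',
    "WHEN NOT MATCHED AND S.operation != 'DELETE' THEN",
    '  INSERT (document_id, data, event_time)',
    '  VALUES (S.document_id, S.data, S.event_time)',
  ].join('\n');
}

// Example with placeholder dataset and table names:
const mergeSql = buildMergeSql({
  bufferDataset: 'buffer_dataset',
  targetDataset: 'target_dataset',
  table: 'participants',
});
```

A string like this could then be passed to the query method of the @google-cloud/bigquery client. Deduplicating inside the USING subquery matters because MERGE fails when more than one source row matches the same target row.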
HTTP requests to this function can be scheduled using Cloud Scheduler or triggered manually.