At PeerDB, we are building a fast, simple, and cost-effective way to stream data from Postgres to a host of Data Warehouses, Queues and Storage Engines. If you are running Postgres at the heart of your data stack and move data at scale from Postgres to any of these targets, PeerDB can provide value.
PeerDB was acquired by ClickHouse in July 2024. As part of this acquisition, we're making public the repository that contains the Helm charts used for deploying our Enterprise offering. This will enable people to self-host PeerDB in a more reliable and scalable manner.
PeerDB itself has 5 main services:
- `flow-worker`: The service that actually runs mirrors and does all the data movement. Written in Golang, source code here.
- `flow-snapshot-worker`: Helps `flow-worker` perform the initial snapshot of mirrors. Needs to be available at all times during this phase of a mirror. Shares source code with `flow-worker`.
- `flow-api`: Hosts the gRPC API that actually creates and manages mirrors. `peerdb-ui` and `peerdb-server` depend on this. Shares source code with `flow-worker` and `flow-snapshot-worker`.
- `peerdb-ui`: Intuitive web UI for interacting with peers and mirrors. Written in Next.js, source code here.
- `peerdb-server`: Postgres wire protocol compatible SQL query layer that allows creating peers and mirrors via `psql` and other Postgres tooling. Written in Rust, source code here.
For a more detailed overview of PeerDB's architecture, you can look here. Aside from this, PeerDB needs a Postgres database to use as a "catalog" to store configuration, and Temporal for workflow orchestration. Both can either be cloud-based or self-hosted (self-hosted Temporal in turn needs Postgres too), and the charts can be configured according to your needs.
The sections below provide a quick way to get started with the charts (e.g. for a POC). You can jump to the Production Guide after the POC (or whenever you are comfortable).
- helm
- kubectl
- yq
- Golang (if you need to set up the catalog manually)
- k9s for debugging
- `psql` if you need to interface with `peerdb-server`
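As a quick sanity check, you can confirm the tools are installed and on your `PATH` (exact versions are not pinned here):

```bash
# Verify the local dependencies are available.
helm version --short
kubectl version --client
yq --version
go version        # only needed if you set up the catalog manually
k9s version       # optional, for debugging
psql --version    # only needed to interface with peerdb-server
```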
- Create a Kubernetes cluster on your favorite cloud provider
- A sample node-pool/node-group for following the quickstart guide can look like:
- Number of nodes: 3 (autoscaling recommended)
- vCores: 8
- Memory: 32GB
- Disk: 300GB
- Architecture: x64/ARM64
- Set up your kubectl to point to the cluster
- Make sure all local dependencies are installed
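To confirm that kubectl is pointing at the intended cluster, something like the following works:

```bash
# Confirm the active context and that the nodes are reachable.
kubectl config current-context
kubectl get nodes -o wide
```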
- Make sure the cluster is set up and kubectl is pointing to the cluster
- Clone this repo and create an `.env` file from `.env.template`
- Set up In-Cluster Catalog Postgres
  - Run `./install_catalog.sh`
  - Run `./test_catalog.sh`
- Install PeerDB
  - Update `.env` with `PEERDB_PASSWORD` and `PEERDB_UI_PASSWORD` (see the `.env` sketch after this list)
    - Also generate a new random string for `PEERDB_UI_NEXTAUTH_SECRET` and set it in `.env`
  - Run `./install_peerdb.sh` for the first time
  - Set `PEERDB_UI_SERVICE_URL` in `.env` to the DNS/CNAME/IP of the LoadBalancer created, and re-run `./install_peerdb.sh`
    - Run `kubectl get service peerdb-ui -n peerdb-ns` to get the `external_ip` of the PeerDB UI service (change the namespace here if you have set a different one)
    - Set `PEERDB_UI_SERVICE_URL` in `.env` to `http://<external_ip>:3000`
    - Re-run `./install_peerdb.sh` to update the service with the new DNS/CNAME/IP
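Putting the quickstart values together, a minimal `.env` sketch might look like the following (the variable names come from this guide and `.env.template`; the values are placeholders):

```bash
# Quickstart .env sketch -- placeholders only, replace with your own values.
PEERDB_PASSWORD=<strong-password>
PEERDB_UI_PASSWORD=<strong-password>
PEERDB_UI_NEXTAUTH_SECRET=<randomly-generated-string>
# Set after the first ./install_peerdb.sh run, once the LoadBalancer exists:
PEERDB_UI_SERVICE_URL=http://<external_ip>:3000
```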
Specific changes can be made to `values.customer.yaml` for both the `peerdb` and the `peerdb-catalog` Helm charts.
`values.customer.yaml` can be backed up as Kubernetes secrets. To enable this, set `SAVE_VALUES_AS_SECRET=true` in the `.env`.
- Deploy Postgres as needed.
- Update `.env` appropriately with the credentials (see the sketch after this list)
- Set `CATALOG_DEPLOY_ENABLED=false` in `.env`
  - If using RDS, enable SSL by setting `PG_RDS_SSL_ENABLED=true` in `.env`
  - If using SSL with another provider, set `TEMPORAL_SSL_MODE=true` in `.env`
- Run `./install_catalog.sh`; this will set up the schema
- Run `./test_catalog.sh` to verify the schema version and permissions are in order
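As a rough sketch (not authoritative), the catalog-related `.env` entries for an externally hosted Postgres might look like this; `CATALOG_DEPLOY_ENABLED` and the SSL flags are the names used in this guide, while the exact credential keys should be taken from `.env.template`:

```bash
# External catalog sketch -- check .env.template for the exact credential keys.
CATALOG_DEPLOY_ENABLED=false      # catalog Postgres is hosted outside the cluster
PG_USER=<catalog user>            # credential keys as named in .env.template
PG_PASSWORD=<catalog password>
PG_RDS_SSL_ENABLED=true           # only when the catalog runs on RDS
TEMPORAL_SSL_MODE=true            # only when using SSL with another provider
```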
- Set `CATALOG_DEPLOY_ENABLED=true` in `.env`
- Run `./install_catalog.sh`
- Run `./test_catalog.sh` to verify the schema version and permissions are in order once the Postgres pods are up

NOTE: `PG_PASSWORD` will NOT be used from `.env`; it is auto-generated and can be obtained from the secret `"${CATALOG_DEPLOY_CLUSTER_NAME}-pguser-${PG_USER}"` (see the example command below).
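To read the auto-generated password, you can decode it from that secret. The snippet below assumes the password sits under a `password` key (the usual layout for `pguser` secrets) and that the catalog is installed in its default namespace; adjust both if your setup differs:

```bash
# Decode the auto-generated catalog password from the pguser secret.
kubectl get secret "${CATALOG_DEPLOY_CLUSTER_NAME}-pguser-${PG_USER}" \
  -n <catalog-namespace> \
  -o jsonpath='{.data.password}' | base64 -d
```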
- Set `DATADOG_ENABLED=true`
- Set the following parameters:
  ```
  DATADOG_SITE=<Datadog collection site, e.g. us5.datadoghq.com>
  DATADOG_API_KEY=<Datadog API Key>
  DATADOG_CLUSTER_NAME=<Datadog Cluster Name, something like customer-name-enterprise>
  ```
The following can be set in the `.env` to set up credentials for accessing PeerDB:
PEERDB_PASSWORD=peerdb
PEERDB_UI_PASSWORD=peerdb
Also set `PEERDB_UI_NEXTAUTH_SECRET` to a random static string:
PEERDB_UI_NEXTAUTH_SECRET=<Randomly-Generated-Secret-String>
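One way to generate a suitable value (any sufficiently random static string works):

```bash
# Generate a random value for PEERDB_UI_NEXTAUTH_SECRET.
openssl rand -hex 32
```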
- Authentication for the PeerDB UI and Temporal Web UI can be enabled by setting the following in `.env`:
  ```
  AUTHENTICATION_ENABLED=true
  AUTHENTICATION_CREDENTIALS_USERNAME=<username>
  AUTHENTICATION_CREDENTIALS_PASSWORD=<password>
  ```
  This will disable the `LoadBalancer` for both services and instead create a LoadBalancer for the Authentication Proxy.
- Once Temporal and PeerDB are installed in the cluster, set/update DNS entries starting with `temporal.`, `peerdb.` and `peerdb-ui.` to point to the `LoadBalancer` IP of the `authentication-proxy` service (see the command after this list).
- Temporal and the PeerDB UI can then be accessed through the DNS names set in the previous step.
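To find that address, you can inspect the proxy service; the service name comes from this guide, while the namespace shown assumes the default `peerdb-ns` used earlier:

```bash
# External IP/hostname of the authentication proxy LoadBalancer.
kubectl get service authentication-proxy -n peerdb-ns
```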
The catalog will automatically be set up (with schema updates/migrations) using Kubernetes jobs via the Helm chart. The jobs might go through a few retries before everything reconciles.
NOTE: The catalog can still be set up/upgraded via `./setup_postgres.sh` and `./setup_temporal_schema.sh` in case there is an issue.
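If the jobs appear stuck, inspecting them is a reasonable first step (replace the namespace with wherever the catalog chart is installed):

```bash
# Inspect the catalog setup/migration jobs and drill into a failing one.
kubectl get jobs -n <catalog-namespace>
kubectl describe job <job-name> -n <catalog-namespace>
```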
- Fill in the `TEMPORAL_CLOUD_HOST`, `TEMPORAL_CLOUD_CERT` and `TEMPORAL_CLOUD_KEY` environment variables in `.env` (a sketch follows this list).
- Fill in `PEERDB_DEPLOYMENT_UID` with an appropriate string to uniquely identify the current deployment.
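A sketch of those entries with placeholder values; the host format follows Temporal Cloud's usual gRPC endpoint, and the cert/key should be supplied in whatever form `.env.template` expects (path or inline material):

```bash
# Temporal Cloud connection -- placeholders only.
TEMPORAL_CLOUD_HOST=<namespace>.<account>.tmprl.cloud:7233
TEMPORAL_CLOUD_CERT=<client certificate, in the format .env.template expects>
TEMPORAL_CLOUD_KEY=<client key, in the format .env.template expects>
# Unique identifier for this deployment:
PEERDB_DEPLOYMENT_UID=<customer-name-prod>
```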
- Run `./install_peerdb.sh` to install/upgrade PeerDB on the Kubernetes cluster.
- Run `kubectl get service peerdb-server -n ${PEERDB_K8S_NAMESPACE}` to get the external IP of the PeerDB server (see the `psql` example after this list).
- Validate that you are able to access temporal-web by running:
  `kubectl port-forward -n ${TEMPORAL_K8S_NAMESPACE} services/${TEMPORAL_RELEASE_NAME}-web 8080:8080`
- If enabling a service of type LoadBalancer, set `PEERDB_UI_SERVICE_URL` in `.env` to the DNS/CNAME/IP of the LoadBalancer created for the `peerdb-ui` service and re-run `./install_peerdb.sh`. For example: `PEERDB_UI_SERVICE_URL=http://aac397508d3594a4494dc9350812c40d-509756028.us-east-1.elb.amazonaws.com:3000`
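With that external IP, you can connect to `peerdb-server` over the Postgres wire protocol using `psql`. The port below assumes PeerDB's usual default of 9900; check the `peerdb-server` service definition if yours differs:

```bash
# Connect to peerdb-server with psql (9900 is an assumed default; verify against the service).
psql "host=<external_ip> port=9900 password=<PEERDB_PASSWORD value>"
```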
Setting up resources for PeerDB and In-Cluster Catalog is as simple as updating the values.customer.yaml file in the respective charts (peerdb and peerdb-catalog).
- `peerdb/values.customer.yaml`:
  ```yaml
  flowWorker:
    resources:
      requests:
        cpu: 12
        memory: 48Gi
        ephemeral-storage: 384Gi
      limits:
        cpu: 16
        memory: 64Gi
        ephemeral-storage: 512Gi
    replicaCount: 2
  ```
- and `peerdb-catalog/values.customer.yaml`:
  ```yaml
  deploy:
    resources:
      requests:
        cpu: 2
        memory: 8Gi
      limits:
        cpu: 2
        memory: 8Gi
  ```
A production guide setup with examples is available in PRODUCTION.md.
The insecure cookie setting needs to be enabled to send commands/signals via the Temporal UI over plain HTTP; it can be added to `peerdb/values.customer.yaml`:
temporal-deploy:
web:
additionalEnv:
- name: TEMPORAL_CSRF_COOKIE_INSECURE
value: 'true'