postgres Operator Design

Reference Architecture

So, what does the Postgres Operator actually deploy when you create a cluster?

[Figure: Operator Reference Diagram]

In this diagram, objects with dashed lines are optionally deployed as part of a Postgres cluster by the Operator; objects with solid lines are the fundamental and required components.

For example, within the Primary Deployment the metrics container is completely optional; you can deploy that component using Operator configuration or command line arguments if you want metrics collected from the Postgres container.

Replica deployments are just like the Primary deployment but are optional. You do not have to create a Replica at all unless you need that capability. As you scale up your Postgres cluster, the standard set of components gets deployed and replication from the Primary to the new replica is started.

Lastly, in a future release, you will be able to optionally deploy a Pgpool service and Deployment that can act as a SQL router to your Postgres cluster.

Notice that each cluster deployment gets its own unique Persistent Volumes. Each volume can use a different storage configuration, which is quite powerful.
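For instance, after creating a cluster you can list its claims directly; as described later in this document, a primary PVC created by the Operator follows the clustername-pvc naming convention:

kubectl get pvc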

Custom Resource Definitions

Kubernetes Custom Resource Definitions are used in the design of the postgres Operator to define:

  • Cluster - pgclusters

  • Backup - pgbackups

  • Upgrade - pgupgrades

  • Policy - pgpolicies

  • Tasks - pgtasks

A PostgreSQL Cluster is made up of multiple Deployments and Services. Optionally you can add a metrics collection container to your database pods.
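Once the Operator has registered these definitions, the corresponding custom resources can be listed directly with kubectl, for example:

kubectl get pgclusters,pgbackups,pgupgrades,pgpolicies,pgtasks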

Command Line Interface

The pgo command line interface (CLI) is used by a normal end-user to create databases or clusters, or make changes to existing databases.

The CLI interacts with the apiserver REST API deployed within the postgres-operator Deployment.

From the CLI, users can view existing clusters that were deployed using the CLI and Operator. Objects that were not created by the Crunchy Operator are not viewable from the CLI.

Operator Deployment

The postgres Operator runs within a Deployment in the Kubernetes cluster. An administrator will deploy the postgres Operator Deployment using the provided script. Once installed and running, the Operator pod will start watching for certain defined events.

The operator watches for create/update/delete actions on the pgcluster custom resources. When the CLI creates, for example, a new pgcluster custom resource, the operator catches that event and creates the pods and services for that new cluster request.
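A rough sketch of that flow, using an example cluster name of mycluster:

pgo create cluster mycluster
kubectl get pgclusters                # the new custom resource created via the apiserver
kubectl get deployments,services      # the objects the operator creates in response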

CLI Design

The CLI uses the cobra package to implement CLI functionality like help text, config file processing, and command line parsing.

The pgo client is essentially a REST client which communicates with the pgo-apiserver REST server running within the Operator pod. In some cases you might want to split the apiserver out into its own Deployment, but the default deployment has a consolidated pod that contains both the apiserver and operator containers, simply for convenience of deployment and updates.
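You can confirm the consolidated layout by listing the containers inside the operator pod, using the same label selector shown in the Debugging section below:

kubectl get pod -l 'name=postgres-operator' -o jsonpath='{.items[*].spec.containers[*].name}'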

Verbs

A user works with the CLI by entering verbs to indicate what they want to do, as follows:

pgo show cluster all
pgo delete cluster db1 db2 db3
pgo create cluster mycluster

In the above example, the show, delete, and create verbs are used. The CLI is case sensitive and supports only lowercase.

Affinity

You can have the Operator add an affinity section to a new Cluster Deployment if you want Kube to attempt to schedule the Primary onto a specific Kube node.

You can see the nodes on your Kube cluster by running:

kubectl get nodes

You can then specify one of those names (e.g. kubeadm-node2) when creating a cluster:

pgo create cluster thatcluster --node-name=kubeadm-node2

The affinity rule inserted in the Deployment uses a preferred strategy so that if the node is down or unavailable, Kube will go ahead and schedule the Pod on another node.

You can always view the actual node your cluster pod is scheduled on by running:

kubectl get pod -o wide

When you scale up a Cluster and add a replica, the scaling takes into account the use of --node-name. If it sees that a cluster was created with a specific node name, the replica Deployment will add an affinity rule that attempts to schedule the replica on a different node than the one the primary is scheduled on. This gives you a simple form of High Availability so that your primary and replicas will not live on the same Kube node.
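As a rough sketch, the preferred node affinity rule added to the Deployment looks something like the following; the weight and label key shown here are illustrative rather than the Operator's exact template:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kubeadm-node2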

Debugging

To see if the operator pod is running enter the following:

kubectl get pod -l 'name=postgres-operator'

To verify the operator is running and has deployed the Custom Resources execute the following:

kubectl get crd
NAME                               KIND
pgbackups.cr.client-go.k8s.io      CustomResourceDefinition.v1beta1.apiextensions.k8s.io
pgclusters.cr.client-go.k8s.io     CustomResourceDefinition.v1beta1.apiextensions.k8s.io
pgpolicies.cr.client-go.k8s.io     CustomResourceDefinition.v1beta1.apiextensions.k8s.io
pgpolicylogs.cr.client-go.k8s.io   CustomResourceDefinition.v1beta1.apiextensions.k8s.io
pgupgrades.cr.client-go.k8s.io     CustomResourceDefinition.v1beta1.apiextensions.k8s.io
pgtasks.cr.client-go.k8s.io        CustomResourceDefinition.v1beta1.apiextensions.k8s.io

Persistent Volumes

Currently the operator does not delete persistent volumes by default; it deletes only the claims on the volumes. Starting with release 2.4, the Operator will create Jobs that run rm commands on the data volumes before actually removing the Persistent Volumes, if the user passes a --delete-data flag when deleting a database cluster.

Likewise, if the user passes --delete-backups during cluster deletion, a Job is created to remove all the backups for a cluster, including the related Persistent Volume.

PostgreSQL Operator Deployment Strategies

This section describes the various deployment strategies offered by the operator. A deployment in this case is the set of objects created in Kubernetes when a custom resource of type pgcluster is created. These custom resources are created by the pgo client command and acted upon by the postgres operator.

Strategies

To support different types of deployments, the operator supports multiple strategy implementations. Currently there is only a default cluster strategy.

In the future, more deployment strategies will be supported to offer users more customization to what they see deployed in their Kube cluster.

Being open source, users can also write their own strategy!

Specifying a Strategy

In the pgo client configuration file, there is a CLUSTER.STRATEGY setting. The current value of the default strategy is 1. If you don’t set that value, the default strategy is assumed. If you set that value to something not supported, the operator will log an error.
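As a minimal sketch, and assuming your pgo configuration file uses nested YAML keys for this setting, it would look like:

CLUSTER:
  STRATEGY:  1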

Strategy Template Files

Each strategy supplies its set of templates used by the operator to create new pods, services, etc.

When the operator is deployed, part of the deployment process is to copy the required strategy templates into a ConfigMap (operator-conf) that gets mounted into /operator-conf within the operator pod.

The directory structure of the strategy templates is as follows:

|-- backup-job.json
|-- cluster
|   |-- 1
|       |-- cluster-deployment-1.json
|       |-- cluster-replica-deployment-1.json
|       |-- cluster-service-1.json
|
|-- pvc.json

In this structure, each strategy’s templates live in a subdirectory that matches the strategy identifier. The default strategy templates are denoted by the value of 1 in the directory structure above.

If you add another strategy, the file names must be unique within the entire strategy directory. This is due to the way the templates are stored within the ConfigMap.
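To see exactly which templates your running Operator picked up, you can inspect the ConfigMap directly:

kubectl get configmap operator-conf -o yaml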

Default Cluster Deployment Strategy (1)

Using the default cluster strategy, when a cluster is created by the operator the following is created on a Kube cluster:

  • deployment running a Postgres primary container with replica count of 1

  • service mapped to the primary Postgres database

  • service mapped to the replica Postgres database

  • PVC for the primary: created if not specified in configuration. This assumes you are using a non-shared volume technology (e.g. Amazon EBS). If the CLUSTER.PVC_NAME value is set in your configuration, then a shared volume technology is assumed (e.g. HostPath or NFS). If a PVC is created for the primary, the naming convention is clustername-pvc, where clustername is the name of your cluster.

If you want to add a Postgres replica to a cluster, you scale the cluster; for each replica added, a Deployment is created that acts as a Postgres replica.
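Assuming your pgo build provides a scale verb, a minimal sketch of adding a replica to a cluster named mycluster would be:

pgo scale mycluster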

This is very different than using, say, a StatefulSet to scale up Postgres. Why do it this way? Imagine a case where you want different parts of your Postgres cluster to use different storage configurations; separate Deployments let you do specific placement and configuration of each part of the cluster.

This same concept applies to node selection for your Postgres cluster components. The Operator will let you define precisely which node you want each Postgres component to be placed upon using node affinity rules.

Cluster Deletion

When you run the following:

pgo delete cluster mycluster

The cluster and its services will be deleted. However, the data files, the backup files, and the PVCs for this cluster all remain.

However, to remove the data files from the PVC you can pass a flag:

--delete-data

which will cause a workflow to be started to actually remove the data files on the primary cluster deployment PVC.

Also, if you pass a flag:

--delete-backups

it will cause all the backup files to be removed.
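For example, to delete a cluster along with its data files and backups in a single command:

pgo delete cluster mycluster --delete-data --delete-backups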

The data removal workflow includes the following steps:

  • create a pgtask CRD to hold the PVC name and cluster name to be removed

  • the CRD is watched, and on an ADD event a Job is created that runs the rmdata container, using the PVC name and cluster name as parameters to determine the PVC to mount and the file path to remove under that PVC

  • the rmdata Job is watched by the Operator, and upon a successful status completion the actual PVC is removed

This workflow ensures that a PVC is not removed until all the data files are removed. A Job is used for the removal of files since that can be a time-consuming task.

The files are removed by the rmdata container which essentially issues the following command to remove the files:

rm -rf /pgdata/<some path>

Custom Postgres Configurations

Starting in release 2.5, users and administrators can specify a custom set of Postgres configuration files to be used when creating a new Postgres cluster. The configuration files you can change include:

  • postgresql.conf

  • pg_hba.conf

  • setup.sql

Different configurations for Postgres might be defined for the following:

  • OLTP types of databases

  • OLAP types of databases

  • High Memory

  • Minimal Configuration for Development

  • Project Specific configurations

  • Special Security Requirements

Global ConfigMap

If you create a configMap called pgo-custom-pg-config with any of the above files within it, new clusters will use those configuration files when setting up a new database instance. You do NOT have to specify all of the configuration files; it is up to your use case which ones to use.

An example set of configuration files and a script to create the global configMap is found at:

$COROOT/examples/custom-config

If you run the create.sh script there, it will create a configMap that includes the Postgres configuration files within that directory.
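If you prefer to build the global configMap by hand, a minimal sketch (assuming your edited files sit in the current directory) would be:

kubectl create configmap pgo-custom-pg-config \
    --from-file=postgresql.conf \
    --from-file=pg_hba.conf \
    --from-file=setup.sql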

Config Files Purpose

The postgresql.conf file is the main PostgreSQL configuration file, allowing you to define a wide variety of tuning parameters and features.

The pg_hba.conf file is how PostgreSQL controls client access.

The setup.sql file is a Crunchy Container Suite configuration file used to populate the database after the initial initdb is run when the database is first created. You would make changes to this if you wanted to define which database objects are always created.

Granular Config Maps

So, let's say you want to use a different set of configuration files for different clusters instead of a single configuration (e.g. the Global Config Map). You can create your own set of ConfigMaps, each with its own set of PostgreSQL configuration files. When creating new clusters, you can pass a --custom-config flag along with the name of your ConfigMap, and that will be used for that specific cluster or set of clusters.
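For example, assuming you have already created a ConfigMap named oltp-pg-config (the name is illustrative) containing your OLTP-tuned files:

pgo create cluster oltpcluster --custom-config=oltp-pg-config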

Default

Let's say you are happy with the default PostgreSQL configuration files that ship with the Crunchy Postgres container. In that case you essentially don't have to do anything; just keep using the Operator as normal. Just be sure not to define a global configMap or pass the --custom-config command line flag.

Labeling

You will notice that when a custom configMap is used in cluster creation, the Operator labels the primary Postgres Deployment with a custom-config label whose value is the name of the configMap used when creating the database.
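That label lets you find clusters built from a particular configuration; the ConfigMap name below is only an example:

kubectl get deployments -l custom-config=oltp-pg-config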

Commands coming in future releases will take advantage of this labeling.

Metrics Collection

If you add a --metrics flag to pgo create cluster it will cause the crunchy-collect container to be added to your Postgres cluster.
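For example:

pgo create cluster mycluster --metrics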

That container requires that you run the crunchy-metrics containers as defined within the crunchy-containers project.

The prometheus push gateway that is deployed as part of the crunchy-metrics example is a current requirement for the metrics solution. This will change in an upcoming release of the crunchy-containers project, after which the push gateway will no longer need to be deployed.