PG-1127 Rewamped HA solution (17) #679

Open
wants to merge 26 commits into base: 17

Commits (26)
bb9e170
Extended description of architecture
nastena1606 Oct 29, 2024
8f3a3e8
PG-1127 Rewamp HA solution
nastena1606 Nov 21, 2024
5673f82
Split setup into individual components pages
nastena1606 Nov 25, 2024
bcb0094
Reworked the intro
nastena1606 Dec 6, 2024
f86706a
Added diagrams to overview
nastena1606 Dec 11, 2024
f9ab900
Moved components interaction to a separate page
nastena1606 Jan 21, 2025
f5cde85
Updated how components work together
nastena1606 Feb 3, 2025
cbf2a6d
Reworked Components part
nastena1606 Feb 5, 2025
0ca8732
Added Patroni description
nastena1606 Feb 11, 2025
626f125
Markup polishing
nastena1606 Feb 13, 2025
2f4b499
Patroni and pgBackRest config'
nastena1606 Feb 17, 2025
c6d2b03
Fixed commands, added disclaier about datadir cleanup for Patroni
nastena1606 Feb 18, 2025
a740284
keepalived setup
nastena1606 Feb 19, 2025
0d5d349
Added HAproxy description to components
nastena1606 Feb 25, 2025
4a4380a
Added pgBackRest info
nastena1606 Feb 25, 2025
3cd7746
Updated How components work page
nastena1606 Mar 14, 2025
1ffa9b7
Updated images, archtecture with 2 types of diagrams, added 3rd HApro…
nastena1606 Mar 20, 2025
82ede94
Updated diagram with watchdog component
nastena1606 Mar 26, 2025
6b32bc0
Updated Patroni config for 4.0.x versions
nastena1606 Apr 8, 2025
5cc3143
Added Troubleshoot Patroni startup options subsection
nastena1606 May 26, 2025
51481a5
Modified HAProxy and keepalived configuration
nastena1606 Jun 4, 2025
4f9fdf4
Updated after the review. P1
nastena1606 Jun 27, 2025
70b5e20
Merge branch '17' into PG-1127-HA-rewamp-17
nastena1606 Jun 27, 2025
62a10ad
Updated after the review. P2
nastena1606 Jun 30, 2025
c99d038
Restoring the new navigation
nastena1606 Jun 30, 2025
c385b6b
Updated after the review. P3
nastena1606 Jul 2, 2025
Binary file removed docs/_images/PPG_links.png
4 changes: 4 additions & 0 deletions docs/_images/diagrams/HA-basic.svg
Binary file removed docs/_images/diagrams/ha-architecture-patroni.png
3 changes: 3 additions & 0 deletions docs/_images/diagrams/ha-overview-backup.svg
3 changes: 3 additions & 0 deletions docs/_images/diagrams/ha-overview-failover.svg
3 changes: 3 additions & 0 deletions docs/_images/diagrams/ha-overview-load-balancer.svg
4 changes: 4 additions & 0 deletions docs/_images/diagrams/ha-overview-replication.svg
3 changes: 3 additions & 0 deletions docs/_images/diagrams/ha-recommended.svg
Binary file removed docs/_images/diagrams/patroni-architecture.png
10 changes: 10 additions & 0 deletions docs/css/design.css
@@ -144,6 +144,16 @@
--md-typeset-table-color: hsla(var(--md-hue),0%,100%,0.25)
}

[data-md-color-scheme="percona-light"] img[src$="#only-dark"],
[data-md-color-scheme="percona-light"] img[src$="#gh-dark-mode-only"] {
display: none; /* Hide dark images in light mode */
}

[data-md-color-scheme="percona-dark"] img[src$="#only-light"],
[data-md-color-scheme="percona-dark"] img[src$="#gh-light-mode-only"] {
display: none; /* Hide light images in dark mode */
}

/* Typography */

.md-typeset {
65 changes: 65 additions & 0 deletions docs/docker.md
@@ -160,6 +160,71 @@ Follow these steps to enable `pg_tde`:
CREATE TABLE <table_name> (<field> <datatype>) USING tde_heap;
```

## Enable encryption

The Percona Distribution for PostgreSQL Docker image includes the `pg_tde` extension to provide data encryption. You must explicitly enable it when you start the container.

Here's how to do this:
{.power-number}

1. Start the container with the `ENABLE_PG_TDE=1` environment variable:

```{.bash data-prompt="$"}
$ docker run --name container-name -e ENABLE_PG_TDE=1 -e POSTGRES_PASSWORD=sUpers3cRet -d percona/percona-distribution-postgresql:{{dockertag}}-multi
```

where:

* `container-name` is the name you assign to your container
* `ENABLE_PG_TDE=1` adds the `pg_tde` to the `shared_preload_libraries` and enables the custom storage manager
* `POSTGRES_PASSWORD` is the superuser password


2. Connect to the container and start the interactive `psql` session:

```{.bash data-prompt="$"}
$ docker exec -it container-name psql
```

??? example "Sample output"

```{.text .no-copy}
psql ({{dockertag}} - Percona Server for PostgreSQL {{dockertag}}.1)
Type "help" for help.

postgres=#
```

3. Create the extension in the database where you want to encrypt data. This requires superuser privileges.

```sql
CREATE EXTENSION pg_tde;
```

4. Configure a key provider. In this sample configuration, intended for testing and development purposes, we use a local keyring provider.

For production use, set up an external key management store and configure an external key provider. Refer to the [Setup :octicons-link-external-16:](https://percona.github.io/pg_tde/main/setup.html#key-provider-configuration) chapter in the `pg_tde` documentation.

<i warning>:material-information: Warning:</i> This example is for testing purposes only:

```sql
SELECT pg_tde_add_key_provider_file('file-keyring','/tmp/pg_tde_test_local_keyring.per');
```

5. Add a principal key:

```sql
SELECT pg_tde_set_principal_key('test-db-master-key','file-keyring');
```

The key is autogenerated. You are ready to use data encryption.

6. Create a table with encryption enabled. Pass the `USING tde_heap` clause to the `CREATE TABLE` command:

```sql
CREATE TABLE <table_name> (<field> <datatype>) USING tde_heap;
```
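
7. Optionally, verify that the table is encrypted. The `pg_tde_is_encrypted` function below is taken from the `pg_tde` documentation; check it against the `pg_tde` version shipped in your image:

    ```sql
    SELECT pg_tde_is_encrypted('<table_name>');
    ```

    The function should return `true` for tables created with the `tde_heap` access method.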

## Enable `pg_stat_monitor`

To enable the `pg_stat_monitor` extension after launching the container, do the following:
18 changes: 18 additions & 0 deletions docs/minor-upgrade.md
@@ -40,6 +40,24 @@ Minor upgrade of Percona Distribution for PostgreSQL includes the following step

## Procedure

## Before you start

1. [Update the `percona-release` :octicons-link-external-16:](https://www.percona.com/doc/percona-repo-config/percona-release.html#updating-percona-release-to-the-latest-version) utility to the latest version. This is required to install the new version packages of Percona Distribution for PostgreSQL.

2. Starting with version 17.2.1, `pg_tde` is part of the Percona Server for PostgreSQL package. If you installed `pg_tde` from its dedicated package, do the following to avoid conflicts during the upgrade:

* Drop the extension using the `DROP EXTENSION` command with the `CASCADE` parameter.

<i warning>:material-alert: Warning:</i> The use of the `CASCADE` parameter deletes all tables that were created in the database with `pg_tde` enabled, as well as all objects that depend on the encrypted tables (for example, foreign keys in non-encrypted tables that reference encrypted ones).

```sql
DROP EXTENSION pg_tde CASCADE;
```

* Uninstall the `percona-postgresql-17-pg-tde` package for Debian/Ubuntu or the `percona-pg_tde_17` package for RHEL and derivatives.

## Procedure

Run **all** commands as root or via **sudo**:
{.power-number}

4 changes: 3 additions & 1 deletion docs/release-notes-v17.2.md
@@ -10,7 +10,9 @@ This release of Percona Distribution for PostgreSQL is based on Percona Server f
* Percona Distribution for PostgreSQL includes [`pgvector` :octicons-link-external-16:](https://github.com/pgvector/pgvector) - an open source extension that enables you to use PostgreSQL as a vector database. It brings the vector data type and vector operations (mainly similarity search) to PostgreSQL. You can install `pgvector` from repositories or tarballs, and it is also available as a Docker image.
* The new version of the `pg_tde` extension features index encryption and support for storing encryption keys in KMIP-compatible servers. These features come with the Beta version of the `tde_heap` access method. Learn more in the [pg_tde release notes :octicons-link-external-16:](https://docs.percona.com/pg-tde/release-notes/beta2.html)
* The `pg_tde` extension itself is now a part of the Percona Server for PostgreSQL server package and a Docker image. If you installed the extension before, from its individual package, uninstall it first to avoid conflicts during the upgrade. See the [Minor Upgrade of Percona Distribution for PostgreSQL](minor-upgrade.md#before-you-start) for details.

For how to run `pg_tde` in Docker, check the [Enable encryption](docker.md#enable-encryption) section in the documentation.

* Percona Distribution for PostgreSQL now statically links `llvmjit.so` library for Red Hat Enterprise Linux 8 and 9 and compatible derivatives. This resolves the conflict between the LLVM version required by Percona Distribution for PostgreSQL and the one supplied with the operating system. This also enables you to use the LLVM modules supplied with the operating system for other software you require.
* Percona Monitoring and Management (PMM) 2.43.2 is now compatible with `pg_stat_monitor` 2.1.0 to monitor PostgreSQL 17.

2 changes: 1 addition & 1 deletion docs/solutions/dr-pgbackrest-setup.md
@@ -239,7 +239,7 @@ log-level-console=info
log-level-file=debug

[prod_backup]
pg1-path=/var/lib/postgresql/14/main
pg1-path=/var/lib/postgresql/{{pgversion}}/main
```


64 changes: 64 additions & 0 deletions docs/solutions/etcd-info.md
@@ -0,0 +1,64 @@
# etcd

`etcd` is one of the key components in a high-availability architecture, so it's important to understand how it works.

`etcd` is a distributed key-value consensus store that helps applications store and manage cluster configuration data and perform distributed coordination of a PostgreSQL cluster.

`etcd` runs as a cluster of nodes that communicate with each other to maintain a consistent state. The primary node in the cluster is called the "leader", and the remaining nodes are the "followers".

## How `etcd` works

Each node in the cluster stores data in a structured format and keeps a copy of the same data to ensure redundancy and fault tolerance. When you write data to `etcd`, the change is sent to the leader node, which then replicates it to the other nodes in the cluster. This ensures that all nodes remain synchronized and maintain data consistency.

When a client wants to change data, it sends the request to the leader. The leader accepts the writes and proposes this change to the followers. The followers vote on the proposal. If a majority of followers agree (including the leader), the change is committed, ensuring consistency. The leader then confirms the change to the client.

This flow follows the Raft consensus algorithm, on which `etcd` is built. Read more about it in the [`etcd` Raft consensus](#etcd-raft-consensus) section.

## Leader election

An `etcd` cluster can have only one leader node at a time. The leader is responsible for receiving client requests, proposing changes, and ensuring they are replicated to the followers. When an `etcd` cluster starts, or if the current leader fails, the nodes hold an election to choose a new leader. Each node waits for a random amount of time before sending a vote request to other nodes, and the first node to get a majority of votes becomes the new leader. The cluster remains available as long as a majority of nodes (quorum) are still running.

### How many members to have in a cluster

The recommended approach is to deploy an odd-sized cluster (e.g., 3, 5, or 7 nodes). The odd number of nodes ensures that there is always a majority of nodes available to make decisions and keep the cluster running smoothly. This majority is crucial for maintaining consistency and availability, even if one node fails. For a cluster with `n` members, the majority is `(n/2)+1`.

To better illustrate this concept, compare clusters with 3 and 4 nodes. In a 3-node cluster, the majority is 2, so if one node fails, the remaining 2 nodes can still make decisions and the cluster continues to operate. In a 4-node cluster, the majority is 3, so it also survives only a single node failure: if two nodes fail, the remaining 2 nodes cannot form a majority (3 out of 4) and the cluster stops functioning. The fourth node thus adds no fault tolerance while increasing the communication overhead.
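
The majority rule above can be sketched as a quick calculation (a standalone illustration, not part of `etcd` itself):

```python
# Quorum size and fault tolerance for clusters of different sizes.
# For n members, the majority (quorum) is floor(n/2) + 1,
# and the cluster tolerates n - quorum failures.

def quorum(n: int) -> int:
    """Minimum number of members that must agree on a change."""
    return n // 2 + 1

def fault_tolerance(n: int) -> int:
    """Maximum number of members that can fail while keeping quorum."""
    return n - quorum(n)

for n in (3, 4, 5, 7):
    print(f"{n}-node cluster: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Note that 3-node and 4-node clusters both tolerate exactly one failure, which is why even-sized clusters are not recommended: the extra node adds overhead without adding fault tolerance.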

## `etcd` Raft consensus

The heart of `etcd`'s reliability is the Raft consensus algorithm. Raft ensures that all nodes in the cluster agree on the same data. This ensures a consistent view of the data, even if some nodes are unavailable or experiencing network issues.

An example of the Raft's role in `etcd` is the situation when there is no majority in the cluster. If a majority of nodes can't communicate (for example, due to network partitions), no new leader can be elected, and no new changes can be committed. This prevents the system from getting into an inconsistent state. The system waits for the network to heal and a majority to be re-established. This is crucial for data integrity.

You can also check [this interactive Raft visualization :octicons-link-external-16:](https://thesecretlivesofdata.com/raft/) to learn more about Raft and understand it better.

## `etcd` logs and performance considerations

`etcd` keeps a detailed log of every change made to the data. These logs are essential for several reasons: they ensure consistency and fault tolerance, support leader elections, and enable auditing, all of which help maintain a consistent state across nodes. For example, if a node fails, it can use the logs to catch up with the other nodes and restore its data. The logs also provide a history of all changes, which can be useful for debugging and security analysis if needed.

### Slow disk performance

`etcd` is very sensitive to disk I/O performance. Writing to the logs is a frequent operation and will be slow if the disk is slow. This can lead to timeouts, delaying consensus, instability, and even data loss. In extreme cases, slow disk performance can cause a leader to fail health checks, triggering unnecessary leader elections. Always use fast, reliable storage for `etcd`.

### Slow or high-latency networks

Communication between `etcd` nodes is critical. A slow or unreliable network can cause delays in replicating data, increasing the risk of stale reads. It can also trigger premature timeouts, causing leader elections to happen more frequently or, in some cases, delaying them, which impacts performance and stability. Also keep in mind that if nodes cannot reach each other in a timely manner, the cluster may lose quorum and become unavailable.

## `etcd` locks

`etcd` provides a distributed locking mechanism, which helps applications coordinate actions across multiple nodes and control access to shared resources, preventing conflicts. Locks ensure that only one process can hold a resource at a time, avoiding race conditions and inconsistencies. Patroni is an example of an application that uses `etcd` locks for primary election control in the PostgreSQL cluster.

### Deployment considerations

Running `etcd` on separate hosts has the following benefits:

* Both PostgreSQL and `etcd` are heavily dependent on I/O, so running them on separate hosts improves performance.

* Higher resilience. If one or even two PostgreSQL nodes crash, the `etcd` cluster remains healthy and can trigger a new primary election.

* Scalability and better performance. You can scale the `etcd` cluster separately from PostgreSQL based on the load and thus achieve better performance.

Note that separate deployment increases the complexity of the infrastructure and requires additional maintenance effort. Also, pay close attention to the network configuration to eliminate the latency that might occur due to communication between the `etcd` and Patroni nodes over the network.
If a separate dedicated host for `etcd` is not a viable option, you can use the same host machines used for Patroni and PostgreSQL. The majority of deployments use such a setup to reduce costs.

60 changes: 60 additions & 0 deletions docs/solutions/ha-architecture.md
@@ -0,0 +1,60 @@
# Architecture

In the [overview of high availability](high-availability.md), we discussed the components required to achieve high availability.

Our recommended minimal approach to a highly available deployment is a three-node PostgreSQL cluster with cluster management and failover mechanisms, a load balancer, and a backup/restore solution.

The following diagram shows this architecture, including all additional components. If you are considering a simple and cost-effective setup, refer to the [Bare-minimum architecture](#bare-minimum-architecture) section.

![Architecture of the three-node, single primary PostgreSQL cluster](../_images/diagrams/ha-recommended.svg)

## Components

The components in this architecture are:

### Database layer

- PostgreSQL nodes bearing the user data.

- Patroni - an automatic failover system. Patroni requires and uses the Distributed Configuration Store to store the cluster configuration, health and status.

- watchdog - a mechanism that resets the whole system when it does not receive a keepalive heartbeat within a specified timeframe. This adds an extra layer of protection in case the usual Patroni split-brain prevention mechanisms fail.

### DCS layer

- etcd - a Distributed Configuration Store. It stores the state of the PostgreSQL cluster and handles the election of a new primary. An odd number of nodes (minimum three) is required so that there is always a majority to agree on updates to the cluster state.

### Load balancing layer

- HAProxy - the load balancer and the single point of entry to the cluster for client applications. Minimum two instances are required for redundancy.

- keepalived - a high-availability and failover solution for HAProxy. It provides a virtual IP (VIP) address for HAProxy and prevents its single point of failure by failing over the services to the operational instance.

- (Optional) pgbouncer - a connection pooler for PostgreSQL. The aim of pgbouncer is to lower the performance impact of opening new connections to PostgreSQL.
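
As an illustration of how HAProxy can discover the primary, here is a minimal configuration sketch that routes write traffic to the current Patroni primary. The hostnames and ports are assumptions based on Patroni's REST API defaults (port 8008, where `GET /primary` returns `200` only on the primary node); adapt them to your environment:

```text
# Sketch of an haproxy.cfg "listen" section (node1..node3 are assumed hostnames)
listen primary
    bind *:5000
    option httpchk GET /primary
    http-check expect status 200
    default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
    server node1 node1:5432 check port 8008
    server node2 node2:5432 check port 8008
    server node3 node3:5432 check port 8008
```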

### Services layer

- pgBackRest - the backup and restore solution for PostgreSQL. It should also be redundant to eliminate a single point of failure.

- (Optional) Percona Monitoring and Management (PMM) - the solution to monitor the health of your cluster

## Bare-minimum architecture

There may be constraints on using the [reference architecture with all additional components](#architecture), such as the number of available servers or the cost of additional hardware. You can still achieve high availability with a minimum of two database nodes and three `etcd` instances. The following diagram shows this architecture:

![Bare-minimum architecture of the PostgreSQL cluster](../_images/diagrams/HA-basic.svg)

Using such architecture has the following limitations:

* This setup only protects against a single node failure, either a database or an `etcd` node. Losing more than one node results in a read-only database.
* The application must be able to connect to multiple database nodes and fail over to the new primary in the case of an outage.
* The application must act as the load balancer. It must be able to distinguish read/write from read-only requests and distribute them across the cluster.
* The `pgBackRest` component is optional as it doesn't serve the purpose of high availability. However, it is highly recommended for disaster recovery and is a must for production environments. [Contact us](https://www.percona.com/about/contact) to discuss backup configurations and retention policies.

## Additional reading

[How components work together](ha-components.md){.md-button}

## Next steps

[Deployment - initial setup](ha-init-setup.md){.md-button}