PG-1127 Revamped HA solution (17) #679
base: 17
Conversation
Reworked the overview
Reworked the initial setup doc: moved the install step to every component's page. Added the redundancy requirement for HAProxy.
Added subtree, added info about etcd
docs/solutions/ha-architecture.md
Outdated
* This setup only protects against a single node failure, either a database or an etcd node. Losing more than one node results in a read-only database.
* The application must be able to connect to multiple database nodes and fail over to the new primary in the case of an outage.
* The application must act as the load balancer. It must be able to determine read/write and read-only requests and distribute them across the cluster.
* The `pgBackRest` component is optional but highly recommended for disaster recovery. To eliminate a single point of failure, it should also be redundant, but we're not discussing redundancy in this solution. [Contact us](https://www.percona.com/about/contact) to discuss it if this is a requirement for you.
"The `pgBackRest` component is optional but highly recommended for disaster recovery"

to

"The `pgBackRest` database backup component is shown as optional because it does not serve HA. However, database backups are mandatory for production systems."
We can remove the sentence
"To eliminate a single point of failure, it should also be redundant but we're not discussing redundancy in this solution."
Because backup configurations and retention policies need to be discussed separately.
docs/solutions/ha-components.md
Outdated
Each PostgreSQL instance in the cluster maintains consistency with other members through streaming replication. Streaming replication is asynchronous by default, meaning that the primary does not wait for the secondaries to acknowledge the receipt of the data to consider the transaction complete.

Each Patroni instance runs on top of and manages its own PostgreSQL instance. This means that Patroni starts and stops PostgreSQL and manages its configuration.
"Each Patroni instance runs on top of and manages its own PostgreSQL instance."
to
"Each Patroni instance manages its own PostgreSQL instance."
At the end:
So Patroni can be considered a sophisticated service manager for a PostgreSQL cluster.
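For reference, a minimal sketch of the per-node `postgresql` section of `patroni.yml` may make the "service manager" framing concrete: Patroni owns the data directory, the connection settings, and the start/stop of its local PostgreSQL instance. The hostname, paths, version, and credentials below are illustrative assumptions, not values from this PR.

```yaml
# Sketch: the postgresql section of patroni.yml on one node.
# Hostname, paths, and credentials are illustrative assumptions.
postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: /var/lib/postgresql/17/main   # Patroni initializes and starts PostgreSQL here
  bin_dir: /usr/lib/postgresql/17/bin
  authentication:
    superuser:
      username: postgres
      password: '<redacted>'
    replication:
      username: replicator
      password: '<redacted>'
```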
docs/solutions/ha-components.md
Outdated
Each Patroni instance runs on top of and manages its own PostgreSQL instance. This means that Patroni starts and stops PostgreSQL and manages its configuration.

Patroni is also responsible for creating and managing the PostgreSQL cluster. It performs the initial cluster initialization and monitors the cluster state. To do so, Patroni relies on and uses the Distributed Configuration Store (DCS), represented by `etcd` in our architecture.
"Patroni is also responsible for creating and managing the PostgreSQL cluster. It performs the initial cluster initialization and monitors the cluster state."
Not always. So, prefer something like
"Patroni can initialise a new cluster and monitor the cluster, and take necessary automatic actions if needed"
docs/solutions/ha-components.md
Outdated
Though Patroni supports various Distributed Configuration Stores like ZooKeeper, etcd, Consul or Kubernetes, we recommend and support `etcd` as the most popular DCS due to its simplicity, consistency and reliability.

Note that the PostgreSQL cluster and Patroni cluster are the same thing, and we will use these names interchangeably.
"PostgreSQL cluster" has a different meaning in the PG community.
So I would suggest
"We will be using PostgreSQL HA Cluster
and Patroni Cluster
interchangeably in this document"
docs/solutions/ha-components.md
Outdated
If the current primary node crashes, its lease on the lock in `etcd` expires. The lock is automatically released after its expiration time. `etcd` then starts a new election and a standby node attempts to acquire the lock to become the new primary.

Patroni uses not only the `etcd` locking mechanism. It also uses `etcd` to store the current state of the cluster, ensuring that all nodes are aware of the latest changes.
"all nodes are aware of the latest changes"
to
"all nodes are aware of the latest topology and status, Moreover etcd is used as dynamic configuration store where PostgreSQL parameters are centrally stored"
docs/solutions/ha-components.md
Outdated
## Load balancing layer

This layer consists of HAProxy and keepalived.
"This layer consists of HAProxy and keepalived."
to
"This layer consists of HAProxy as connection router"
docs/solutions/ha-components.md
Outdated
HAProxy also serves as the connection pooler. It manages a pool of reusable database connections to optimize performance and resource usage. Instead of creating and closing a new connection for every database request, HAProxy maintains a set of open connections that can be shared among multiple clients.

HAProxy must also be redundant. You need a minimum of 2 HAProxy instances (one active and one standby) to eliminate the single point of failure and be able to perform failover. This is where keepalived comes in.
"HAProxy must be also redundant...."
to
"HAProxy must also be redundant. Each application node/pod can have its own HAProxy.
But if an application cannot have HAProxy in its own node/pod, that may introduce additional network hops and a failure point. If you are deploying HAProxy outside the application node/pod, there needs to be a minimum of 2 HAProxy nodes (One active and another standby) to avoid a single point of failure. We need to use a floating IP using Keepalived between all the HAProxy nodes
docs/solutions/ha-components.md
Outdated
HAProxy must also be redundant. You need a minimum of 2 HAProxy instances (one active and one standby) to eliminate the single point of failure and be able to perform failover. This is where keepalived comes in.

Keepalived is the failover tool for HAProxy. It provides the virtual IP address (VIP) for HAProxy and monitors its state. When the current active HAProxy node is down, it transfers the VIP to the remaining node and fails over the services there.
"Keepalived is the failover tool for HAProxy"
to
"Keepalived and the floating IP (VIP) it manages can act as the failover tool for HAProxy"
docs/solutions/ha-components.md
Outdated
Finally, the services layer is represented by `pgBackRest` and PMM.

`pgBackRest` is deployed as a separate backup server and also as agents on every database node. `pgBackRest` makes backups from one of the secondary nodes and archives WAL from the primary. By communicating with its agents, `pgBackRest` determines the current primary PostgreSQL node.
"pgBackRest
is deployed as the separate backup server and also as the agents on every database node. pgBackRest
makes backups from the one of the secondary nodes and WAL archiving - from the primary."
to
"pgBackRest" can manage a dedicated backup server OR backup to cloud.
pgBackRest" will be deployed in every PostgreSQL nodes. pgBackRest
can utilize standby nodes to offload the backup load from the primary, However WAL archival will be happening only from the Primary"
"By communicating with its agents, pgBackRest
determines the current primary PostgreSQL node."
to
"pgBackRest is capable of understanding the current topology of the cluster and leveraging nodes for taking backups most effectively without any manual reconfiguration at the event of a Switchover or Failover"
docs/solutions/ha-haproxy.md
Outdated
HAProxy is the load balancer and the single point of entry to your PostgreSQL cluster for client applications. A client application accesses the HAProxy URL and sends its read/write requests there. Behind the scenes, HAProxy routes write requests to the primary node and read requests to the secondaries in a round-robin fashion so that no secondary instance is unnecessarily loaded. To make this happen, provide different ports in the HAProxy configuration file. In this deployment, writes are routed to port 5000 and reads to port 5001.

This way, a client application doesn't know which node in the underlying cluster is the current primary. HAProxy sends connections to a healthy node (as long as there is at least one healthy node available) and ensures that client application requests are never rejected.
"HAProxy sends connections to a healthy node"
to
"HAProxy sends read-write connections tothe current primary node and read-only connections to the current standby nodes"
docs/solutions/ha-haproxy.md
Outdated
This way, a client application doesn't know which node in the underlying cluster is the current primary. HAProxy sends connections to a healthy node (as long as there is at least one healthy node available) and ensures that client application requests are never rejected.

To eliminate a single point of failure for HAProxy, you need a failover tool for it.
If HAProxy is hosted on a node separate from the application node, then you need to have multiple HAProxy nodes and automatic failover between them.
docs/solutions/ha-measure.md
Outdated
@@ -0,0 +1,22 @@
# Measuring high availability

The need for high availability is determined by the business requirements, potential risks, and operational limitations (e.g. the more components you add to your infrastructure, the more complex and time-consuming it is to maintain).
"The more components you add to your infrastructure, the more complex and time-consuming it is to maintain"
to
"The more components you add to your infrastructure, the more complex and time-consuming it is to maintain. Moreover, it may introduce more failure points. Simpler the better"
docs/solutions/ha-measure.md
Outdated
The level of high availability depends on the following:

* how much downtime you can bear without negatively impacting your users and
A total of three points about High Availability.
- How frequently you may encounter an outage/downtime.
- How much downtime can you bear for each outage
- How much data loss can you tolerate when there is an outage?
* how much downtime you can bear without negatively impacting your users and
* how much data loss you can tolerate during the system outage.

The measurement of availability is done by establishing a measurement time frame and dividing the time the system was available by that time frame. This ratio will rarely be one, which is equal to 100% availability. At Percona, we don’t consider a solution to be highly available if it is not at least 99% or two nines available.
"The measurement of availability is done by establishing a measurement time frame and dividing it by the time that it was available."
to
"The estimate of availability requirement is done by establishing a measurement time frame and dividing it by the time that the database needs to be available. The table in the following section will help you to decide on the availability requirement"
Regarding the second part, measuring availability is important to ensuring that the investment in the HA setup is paying off. The measurement needs to be based on actual data rather than the expected outcome. The data for availability measurement should be collected from Incident management and analysed periodically to ensure that you are able to meet the expectations. The table in the following section can be used for the measurement and validation of the HA
* how much data loss you can tolerate during the system outage.

The measurement of availability is done by establishing a measurement time frame and dividing the time the system was available by that time frame. This ratio will rarely be one, which is equal to 100% availability. At Percona, we don’t consider a solution to be highly available if it is not at least 99% or two nines available.
Regarding the point, how frequently you may encounter an outage or downtime is one of the most important measures of availability. The MTBF (mean time between failures/incidents) is to be measured from the incident management data. This mainly depends on the quality of the infrastructure. Not all infrastructures are good for database workloads. Generally, one should expect at least a couple of years between two incidents on a stable infrastructure; 3-4 years is common.
The measurement of availability is done by establishing a measurement time frame and dividing the time the system was available by that time frame. This ratio will rarely be one, which is equal to 100% availability. At Percona, we don’t consider a solution to be highly available if it is not at least 99% or two nines available.

The following table shows the amount of downtime for each level of availability from two to five nines.
Now comes the point of how fast we can recover from an outage and how much downtime we actually encounter. On a typical Patroni cluster, Patroni will be able to fail over in 30 to 50 seconds.
Note: The application's capability to detect the failure and reconnect to a new primary is not accounted for in database availability. Some applications might need a restart while others recover easily.
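As a rough worked example, assuming a 365-day year (8,760 hours): two nines (99%) allow about 87.6 hours of downtime per year, three nines (99.9%) about 8.8 hours, four nines (99.99%) about 53 minutes, and five nines (99.999%) about 5.3 minutes. A single 30-50 second Patroni failover therefore barely dents a four-nines budget, but a five-nines target leaves room for only a handful of such events per year.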
The need for high availability is determined by the business requirements, potential risks, and operational limitations. For example, the more components you add to your infrastructure, the more complex and time-consuming it is to maintain. Moreover, it may introduce extra failure points. The recommendation is to follow the principle "The simpler the better".

The level of high availability depends on the following:
"The level of high availability depends on the following:"
to
"There are two parts to achieving High Availability.
The first part is deciding on the expected availability from the business point of view, planning and investing in a HA infrastructure.
The second part is measuring and ensuring that the expected availability is met, and that investment in HA infrastructure is paying off"
The need for high availability is determined by the business requirements, potential risks, and operational limitations. For example, the more components you add to your infrastructure, the more complex and time-consuming it is to maintain. Moreover, it may introduce extra failure points. The recommendation is to follow the principle "The simpler the better".

The level of high availability depends on the following:
Regarding the first part, you need to start by assessing the business and environmental expectations about database availability, considering the following points. History/knowledge about a specific environment would be helpful.