PG-1127 Revamped HA solution (17) #679
base: 17
Conversation
Reworked the overview
Reworked the initial setup doc: moved the install step to every component's page. Added the redundancy requirement for HAProxy.
Added subtree, added info about etcd
docs/solutions/ha-architecture.md
Outdated
* This setup only protects against a single node failure, either a database or an etcd node. Losing more than one node results in a read-only database.
* The application must be able to connect to multiple database nodes and fail over to the new primary in the case of an outage.
* The application must act as the load balancer. It must be able to determine read/write and read-only requests and distribute them across the cluster.
* The `pgBackRest` component is optional but highly recommended for disaster recovery. To eliminate a single point of failure, it should also be redundant, but we're not discussing redundancy in this solution. [Contact us](https://www.percona.com/about/contact) to discuss it if this is a requirement for you.
"The `pgBackRest` component is optional but highly recommended for disaster recovery"

to

"The `pgBackRest` database backup component is shown as optional because it does not serve HA. However, database backups are mandatory for production systems."
We can remove the sentence
"To eliminate a single point of failure, it should also be redundant but we're not discussing redundancy in this solution."
Because backup configurations and retention policies need to be discussed separately.
docs/solutions/ha-components.md
Outdated
Each PostgreSQL instance in the cluster maintains consistency with other members through streaming replication. Streaming replication is asynchronous by default, meaning that the primary does not wait for the secondaries to acknowledge the receipt of the data to consider the transaction complete.

Each Patroni instance runs on top of and manages its own PostgreSQL instance. This means that Patroni starts and stops PostgreSQL and manages its configuration.
"Each Patroni instance runs on top of and manages its own PostgreSQL instance."
to
"Each Patroni instance manages its own PostgreSQL instance."
At the end:
So Patroni can be considered a sophisticated service manager for a PostgreSQL cluster.
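For reference, a minimal sketch of the per-node `postgresql` section of `patroni.yml` may make the "service manager" framing concrete: Patroni owns the data directory, the connection settings, and the start/stop of its local PostgreSQL instance. The hostname, paths, version, and credentials below are illustrative assumptions, not values from this PR.

```yaml
# Sketch: the postgresql section of patroni.yml on one node.
# Hostname, paths, and credentials are illustrative assumptions.
postgresql:
  listen: 0.0.0.0:5432
  connect_address: node1:5432
  data_dir: /var/lib/postgresql/17/main   # Patroni initializes and starts PostgreSQL here
  bin_dir: /usr/lib/postgresql/17/bin
  authentication:
    superuser:
      username: postgres
      password: '<redacted>'
    replication:
      username: replicator
      password: '<redacted>'
```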
docs/solutions/ha-components.md
Outdated
Each Patroni instance runs on top of and manages its own PostgreSQL instance. This means that Patroni starts and stops PostgreSQL and manages its configuration.

Patroni is also responsible for creating and managing the PostgreSQL cluster. It performs the initial cluster initialization and monitors the cluster state. To do so, Patroni relies on and uses the Distributed Configuration Store (DCS), represented by `etcd` in our architecture.
"Patroni is also responsible for creating and managing the PostgreSQL cluster. It performs the initial cluster initialization and monitors the cluster state."
Not always. So, prefer something like
"Patroni can initialise a new cluster and monitor the cluster, and take necessary automatic actions if needed"
docs/solutions/ha-components.md
Outdated
Though Patroni supports various Distributed Configuration Stores like ZooKeeper, etcd, Consul or Kubernetes, we recommend and support `etcd` as the most popular DCS due to its simplicity, consistency and reliability.

Note that the PostgreSQL cluster and Patroni cluster are the same thing, and we will use these names interchangeably.
"PostgreSQL cluster" has a different meaning in the PG community.
So I would suggest
"We will be using PostgreSQL HA Cluster
and Patroni Cluster
interchangeably in this document"
docs/solutions/ha-components.md
Outdated
If the current primary node crashes, its lease on the lock in `etcd` expires. The lock is automatically released after its expiration time. `etcd` then starts a new election and a standby node attempts to acquire the lock to become the new primary.

Patroni uses not only the `etcd` locking mechanism. It also uses `etcd` to store the current state of the cluster, ensuring that all nodes are aware of the latest changes.
"all nodes are aware of the latest changes"
to
"all nodes are aware of the latest topology and status, Moreover etcd is used as dynamic configuration store where PostgreSQL parameters are centrally stored"
docs/solutions/ha-components.md
Outdated
## Load balancing layer

This layer consists of HAProxy and keepalived.
"This layer consists of HAProxy and keepalived."
to
"This layer consists of HAProxy as connection router"
docs/solutions/ha-components.md
Outdated
HAProxy also serves as the connection pooler. It manages a pool of reusable database connections to optimize performance and resource usage. Instead of creating and closing a new connection for every database request, HAProxy maintains a set of open connections that can be shared among multiple clients.

HAProxy must also be redundant. You need a minimum of 2 HAProxy instances (one active and one standby) to eliminate the single point of failure and be able to perform failover. This is where keepalived comes in.
"HAProxy must be also redundant...."
to
"HAProxy must also be redundant. Each application node/pod can have its own HAProxy.
But if an application cannot have HAProxy in its own node/pod, that may introduce additional network hops and a failure point. If you are deploying HAProxy outside the application node/pod, there needs to be a minimum of 2 HAProxy nodes (One active and another standby) to avoid a single point of failure. We need to use a floating IP using Keepalived between all the HAProxy nodes
docs/solutions/ha-components.md
Outdated
HAProxy must also be redundant. You need a minimum of 2 HAProxy instances (one active and one standby) to eliminate the single point of failure and be able to perform failover. This is where keepalived comes in.

Keepalived is the failover tool for HAProxy. It provides the virtual IP address (VIP) for HAProxy and monitors its state. When the current active HAProxy node is down, it transfers the VIP to the remaining node and fails over the services there.
"Keepalived is the failover tool for HAProxy"
to
"Keepalived and the floating IP (VIP) it manages can act as the failover tool for HAProxy"
docs/solutions/ha-components.md
Outdated
Finally, the services layer is represented by `pgBackRest` and PMM.

`pgBackRest` is deployed as a separate backup server and also as agents on every database node. `pgBackRest` makes backups from one of the secondary nodes and archives WAL from the primary. By communicating with its agents, `pgBackRest` determines the current primary PostgreSQL node.
"pgBackRest
is deployed as the separate backup server and also as the agents on every database node. pgBackRest
makes backups from the one of the secondary nodes and WAL archiving - from the primary."
to
"pgBackRest" can manage a dedicated backup server OR backup to cloud.
pgBackRest" will be deployed in every PostgreSQL nodes. pgBackRest
can utilize standby nodes to offload the backup load from the primary, However WAL archival will be happening only from the Primary"
"By communicating with its agents, pgBackRest
determines the current primary PostgreSQL node."
to
"pgBackRest is capable of understanding the current topology of the cluster and leveraging nodes for taking backups most effectively without any manual reconfiguration at the event of a Switchover or Failover"
docs/solutions/ha-haproxy.md
Outdated
HAProxy is the load balancer and the single point of entry to your PostgreSQL cluster for client applications. A client application accesses the HAProxy URL and sends its read/write requests there. Behind the scenes, HAProxy routes write requests to the primary node and read requests to the secondaries in a round-robin fashion so that no secondary instance is unnecessarily loaded. To make this happen, provide different ports in the HAProxy configuration file. In this deployment, writes are routed to port 5000 and reads to port 5001.

This way, a client application doesn't know which node in the underlying cluster is the current primary. HAProxy sends connections to a healthy node (as long as there is at least one healthy node available) and ensures that client application requests are never rejected.
"HAProxy sends connections to a healthy node"
to
"HAProxy sends read-write connections tothe current primary node and read-only connections to the current standby nodes"
docs/solutions/ha-haproxy.md
Outdated
This way, a client application doesn't know which node in the underlying cluster is the current primary. HAProxy sends connections to a healthy node (as long as there is at least one healthy node available) and ensures that client application requests are never rejected.

To eliminate a single point of failure for HAProxy, you need a failover tool for it.
If HAProxy is hosted on a node separate from the application node, then you need to have multiple HAProxy nodes and automatic failover between them.
docs/solutions/ha-measure.md
Outdated
@@ -0,0 +1,22 @@
# Measuring high availability

The need for high availability is determined by the business requirements, potential risks, and operational limitations (e.g. the more components you add to your infrastructure, the more complex and time-consuming it is to maintain).
"The more components you add to your infrastructure, the more complex and time-consuming it is to maintain"
to
"The more components you add to your infrastructure, the more complex and time-consuming it is to maintain. Moreover, it may introduce more failure points. Simpler the better"
docs/solutions/ha-measure.md
Outdated
The level of high availability depends on the following:

* how much downtime you can bear without negatively impacting your users and
A total of three points about High Availability.
- How frequently you may encounter an outage/downtime.
- How much downtime can you bear for each outage
- How much data loss can you tolerate when there is an outage?
* how much downtime you can bear without negatively impacting your users and
* how much data loss you can tolerate during the system outage.

The measurement of availability is done by establishing a measurement time frame and dividing the time the system was available by that time frame. This ratio will rarely be one, which is equal to 100% availability. At Percona, we don’t consider a solution to be highly available if it is not at least 99% or two nines available.
"The measurement of availability is done by establishing a measurement time frame and dividing it by the time that it was available."
to
"The estimate of availability requirement is done by establishing a measurement time frame and dividing it by the time that the database needs to be available. The table in the following section will help you to decide on the availability requirement"
Regarding the second part, measuring availability is important to ensuring that the investment in the HA setup is paying off. The measurement needs to be based on actual data rather than the expected outcome. The data for availability measurement should be collected from Incident management and analysed periodically to ensure that you are able to meet the expectations. The table in the following section can be used for the measurement and validation of the HA
* how much data loss you can tolerate during the system outage.

The measurement of availability is done by establishing a measurement time frame and dividing the time the system was available by that time frame. This ratio will rarely be one, which is equal to 100% availability. At Percona, we don’t consider a solution to be highly available if it is not at least 99% or two nines available.
Regarding the point, how frequently you may encounter an outage or downtime is one of the most important measures of availability. The MTBF (mean time between failures/incidents) is to be measured from the incident management data. This mainly depends on the quality of the infrastructure. Not all infrastructures are good for database workloads. Generally, one should expect at least a couple of years between two incidents on a stable infrastructure; 3-4 years is common.
The measurement of availability is done by establishing a measurement time frame and dividing the time the system was available by that time frame. This ratio will rarely be one, which is equal to 100% availability. At Percona, we don’t consider a solution to be highly available if it is not at least 99% or two nines available.

The following table shows the amount of downtime for each level of availability from two to five nines.
Now comes the point of how fast we can recover from an outage and how much downtime we actually encounter. On a typical Patroni cluster, Patroni will be able to fail over in 30 to 50 seconds.
Note: The application's capability to detect the failure and reconnect to a new primary is not accounted for in database availability. Some applications might need a restart while others recover easily.
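As a rough worked example, assuming a 365-day year (8,760 hours): two nines (99%) allow about 87.6 hours of downtime per year, three nines (99.9%) about 8.8 hours, four nines (99.99%) about 53 minutes, and five nines (99.999%) about 5.3 minutes. A single 30-50 second Patroni failover therefore barely dents a four-nines budget, but a five-nines target leaves room for only a handful of such events per year.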
The need for high availability is determined by the business requirements, potential risks, and operational limitations. For example, the more components you add to your infrastructure, the more complex and time-consuming it is to maintain. Moreover, it may introduce extra failure points. The recommendation is to follow the principle "The simpler the better".

The level of high availability depends on the following:
"The level of high availability depends on the following:"
to
"There are two parts to achieving High Availability.
The first part is deciding on the expected availability from the business point of view, planning and investing in a HA infrastructure.
The second part is measuring and ensuring that the expected availability is met, and that investment in HA infrastructure is paying off"
The need for high availability is determined by the business requirements, potential risks, and operational limitations. For example, the more components you add to your infrastructure, the more complex and time-consuming it is to maintain. Moreover, it may introduce extra failure points. The recommendation is to follow the principle "The simpler the better".

The level of high availability depends on the following:
Regarding the first part, you need to start by assessing the business and environmental expectations about database availability, considering the following points. History/knowledge about a specific environment would be helpful.