Conversation


@leekihyunn leekihyunn commented Dec 23, 2025

Summary

This PR implements AWS Aurora PostgreSQL auto-discovery functionality, bringing feature parity with the existing MySQL Aurora implementation. ProxySQL can now automatically detect Aurora PostgreSQL cluster topology, identify Writer/Reader instances, and handle failovers seamlessly.

Issue #5249

Motivation

Aurora PostgreSQL users currently need to manually configure and update server hostgroups when cluster topology changes. This implementation enables:

  • Automatic discovery of Aurora PostgreSQL cluster members via aurora_replica_status() function
  • Automatic Writer/Reader identification and hostgroup placement
  • Seamless failover handling with automatic role switching
  • Replication lag-based traffic routing

Why This Matters

As PostgreSQL adoption continues to grow, especially with AWS Aurora PostgreSQL, there's an increasing need for a robust connection pooling and query routing middleware. While PgBouncer has been the traditional choice for PostgreSQL connection pooling, it lacks advanced features like:

  • Automatic topology discovery
  • Intelligent read/write splitting based on cluster state
  • Automatic failover handling
  • Replication lag-aware routing

With this Aurora PostgreSQL auto-discovery feature, ProxySQL can serve as a more capable alternative to PgBouncer for Aurora PostgreSQL users, offering:

  • Zero-downtime failovers: Automatic detection and routing adjustment when Writer changes
  • Intelligent load balancing: Route read traffic only to healthy replicas with acceptable lag
  • Simplified operations: No manual intervention needed when cluster topology changes
  • Unified management: Single proxy solution for both MySQL and PostgreSQL workloads

I believe this feature will benefit many Aurora PostgreSQL users who need enterprise-grade connection management and query routing capabilities.

Changes Made

New Tables

  • pgsql_aws_aurora_hostgroups - Configuration table for Aurora PostgreSQL clusters
  • runtime_pgsql_aws_aurora_hostgroups - Runtime configuration table
  • pgsql_server_aws_aurora_log - Aurora health check logs
  • pgsql_server_aws_aurora_check_status - Check status tracking
  • pgsql_server_aws_aurora_failovers - Failover event logging

Configuration Parameters

All parameters mirror the MySQL Aurora implementation:

  • writer_hostgroup / reader_hostgroup - Hostgroup assignments
  • max_lag_ms - Maximum replication lag threshold for shunning
  • check_interval_ms - Health check interval
  • check_timeout_ms - Health check timeout
  • writer_is_also_reader - Place writer in reader hostgroup
  • new_reader_weight - Weight for newly discovered readers
  • add_lag_ms / min_lag_ms / lag_num_checks - Lag estimation parameters
  • aurora_port - Default port (5432)
  • domain_name - Aurora cluster domain suffix

Core Implementation

Monitoring (lib/PgSQL_Monitor.cpp)

  • PgSQL_monitor_AWS_Aurora_thread_HG() - Per-hostgroup monitoring thread
  • PgSQL_monitor_aws_aurora() - Main Aurora monitoring coordinator
  • estimate_lag() - Replication lag estimation using multiple check samples
  • evaluate_pgsql_aws_aurora_results() - Process health check results and trigger actions
  • populate_monitor_pgsql_server_aws_aurora_log() - Populate aurora log table for admin queries
  • populate_monitor_pgsql_server_aws_aurora_check_status() - Populate check status table for admin queries

HostGroups Management (lib/PgSQL_HostGroups_Manager.cpp)

  • PgSQL_AWS_Aurora_Info class - Aurora cluster configuration storage
  • update_aws_aurora_set_writer() - Set/update writer with auto-discovery
  • update_aws_aurora_set_reader() - Set/update reader with auto-discovery
  • aws_aurora_replication_lag_action() - Lag-based server shunning/unshunning
  • generate_pgsql_aws_aurora_hostgroups_table() - Generate runtime configuration
  • dump_table_pgsql() - Added support for pgsql_aws_aurora_hostgroups table dump

Admin (lib/ProxySQL_Admin.cpp)

  • save_pgsql_servers_runtime_to_database() - Added runtime aurora hostgroups dump
  • GenericRefreshStatistics() - Added triggers for aurora log/check_status table population

Technical Details

Aurora Status Query

PostgreSQL Aurora uses the aurora_replica_status() function instead of MySQL's information_schema.replica_host_status:

SELECT server_id, session_id, replica_lag_in_msec,
       CASE WHEN session_id = 'MASTER_SESSION_ID' THEN true ELSE false END as is_writer
FROM aurora_replica_status()

Key Differences from MySQL Implementation

  Aspect         MySQL                                     PostgreSQL
  Status Query   information_schema.replica_host_status    aurora_replica_status()
  Default Port   3306                                      5432
  DB Library     libmysqlclient                            libpq
  Class Prefix   MySQL_* / AWS_Aurora_*                    PgSQL_* / PgSQL_AWS_Aurora_*

Writer Identification

Both MySQL and PostgreSQL Aurora use the same method: session_id = 'MASTER_SESSION_ID'

Example Configuration

pgsql_aws_aurora_hostgroups = (
  {
    writer_hostgroup = 0
    reader_hostgroup = 1
    active = 1
    aurora_port = 5432
    domain_name = ".cluster-xxx.region.rds.amazonaws.com"
    max_lag_ms = 10000
    check_interval_ms = 2000
    check_timeout_ms = 1000
    writer_is_also_reader = 0
    new_reader_weight = 1
    add_lag_ms = 30
    min_lag_ms = 30
    lag_num_checks = 3
    comment = "Aurora PostgreSQL cluster"
  }
)
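Assuming the admin interface mirrors the MySQL Aurora tables and commands (the column order below is inferred from the config example above, and the LOAD/SAVE statements are assumed to follow their MySQL counterparts), the same settings could also be applied at runtime via the admin interface:

```sql
-- Illustrative only: column names inferred from the config example above.
INSERT INTO pgsql_aws_aurora_hostgroups
  (writer_hostgroup, reader_hostgroup, active, aurora_port, domain_name,
   max_lag_ms, check_interval_ms, check_timeout_ms, writer_is_also_reader,
   new_reader_weight, add_lag_ms, min_lag_ms, lag_num_checks, comment)
VALUES
  (0, 1, 1, 5432, '.cluster-xxx.region.rds.amazonaws.com',
   10000, 2000, 1000, 0, 1, 30, 30, 3, 'Aurora PostgreSQL cluster');
-- Assumed to mirror the MySQL admin commands:
LOAD PGSQL SERVERS TO RUNTIME;
SAVE PGSQL SERVERS TO DISK;
```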

Files Changed

  • include/ProxySQL_Admin_Tables_Definitions.h - Aurora table schema definitions
  • include/proxysql_admin.h - Added incoming_aurora_hostgroups field
  • lib/Admin_Bootstrap.cpp - Register Aurora admin tables
  • lib/ProxySQL_Config.cpp - Config file parsing for pgsql_aws_aurora_hostgroups
  • lib/ProxySQL_Admin.cpp - Runtime loading, aurora table population triggers, runtime dump
  • include/PgSQL_HostGroups_Manager.h - PgSQL_AWS_Aurora_Info class definition
  • lib/PgSQL_HostGroups_Manager.cpp - Writer/Reader management, auto-discovery, table dump
  • include/PgSQL_Monitor.hpp - Aurora monitoring classes, populate function declarations
  • lib/PgSQL_Monitor.cpp - Monitoring threads, lag estimation, table population

Testing

Tested with AWS Aurora PostgreSQL cluster:

  • ✅ Automatic Writer detection via aurora_replica_status()
  • ✅ Automatic Reader detection and hostgroup placement
  • ✅ Failover handling (Writer role switch)
  • ✅ New server auto-discovery

@gemini-code-assist

Summary of Changes

Hello @leekihyunn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances ProxySQL's capabilities by introducing comprehensive auto-discovery and management support for AWS Aurora PostgreSQL clusters. It eliminates the need for manual configuration and updates of server hostgroups, allowing ProxySQL to dynamically adapt to changes in Aurora PostgreSQL topology, such as failovers and new replica additions. This ensures high availability and optimal performance by automatically routing traffic to the correct instances and shunning those with excessive replication lag, mirroring the functionality already present for MySQL Aurora.

Highlights

  • Aurora PostgreSQL Auto-Discovery: ProxySQL now supports automatic discovery of AWS Aurora PostgreSQL cluster topologies, including Writer/Reader identification and seamless failover handling.
  • Feature Parity with MySQL Aurora: This implementation brings the same robust auto-discovery capabilities to PostgreSQL Aurora that were previously available for MySQL Aurora.
  • Dynamic Hostgroup Management: Automatically detects cluster members, assigns them to appropriate hostgroups (Writer/Reader), and handles role changes during failovers.
  • Replication Lag-Based Routing: Supports shunning/unshunning servers based on configurable replication lag thresholds, ensuring traffic is routed to healthy, up-to-date replicas.
  • New Configuration & Monitoring Tables: Introduces several new admin and runtime tables (pgsql_aws_aurora_hostgroups, runtime_pgsql_aws_aurora_hostgroups, pgsql_server_aws_aurora_log, pgsql_server_aws_aurora_check_status, pgsql_server_aws_aurora_failovers) for managing and monitoring Aurora PostgreSQL clusters.
  • Core Monitoring Logic: Implements dedicated monitoring threads (PgSQL_monitor_AWS_Aurora_thread_HG, PgSQL_monitor_aws_aurora) and logic (estimate_lag, evaluate_pgsql_aws_aurora_results) to continuously track Aurora PostgreSQL cluster health and topology.


@renecannao
Contributor

Can one of the admins verify this patch?


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces AWS Aurora PostgreSQL auto-discovery support, achieving feature parity with the existing MySQL implementation. The changes are extensive, touching configuration, monitoring, and hostgroup management, and they closely follow the established patterns in the codebase, which is commendable for consistency.

My review has identified several critical security vulnerabilities related to the use of sprintf with inadequately sized buffers, which could lead to buffer overflows. I strongly recommend replacing these with snprintf and proper size calculation. Additionally, I've pointed out areas where modern C++ practices, such as using std::string instead of C-style char* for memory management, would improve code safety and maintainability. Other suggestions focus on improving code clarity and fixing minor logical issues.

Comment on lines +4890 to +4891
query = (char *)malloc(strlen(q) + strlen(_server_id) + strlen(domain_name) + 1024);
sprintf(query, q, _server_id, domain_name, aurora_port, _whid, _rhid);


critical

This use of sprintf with a buffer size estimated with a magic number (1024) is a critical security vulnerability. A long _server_id or domain_name could cause a buffer overflow. You should calculate the required buffer size precisely and use snprintf. This unsafe pattern is repeated multiple times in update_aws_aurora_set_writer and update_aws_aurora_set_reader. Please fix all instances.

    int query_len = snprintf(nullptr, 0, q, _server_id, domain_name, aurora_port, _whid, _rhid);
    query = (char *)malloc(query_len + 1);
    snprintf(query, query_len + 1, q, _server_id, domain_name, aurora_port, _whid, _rhid);

Comment on lines +2521 to +2522
char* q2 = (char*)malloc(strlen(q1) + strlen(lut) + strlen(hse->server_id) + 32);
sprintf(q2, q1, wHG, hse->server_id, lut);


critical

This is a critical buffer overflow vulnerability. The buffer size is estimated with a magic number (+ 32), which is not guaranteed to be sufficient for the length of lut and hse->server_id. This can lead to a buffer overflow. Please use snprintf with a correctly calculated buffer size to prevent this.

                        int q2_len = snprintf(nullptr, 0, q1, wHG, hse->server_id, lut);
                        char* q2 = (char*)malloc(q2_len + 1);
                        snprintf(q2, q2_len + 1, q1, wHG, hse->server_id, lut);

Comment on lines +1834 to +1835
char *query=(char *)malloc(strlen(q)+strlen(o)+strlen(p)+256);
sprintf(query,q, writer_hostgroup, reader_hostgroup, active, aurora_port, p, max_lag_ms, check_interval_ms, check_timeout_ms, writer_is_also_reader, new_reader_weight, add_lag_ms, min_lag_ms, lag_num_checks, o);


critical

This is a critical buffer overflow vulnerability. The buffer for the SQL query is allocated with a fixed-size addition (+256), which may not be sufficient if the escaped comment (o) or domain_name (p) are very long. This can be exploited. Please use snprintf to safely construct the query string by first calculating the required size.

            int query_len = snprintf(nullptr, 0, q, writer_hostgroup, reader_hostgroup, active, aurora_port, p, max_lag_ms, check_interval_ms, check_timeout_ms, writer_is_also_reader, new_reader_weight, add_lag_ms, min_lag_ms, lag_num_checks, o);
            char *query=(char *)malloc(query_len + 1);
            snprintf(query, query_len + 1, q, writer_hostgroup, reader_hostgroup, active, aurora_port, p, max_lag_ms, check_interval_ms, check_timeout_ms, writer_is_also_reader, new_reader_weight, add_lag_ms, min_lag_ms, lag_num_checks, o);

Comment on lines +4777 to +4778
char *address = (char *)malloc(strlen(_server_id) + strlen(domain_name) + 1);
sprintf(address, "%s%s", _server_id, domain_name);


high

Using sprintf with a manually calculated buffer size is a potential security risk. If the calculation is incorrect, it can lead to a buffer overflow. It is much safer to use snprintf, which prevents buffer overflows by respecting the provided buffer size.

    size_t address_len = strlen(_server_id) + strlen(domain_name) + 1;
    char *address = (char *)malloc(address_len);
    snprintf(address, address_len, "%s%s", _server_id, domain_name);

int check_interval_ms;
int check_timeout_ms;
bool active;
bool __active; // temporary flag for tracking during regeneration


medium

Identifiers that contain a double underscore are reserved for the C++ implementation. Using them can lead to undefined behavior. Please rename __active to a non-reserved name like _active or active_.

    bool _active;  // temporary flag for tracking during regeneration

Comment on lines +410 to +411
char *domain_name;
char *comment;


medium

For new classes like PgSQL_AWS_Aurora_Info, consider using std::string for members like domain_name and comment instead of char*. std::string handles its own memory management, which prevents memory leaks and makes the code safer and easier to maintain. This would eliminate the need for manual strdup and free calls.

    std::string domain_name;
    std::string comment;

Comment on lines +2428 to +2431
size_t stride = sizeof(pgsql_host_def_t);
memcpy(&tmp, arr + i * stride / sizeof(pgsql_host_def_t), sizeof(pgsql_host_def_t));
memcpy(arr + i * stride / sizeof(pgsql_host_def_t), arr + j * stride / sizeof(pgsql_host_def_t), sizeof(pgsql_host_def_t));
memcpy(arr + j * stride / sizeof(pgsql_host_def_t), &tmp, sizeof(pgsql_host_def_t));


medium

The shuffling logic using memcpy and stride is overly complex. The expression i * stride / sizeof(pgsql_host_def_t) simplifies to just i, making the calculation confusing. Using std::swap or a simple temporary variable would be much clearer and more idiomatic C++.

            pgsql_host_def_t tmp = arr[i];
            arr[i] = arr[j];
            arr[j] = tmp;

char* is_writer_str = PQgetvalue(res, i, 3);

float replica_lag = replica_lag_str ? atof(replica_lag_str) : 0.0f;
bool is_writer = (is_writer_str && (strcmp(is_writer_str, "t") == 0 || strcmp(is_writer_str, "true") == 0 || strcmp(is_writer_str, "1") == 0));


medium

The is_writer column is a boolean type from the aurora_replica_status() function. For PostgreSQL, PQgetvalue returns 't' for true and 'f' for false. The checks for "true" and "1" are unnecessary and make the code more fragile. It's better to rely on the documented behavior of PQgetvalue for boolean types.

                    bool is_writer = (is_writer_str && strcmp(is_writer_str, "t") == 0);

else {
for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
SQLite3_row* r = *it;
proxy_error("Incompatible entry in pgsql_aws_aurora_hostgroups will be ignored : ( %s , %s , %s , %s )\n", r->fields[0], r->fields[1], r->fields[2], r->fields[3]);


medium

This error message only prints the first 4 fields of an incompatible entry in pgsql_aws_aurora_hostgroups. The table has more columns, and this partial information may not be sufficient to identify the problematic row during debugging. Consider logging all fields or at least the primary key (writer_hostgroup) to make the log message more useful.

This commit implements Aurora PostgreSQL auto-discovery functionality,
mirroring the existing Aurora MySQL implementation. The feature enables
automatic detection and management of Aurora PostgreSQL cluster topology.

Key features:
- Auto-discovery using aurora_replica_status() function
- Writer detection via session_id = 'MASTER_SESSION_ID'
- Automatic failover handling with hostgroup reconfiguration
- Replication lag-based server shunning
- Dynamic server addition when new nodes join the cluster

New tables:
- pgsql_aws_aurora_hostgroups (configuration)
- runtime_pgsql_aws_aurora_hostgroups (runtime)

Configuration parameters (same as MySQL Aurora):
- writer_hostgroup, reader_hostgroup
- aurora_port (default: 5432)
- domain_name, max_lag_ms, check_interval_ms, check_timeout_ms
- writer_is_also_reader, new_reader_weight
- add_lag_ms, min_lag_ms, lag_num_checks

Files modified:
- include/ProxySQL_Admin_Tables_Definitions.h: Aurora table definitions
- include/proxysql_admin.h: incoming_aurora_hostgroups field
- lib/Admin_Bootstrap.cpp: Admin table registration
- lib/ProxySQL_Config.cpp: Config file parsing
- lib/ProxySQL_Admin.cpp: Runtime load logic
- include/PgSQL_HostGroups_Manager.h: PgSQL_AWS_Aurora_Info class
- lib/PgSQL_HostGroups_Manager.cpp: Hostgroup management implementation
- include/PgSQL_Monitor.hpp: Monitoring class definitions
- lib/PgSQL_Monitor.cpp: Monitoring thread implementation
@leekihyunn force-pushed the feature/aurora-postgresql-autodiscovery branch from aecb7cb to d1114d6 on December 23, 2025 at 09:21
@leekihyunn
Author

I've reviewed the SonarCloud warnings and code review feedback. These patterns intentionally follow the existing MySQL implementation for consistency:

Issue - MySQL reference:

  • sprintf usage - MySQL_HostGroups_Manager.cpp
  • strlen safety - MySQL_Monitor.cpp
  • rand() - MySQL_Monitor.cpp:6236
  • __active naming - MySQL_HostGroups_Manager.h, MySQL_Authentication.hpp, etc.
  • const_cast / #define - MySQL_Monitor.hpp

…tgreSQL

- Add populate_monitor_pgsql_server_aws_aurora_log() to display aurora log entries
- Add populate_monitor_pgsql_server_aws_aurora_check_status() to display check status
- Add pgsql_aws_aurora_hostgroups support in dump_table_pgsql()
- Add runtime_pgsql_aws_aurora_hostgroups dump in save_pgsql_servers_runtime_to_database()
- Remove verbose error logging to match MySQL Aurora behavior pattern
@leekihyunn
Author

Added missing monitoring table population and runtime table dump functionality:

Changes:

  • populate_monitor_pgsql_server_aws_aurora_log() - Populates aurora monitoring log table
  • populate_monitor_pgsql_server_aws_aurora_check_status() - Populates check status table
  • dump_table_pgsql() - Added pgsql_aws_aurora_hostgroups support
  • save_pgsql_servers_runtime_to_database() - Added runtime aurora hostgroups dump
  • Removed verbose error logging to match MySQL Aurora implementation pattern

These additions complete the admin table query functionality for Aurora PostgreSQL monitoring.

@renecannao
Contributor

add to whitelist

@renecannao
Contributor

Hi @leekihyunn.
Thank you for the PR!
Assigning it to @rahim-kanji.

The previous message "add to whitelist" is to trigger CI runs.

Please note that for Aurora MySQL we have a complex simulator (you may find it by searching for TEST_AURORA), and the implementation of the Aurora PostgreSQL auto-discovery may depend on the simulator itself.

@leekihyunn
Author

Hi @renecannao and @rahim-kanji,

Thank you for reviewing this PR!

If there's any additional work required, please let me know and I'll address it promptly.

- Add PgSQL_Monitor_Connection_Pool class (equivalent to MySQL_Monitor_Connection_Pool)
- Add MonPgSrvC class for per-server connection management
- Implement get_connection/put_connection for connection reuse
- Add purge_some_connections to limit connections per server (max 4)
- Add purge_all_connections for cleanup on shutdown
- Wrap verbose logging with #ifdef TEST_AURORA guards
- Add print_pgsql_aws_aurora_status_entry for TEST_AURORA debugging
- Add action counting (action_yes/no, enabling/disabling) in evaluate function
…ping

Auto-discovered Aurora PostgreSQL servers were not being added to the
ping monitoring target list, causing deleted/failed instances to remain
in ONLINE status instead of being properly shunned.

Root cause:
- update_aws_aurora_set_writer() and update_aws_aurora_set_reader()
  were calling generate_pgsql_servers_table() to sync MyHostGroups
  to the internal pgsql_servers table
- But update_table_pgsql_servers_for_monitor() was not called after,
  so the monitor's server list (pgsql_servers_to_monitor) was never
  updated with the new auto-discovered servers
- MySQL implementation correctly calls update_table_mysql_servers_for_monitor()
  in these same locations

Fix:
Add update_table_pgsql_servers_for_monitor(false) calls in 4 locations:
- update_aws_aurora_set_writer(): commit path (line 4979)
- update_aws_aurora_set_writer(): auto-discovery path (line 5021)
- update_aws_aurora_set_reader(): commit path (line 5107)
- update_aws_aurora_set_reader(): auto-discovery path (line 5137)

This ensures auto-discovered servers are properly ping monitored
and will be shunned when they become unreachable.
MySQL does not log connection failures as errors (only SSL errors).
Change PostgreSQL monitor to follow the same pattern by using
proxy_debug instead of proxy_error for connection failures.

This prevents excessive error logs (every 2 seconds) when a server
is unreachable (e.g., deleted Aurora instance).

Note: MySQL has DNS caching (DNS_Cache class, dns_lookup(), DNS resolver
thread running every 60 seconds), while PostgreSQL does direct DNS lookup
on every connection attempt. This causes PostgreSQL to log errors every
2 seconds (ping interval) vs MySQL's ~60 seconds.
This commit ensures Aurora PostgreSQL monitoring behaves identically
to Aurora MySQL monitoring by adding missing functionality:

1. Add runtime variable refresh in Aurora monitoring threads
   - HG thread (PgSQL_monitor_AWS_Aurora_thread_HG): Now checks global
     variable version on each loop iteration and calls refresh_variables()
     when changes are detected, allowing runtime configuration changes
     to take effect immediately without restarting the thread
   - Main Aurora thread (PgSQL_monitor_aws_aurora): Added PgSQL_Thread
     object initialization and variable refresh loop

2. Add monitor_enabled check to Aurora thread loops
   - Both HG thread and main Aurora thread now check
     pgsql_thread___monitor_enabled in their while conditions
   - Setting pgsql-monitor_enabled=false now properly stops Aurora
     monitoring threads (previously they would continue running)

3. Add checksum comparison to update_aws_aurora_set_writer()
   - Before committing writer changes, compare checksums of current
     and incoming server lists
   - Only commit if there are actual changes, avoiding unnecessary
     table regeneration and log noise
   - Add verbose skip logging when hostgroup_manager_verbose > 1

These changes match the behavior of the MySQL Aurora implementation in:
- MySQL_Monitor.cpp: monitor_AWS_Aurora_thread_HG()
- MySQL_Monitor.cpp: MySQL_Monitor::monitor_aws_aurora()
- MySQL_HostGroups_Manager.cpp: update_aws_aurora_set_writer()

Benefits:
- Runtime variable changes (e.g., hostgroup_manager_verbose) are now
  reflected immediately in Aurora monitoring threads
- pgsql-monitor_enabled=false properly stops Aurora monitoring
- Reduced unnecessary commits and log output when no actual changes occur
- Full consistency with MySQL Aurora monitoring behavior
Changes:
1. Remove noisy "Monitor connect failed" logging on every connection failure
   - Errors are recorded in pgsql_server_connect_log table
   - Shunning logic will log when max_failures is reached (matching MySQL behavior)

2. Add Aurora health check error logging (matching MySQL behavior)
   - Connection failure: "Error on AWS Aurora PostgreSQL check for <host>:<port>..."
   - Query failure: "Error on AWS Aurora PostgreSQL check for <host>:<port>... Query failed"
   - Previous incorrect comment stated "Not logging to match MySQL" but MySQL does log these errors
@sonarqubecloud

sonarqubecloud bot commented Jan 7, 2026

Quality Gate failed

Failed conditions
45 Security Hotspots
E Reliability Rating on New Code (required ≥ A)
B Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud


@leekihyunn
Author

Aurora PostgreSQL Monitoring - Align with MySQL behavior (5 commits)

These fixes were discovered while comparing logs between Aurora PostgreSQL and Aurora MySQL environments during various tests including instance addition, deletion, and other scenarios.

1. Add connection pool (eb4e9025)

  • Add PgSQL_Monitor_Connection_Pool class (equivalent to MySQL's MySQL_Monitor_Connection_Pool)
  • Improve performance by reusing connections in Aurora monitoring

2. Fix auto-discovered servers not being ping monitored (497bf897)

  • Issue: Auto-discovered servers were not added to ping monitoring target list
  • Root cause: Missing update_table_pgsql_servers_for_monitor() calls in update_aws_aurora_set_writer/reader()
  • Fix: Add the function call in 4 locations

3. Change monitor connect error log level (b6d582f1)

  • Change connection failure log from proxy_error to proxy_debug
  • Prevent excessive error logs (was logging every 2 seconds), matching MySQL behavior

4. Align Aurora monitoring thread behavior (6141807a)

  • Add runtime variable refresh logic (config changes take effect immediately)
  • Properly stop Aurora threads when monitor_enabled=false
  • Add checksum comparison to avoid unnecessary commits on writer changes

5. Align Aurora monitor logging (1e5fdbdc)

  • Remove per-failure logging (only log when shunning occurs)
  • Add Aurora health check error logging (matching MySQL behavior)

Result: Aurora PostgreSQL monitoring now behaves identically to MySQL implementation

@mirostauder
Collaborator

retest this please

