Add AWS Aurora PostgreSQL Auto-Discovery Support #5267
base: v3.0
Conversation
Summary of Changes

Hello @leekihyunn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances ProxySQL's capabilities by introducing comprehensive auto-discovery and management support for AWS Aurora PostgreSQL clusters. It eliminates the need for manual configuration and updates of server hostgroups, allowing ProxySQL to dynamically adapt to changes in Aurora PostgreSQL topology, such as failovers and new replica additions. This ensures high availability and optimal performance by automatically routing traffic to the correct instances and shunning those with excessive replication lag, mirroring the functionality already present for MySQL Aurora.
Can one of the admins verify this patch?
Code Review
This pull request introduces AWS Aurora PostgreSQL auto-discovery support, achieving feature parity with the existing MySQL implementation. The changes are extensive, touching configuration, monitoring, and hostgroup management, and they closely follow the established patterns in the codebase, which is commendable for consistency.
My review has identified several critical security vulnerabilities related to the use of sprintf with inadequately sized buffers, which could lead to buffer overflows. I strongly recommend replacing these with snprintf and proper size calculation. Additionally, I've pointed out areas where modern C++ practices, such as using std::string instead of C-style char* for memory management, would improve code safety and maintainability. Other suggestions focus on improving code clarity and fixing minor logical issues.
```cpp
query = (char *)malloc(strlen(q) + strlen(_server_id) + strlen(domain_name) + 1024);
sprintf(query, q, _server_id, domain_name, aurora_port, _whid, _rhid);
```
This use of sprintf with a buffer size estimated with a magic number (1024) is a critical security vulnerability. A long _server_id or domain_name could cause a buffer overflow. You should calculate the required buffer size precisely and use snprintf. This unsafe pattern is repeated multiple times in update_aws_aurora_set_writer and update_aws_aurora_set_reader. Please fix all instances.
```cpp
int query_len = snprintf(nullptr, 0, q, _server_id, domain_name, aurora_port, _whid, _rhid);
query = (char *)malloc(query_len + 1);
snprintf(query, query_len + 1, q, _server_id, domain_name, aurora_port, _whid, _rhid);
```

```cpp
char* q2 = (char*)malloc(strlen(q1) + strlen(lut) + strlen(hse->server_id) + 32);
sprintf(q2, q1, wHG, hse->server_id, lut);
```
This is a critical buffer overflow vulnerability. The buffer size is estimated with a magic number (+ 32), which is not guaranteed to be sufficient for the length of lut and hse->server_id. This can lead to a buffer overflow. Please use snprintf with a correctly calculated buffer size to prevent this.
```cpp
int q2_len = snprintf(nullptr, 0, q1, wHG, hse->server_id, lut);
char* q2 = (char*)malloc(q2_len + 1);
snprintf(q2, q2_len + 1, q1, wHG, hse->server_id, lut);
```

```cpp
char *query = (char *)malloc(strlen(q) + strlen(o) + strlen(p) + 256);
sprintf(query, q, writer_hostgroup, reader_hostgroup, active, aurora_port, p, max_lag_ms, check_interval_ms, check_timeout_ms, writer_is_also_reader, new_reader_weight, add_lag_ms, min_lag_ms, lag_num_checks, o);
```
This is a critical buffer overflow vulnerability. The buffer for the SQL query is allocated with a fixed-size addition (+256), which may not be sufficient if the escaped comment (o) or domain_name (p) are very long. This can be exploited. Please use snprintf to safely construct the query string by first calculating the required size.
```cpp
int query_len = snprintf(nullptr, 0, q, writer_hostgroup, reader_hostgroup, active, aurora_port, p, max_lag_ms, check_interval_ms, check_timeout_ms, writer_is_also_reader, new_reader_weight, add_lag_ms, min_lag_ms, lag_num_checks, o);
char *query = (char *)malloc(query_len + 1);
snprintf(query, query_len + 1, q, writer_hostgroup, reader_hostgroup, active, aurora_port, p, max_lag_ms, check_interval_ms, check_timeout_ms, writer_is_also_reader, new_reader_weight, add_lag_ms, min_lag_ms, lag_num_checks, o);
```

```cpp
char *address = (char *)malloc(strlen(_server_id) + strlen(domain_name) + 1);
sprintf(address, "%s%s", _server_id, domain_name);
```
Using sprintf with a manually calculated buffer size is a potential security risk. If the calculation is incorrect, it can lead to a buffer overflow. It is much safer to use snprintf, which prevents buffer overflows by respecting the provided buffer size.
```cpp
size_t address_len = strlen(_server_id) + strlen(domain_name) + 1;
char *address = (char *)malloc(address_len);
snprintf(address, address_len, "%s%s", _server_id, domain_name);
```

```cpp
int check_interval_ms;
int check_timeout_ms;
bool active;
bool __active; // temporary flag for tracking during regeneration
```
```cpp
char *domain_name;
char *comment;
```
For new classes like PgSQL_AWS_Aurora_Info, consider using std::string for members like domain_name and comment instead of char*. std::string handles its own memory management, which prevents memory leaks and makes the code safer and easier to maintain. This would eliminate the need for manual strdup and free calls.
```cpp
std::string domain_name;
std::string comment;
```

```cpp
size_t stride = sizeof(pgsql_host_def_t);
memcpy(&tmp, arr + i * stride / sizeof(pgsql_host_def_t), sizeof(pgsql_host_def_t));
memcpy(arr + i * stride / sizeof(pgsql_host_def_t), arr + j * stride / sizeof(pgsql_host_def_t), sizeof(pgsql_host_def_t));
memcpy(arr + j * stride / sizeof(pgsql_host_def_t), &tmp, sizeof(pgsql_host_def_t));
```
The shuffling logic using memcpy and stride is overly complex. The expression i * stride / sizeof(pgsql_host_def_t) simplifies to just i, making the calculation confusing. Using std::swap or a simple temporary variable would be much clearer and more idiomatic C++.
```cpp
pgsql_host_def_t tmp = arr[i];
arr[i] = arr[j];
arr[j] = tmp;
```

```cpp
char* is_writer_str = PQgetvalue(res, i, 3);
float replica_lag = replica_lag_str ? atof(replica_lag_str) : 0.0f;
bool is_writer = (is_writer_str && (strcmp(is_writer_str, "t") == 0 || strcmp(is_writer_str, "true") == 0 || strcmp(is_writer_str, "1") == 0));
```
The is_writer column is a boolean type from the aurora_replica_status() function. For PostgreSQL, PQgetvalue returns 't' for true and 'f' for false. The checks for "true" and "1" are unnecessary and make the code more fragile. It's better to rely on the documented behavior of PQgetvalue for boolean types.
```cpp
bool is_writer = (is_writer_str && strcmp(is_writer_str, "t") == 0);
```

```cpp
else {
    for (std::vector<SQLite3_row*>::iterator it = resultset->rows.begin(); it != resultset->rows.end(); ++it) {
        SQLite3_row* r = *it;
        proxy_error("Incompatible entry in pgsql_aws_aurora_hostgroups will be ignored : ( %s , %s , %s , %s )\n", r->fields[0], r->fields[1], r->fields[2], r->fields[3]);
```
This error message only prints the first 4 fields of an incompatible entry in pgsql_aws_aurora_hostgroups. The table has more columns, and this partial information may not be sufficient to identify the problematic row during debugging. Consider logging all fields or at least the primary key (writer_hostgroup) to make the log message more useful.
This commit implements Aurora PostgreSQL auto-discovery functionality, mirroring the existing Aurora MySQL implementation. The feature enables automatic detection and management of Aurora PostgreSQL cluster topology.

Key features:
- Auto-discovery using the aurora_replica_status() function
- Writer detection via session_id = 'MASTER_SESSION_ID'
- Automatic failover handling with hostgroup reconfiguration
- Replication lag-based server shunning
- Dynamic server addition when new nodes join the cluster

New tables:
- pgsql_aws_aurora_hostgroups (configuration)
- runtime_pgsql_aws_aurora_hostgroups (runtime)

Configuration parameters (same as MySQL Aurora):
- writer_hostgroup, reader_hostgroup
- aurora_port (default: 5432)
- domain_name, max_lag_ms, check_interval_ms, check_timeout_ms
- writer_is_also_reader, new_reader_weight
- add_lag_ms, min_lag_ms, lag_num_checks

Files modified:
- include/ProxySQL_Admin_Tables_Definitions.h: Aurora table definitions
- include/proxysql_admin.h: incoming_aurora_hostgroups field
- lib/Admin_Bootstrap.cpp: Admin table registration
- lib/ProxySQL_Config.cpp: Config file parsing
- lib/ProxySQL_Admin.cpp: Runtime load logic
- include/PgSQL_HostGroups_Manager.h: PgSQL_AWS_Aurora_Info class
- lib/PgSQL_HostGroups_Manager.cpp: Hostgroup management implementation
- include/PgSQL_Monitor.hpp: Monitoring class definitions
- lib/PgSQL_Monitor.cpp: Monitoring thread implementation
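The writer-detection method described above can be sketched as a status query. This is illustrative only, not code from this PR: the column names (server_id, session_id, replica_lag_in_msec, last_update_timestamp) follow AWS's documentation for aurora_replica_status(), and the exact column set varies by Aurora PostgreSQL engine version.

```sql
-- Illustrative sketch of an Aurora PostgreSQL topology check.
SELECT server_id,
       session_id,            -- 'MASTER_SESSION_ID' marks the writer instance
       replica_lag_in_msec,   -- drives lag-based shunning
       last_update_timestamp
FROM aurora_replica_status();
```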
(force-pushed from aecb7cb to d1114d6)
I've reviewed the SonarCloud warnings and code review feedback. These patterns intentionally follow the existing MySQL implementation for consistency:
…tgreSQL

- Add populate_monitor_pgsql_server_aws_aurora_log() to display aurora log entries
- Add populate_monitor_pgsql_server_aws_aurora_check_status() to display check status
- Add pgsql_aws_aurora_hostgroups support in dump_table_pgsql()
- Add runtime_pgsql_aws_aurora_hostgroups dump in save_pgsql_servers_runtime_to_database()
- Remove verbose error logging to match MySQL Aurora behavior pattern
Added missing monitoring table population and runtime table dump functionality: Changes:
These additions complete the admin table query functionality for Aurora PostgreSQL monitoring.
add to whitelist
Hi @leekihyunn. The previous message "add to whitelist" is used to trigger CI runs. Please note that for Aurora MySQL we have a complex simulator (you may find it by searching for …)
Hi @renecannao and @rahim-kanji, Thank you for reviewing this PR! If there's any additional work required, please let me know and I'll address it promptly.
- Add PgSQL_Monitor_Connection_Pool class (equivalent to MySQL_Monitor_Connection_Pool)
- Add MonPgSrvC class for per-server connection management
- Implement get_connection/put_connection for connection reuse
- Add purge_some_connections to limit connections per server (max 4)
- Add purge_all_connections for cleanup on shutdown
- Wrap verbose logging with #ifdef TEST_AURORA guards
- Add print_pgsql_aws_aurora_status_entry for TEST_AURORA debugging
- Add action counting (action_yes/no, enabling/disabling) in evaluate function
…ping

Auto-discovered Aurora PostgreSQL servers were not being added to the ping monitoring target list, causing deleted/failed instances to remain in ONLINE status instead of being properly shunned.

Root cause:
- update_aws_aurora_set_writer() and update_aws_aurora_set_reader() were calling generate_pgsql_servers_table() to sync MyHostGroups to the internal pgsql_servers table
- But update_table_pgsql_servers_for_monitor() was not called afterwards, so the monitor's server list (pgsql_servers_to_monitor) was never updated with the new auto-discovered servers
- The MySQL implementation correctly calls update_table_mysql_servers_for_monitor() in these same locations

Fix: Add update_table_pgsql_servers_for_monitor(false) calls in 4 locations:
- update_aws_aurora_set_writer(): commit path (line 4979)
- update_aws_aurora_set_writer(): auto-discovery path (line 5021)
- update_aws_aurora_set_reader(): commit path (line 5107)
- update_aws_aurora_set_reader(): auto-discovery path (line 5137)

This ensures auto-discovered servers are properly ping monitored and will be shunned when they become unreachable.
MySQL does not log connection failures as errors (only SSL errors). Change PostgreSQL monitor to follow the same pattern by using proxy_debug instead of proxy_error for connection failures. This prevents excessive error logs (every 2 seconds) when a server is unreachable (e.g., deleted Aurora instance). Note: MySQL has DNS caching (DNS_Cache class, dns_lookup(), DNS resolver thread running every 60 seconds), while PostgreSQL does direct DNS lookup on every connection attempt. This causes PostgreSQL to log errors every 2 seconds (ping interval) vs MySQL's ~60 seconds.
This commit ensures Aurora PostgreSQL monitoring behaves identically
to Aurora MySQL monitoring by adding missing functionality:
1. Add runtime variable refresh in Aurora monitoring threads
- HG thread (PgSQL_monitor_AWS_Aurora_thread_HG): Now checks global
variable version on each loop iteration and calls refresh_variables()
when changes are detected, allowing runtime configuration changes
to take effect immediately without restarting the thread
- Main Aurora thread (PgSQL_monitor_aws_aurora): Added PgSQL_Thread
object initialization and variable refresh loop
2. Add monitor_enabled check to Aurora thread loops
- Both HG thread and main Aurora thread now check
pgsql_thread___monitor_enabled in their while conditions
- Setting pgsql-monitor_enabled=false now properly stops Aurora
monitoring threads (previously they would continue running)
3. Add checksum comparison to update_aws_aurora_set_writer()
- Before committing writer changes, compare checksums of current
and incoming server lists
- Only commit if there are actual changes, avoiding unnecessary
table regeneration and log noise
- Add verbose skip logging when hostgroup_manager_verbose > 1
These changes match the behavior of the MySQL Aurora implementation in:
- MySQL_Monitor.cpp: monitor_AWS_Aurora_thread_HG()
- MySQL_Monitor.cpp: MySQL_Monitor::monitor_aws_aurora()
- MySQL_HostGroups_Manager.cpp: update_aws_aurora_set_writer()
Benefits:
- Runtime variable changes (e.g., hostgroup_manager_verbose) are now
reflected immediately in Aurora monitoring threads
- pgsql-monitor_enabled=false properly stops Aurora monitoring
- Reduced unnecessary commits and log output when no actual changes occur
- Full consistency with MySQL Aurora monitoring behavior
Changes:
1. Remove noisy "Monitor connect failed" logging on every connection failure
   - Errors are recorded in the pgsql_server_connect_log table
   - Shunning logic will log when max_failures is reached (matching MySQL behavior)
2. Add Aurora health check error logging (matching MySQL behavior)
   - Connection failure: "Error on AWS Aurora PostgreSQL check for <host>:<port>..."
   - Query failure: "Error on AWS Aurora PostgreSQL check for <host>:<port>... Query failed"
   - The previous incorrect comment stated "Not logging to match MySQL", but MySQL does log these errors
Aurora PostgreSQL Monitoring - Align with MySQL behavior (5 commits)

These fixes were discovered while comparing logs between Aurora PostgreSQL and Aurora MySQL environments during various tests, including instance addition, deletion, and other scenarios.

1. Add connection pool (
retest this please
3 similar comments




Summary
This PR implements AWS Aurora PostgreSQL auto-discovery functionality, bringing feature parity with the existing MySQL Aurora implementation. ProxySQL can now automatically detect Aurora PostgreSQL cluster topology, identify Writer/Reader instances, and handle failovers seamlessly.
Issue #5249
Motivation
Aurora PostgreSQL users currently need to manually configure and update server hostgroups when cluster topology changes. This implementation enables automatic topology discovery via the aurora_replica_status() function.

Why This Matters
As PostgreSQL adoption continues to grow, especially with AWS Aurora PostgreSQL, there's an increasing need for a robust connection pooling and query routing middleware. While PgBouncer has been the traditional choice for PostgreSQL connection pooling, it lacks advanced features like:
With this Aurora PostgreSQL auto-discovery feature, ProxySQL can serve as a more capable alternative to PgBouncer for Aurora PostgreSQL users, offering:
I believe this feature will benefit many Aurora PostgreSQL users who need enterprise-grade connection management and query routing capabilities.
Changes Made
New Tables
- pgsql_aws_aurora_hostgroups - Configuration table for Aurora PostgreSQL clusters
- runtime_pgsql_aws_aurora_hostgroups - Runtime configuration table
- pgsql_server_aws_aurora_log - Aurora health check logs
- pgsql_server_aws_aurora_check_status - Check status tracking
- pgsql_server_aws_aurora_failovers - Failover event logging

Configuration Parameters
All parameters mirror the MySQL Aurora implementation:

- writer_hostgroup / reader_hostgroup - Hostgroup assignments
- max_lag_ms - Maximum replication lag threshold for shunning
- check_interval_ms - Health check interval
- check_timeout_ms - Health check timeout
- writer_is_also_reader - Place writer in reader hostgroup
- new_reader_weight - Weight for newly discovered readers
- add_lag_ms / min_lag_ms / lag_num_checks - Lag estimation parameters
- aurora_port - Default port (5432)
- domain_name - Aurora cluster domain suffix

Core Implementation
Monitoring (lib/PgSQL_Monitor.cpp)

- PgSQL_monitor_AWS_Aurora_thread_HG() - Per-hostgroup monitoring thread
- PgSQL_monitor_aws_aurora() - Main Aurora monitoring coordinator
- estimate_lag() - Replication lag estimation using multiple check samples
- evaluate_pgsql_aws_aurora_results() - Process health check results and trigger actions
- populate_monitor_pgsql_server_aws_aurora_log() - Populate aurora log table for admin queries
- populate_monitor_pgsql_server_aws_aurora_check_status() - Populate check status table for admin queries

HostGroups Management (lib/PgSQL_HostGroups_Manager.cpp)

- PgSQL_AWS_Aurora_Info class - Aurora cluster configuration storage
- update_aws_aurora_set_writer() - Set/update writer with auto-discovery
- update_aws_aurora_set_reader() - Set/update reader with auto-discovery
- aws_aurora_replication_lag_action() - Lag-based server shunning/unshunning
- generate_pgsql_aws_aurora_hostgroups_table() - Generate runtime configuration
- dump_table_pgsql() - Added support for pgsql_aws_aurora_hostgroups table dump

Admin (lib/ProxySQL_Admin.cpp)

- save_pgsql_servers_runtime_to_database() - Added runtime aurora hostgroups dump
- GenericRefreshStatistics() - Added triggers for aurora log/check_status table population

Technical Details
Aurora Status Query

PostgreSQL Aurora uses the aurora_replica_status() function instead of MySQL's information_schema.replica_host_status:

Key Differences from MySQL Implementation

| Aspect | Aurora MySQL | Aurora PostgreSQL |
| --- | --- | --- |
| Topology status source | information_schema.replica_host_status | aurora_replica_status() |
| Naming prefixes | MySQL_* / AWS_Aurora_* | PgSQL_* / PgSQL_AWS_Aurora_* |
Both MySQL and PostgreSQL Aurora use the same method:
session_id = 'MASTER_SESSION_ID'Example Configuration
Files Changed
- include/ProxySQL_Admin_Tables_Definitions.h - Aurora table definitions
- include/proxysql_admin.h - incoming_aurora_hostgroups field
- lib/Admin_Bootstrap.cpp - Admin table registration
- lib/ProxySQL_Config.cpp - pgsql_aws_aurora_hostgroups config file parsing
- lib/ProxySQL_Admin.cpp - Runtime load logic
- include/PgSQL_HostGroups_Manager.h - PgSQL_AWS_Aurora_Info class definition
- lib/PgSQL_HostGroups_Manager.cpp - Hostgroup management implementation
- include/PgSQL_Monitor.hpp - Monitoring class definitions
- lib/PgSQL_Monitor.cpp - Monitoring thread implementation

Testing
Tested with an AWS Aurora PostgreSQL cluster:

- aurora_replica_status()