ric2b edited this page Sep 10, 2015 · 3 revisions

Troubleshooting

The logs are usually the best place to start when diagnosing a problem. Work out which machine or architecture layer is misbehaving, then run tail -f filename on the relevant log to watch what is happening (this opens the file at its last lines and keeps printing new entries as they are written).

PostgreSQL: /var/lib/pgsql/9.4/data/pg_log/

This folder holds a separate log file for each weekday.

PgPool II: /tmp/pgpool.log

Failover Script: /tmp/failover.log
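Because the PostgreSQL folder holds one file per weekday, it helps to pick the file that was written to most recently before tailing it. The following is a minimal sketch, not part of the original setup: newest_log is a hypothetical helper name, and the default directory is simply the PostgreSQL path listed above.

```shell
# Print the path of the most recently modified file in a log directory.
# newest_log is a hypothetical helper; the default is the pg_log path above.
newest_log() {
    dir=${1:-/var/lib/pgsql/9.4/data/pg_log}
    newest=$(ls -t "$dir" | head -n 1)
    [ -n "$newest" ] && printf '%s\n' "${dir%/}/$newest"
}

# Typical use on a database server (blocks until interrupted with Ctrl-C):
#   tail -f "$(newest_log)"
# The PgPool and failover logs are single files and can be followed together:
#   tail -f /tmp/pgpool.log /tmp/failover.log
```

Run the tail commands on whichever machine actually holds the log in question.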

You can also use these commands, mentioned earlier, to pinpoint the problem:

  • psql -U postgres -h 192.168.1.111 -c "show pool_nodes" - use on a PgPool server
  • psql -U postgres -h 192.168.1.1 -x -c "select * from pg_stat_replication" - use on the Master

Common issues

PostgreSQL layer

  • Make sure the pg_hba.conf file has the correct settings
  • Verify that the IP in the recovery.conf file of each slave points to the master
  • If you're trying to fail back a server as a Slave, remember to delete /tmp/promotedb and add the recovery.conf file
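The recovery.conf and failback checks above can be sketched as two small shell helpers. This is only an illustration under stated assumptions: the function names are hypothetical, the master address is assumed to appear as a host= entry inside primary_conninfo, and the data directory is assumed to be /var/lib/pgsql/9.4/data (inferred from the log path on this page).

```shell
# Print the host= value from primary_conninfo in a recovery.conf file.
# Assumes the usual form: primary_conninfo = 'host=... port=... user=...'
recovery_master_host() {
    sed -n "s/^[[:space:]]*primary_conninfo.*host=\([^ ']*\).*/\1/p" "$1"
}

# Succeed only if a server looks ready to rejoin as a slave:
# the promotion file is gone and recovery.conf exists in the data directory.
# The second argument defaults to /tmp/promotedb, the file named on this page.
ready_for_failback() {
    data_dir=$1
    promote_file=${2:-/tmp/promotedb}
    [ ! -e "$promote_file" ] && [ -f "$data_dir/recovery.conf" ]
}

# Example use on a slave (paths are assumptions, adjust to your setup):
#   recovery_master_host /var/lib/pgsql/9.4/data/recovery.conf
#   ready_for_failback /var/lib/pgsql/9.4/data || echo "not ready for failback"
```

Compare the printed host with the master's address; if they differ, the slave is trying to replicate from the wrong machine.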

PgPool layer and virtual IP

  • The pgpool.conf files are the same except for the lines about the watchdog functionality. Verify that each file has the other server's IP in the relevant settings
  • Another machine in the network may also be trying to use the same virtual IP
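One way to confirm that the two pgpool.conf files differ only in their watchdog lines is to compare them with comments and blank lines stripped. The sketch below assumes you have copied the other server's file next to the local one; conf_settings and pgpool_conf_diff are hypothetical helper names.

```shell
# Print the active settings of a config file: no comments, no blank lines.
conf_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | sort
}

# Show the settings that differ between two pgpool.conf files.
# Returns 0 when the active settings are identical, 1 when they differ.
pgpool_conf_diff() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    conf_settings "$1" > "$tmp_a"
    conf_settings "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return $status
}

# Example (file names are placeholders):
#   pgpool_conf_diff pgpool.conf.node1 pgpool.conf.node2
```

Every line in the output should be a watchdog setting (for example wd_hostname); anything else points to an unintended difference between the two servers.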

Automatic Failover

  • The passwordless login wasn't configured correctly; check that you can ssh with both accounts without entering a password
  • The sudo user does not have sudo access
  • The postgres user isn't the owner of the data folder
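The three causes above can each be checked from the shell. The helpers below are a rough sketch with hypothetical names. They assume the failover script runs non-interactively, so both ssh and sudo must work without any prompt, and that the data folder is /var/lib/pgsql/9.4/data (inferred from the log path on this page).

```shell
# Succeed if key-based ssh to the target works without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Succeed if the current user can run sudo without being asked for a password.
has_passwordless_sudo() {
    sudo -n true 2>/dev/null
}

# Print the numeric user ID that owns a file or directory.
owner_uid() {
    ls -ldn "$1" | awk '{print $3}'
}

# Example use (user and host are placeholders, run once per account):
#   can_ssh_without_password user@host || echo "ssh still asks for a password"
#   has_passwordless_sudo || echo "sudo needs a password or is not allowed"
#   [ "$(owner_uid /var/lib/pgsql/9.4/data)" = "$(id -u postgres)" ] \
#       || echo "data folder is not owned by postgres"
```

Run each check as the account the failover script uses, since a test that passes for your own login says nothing about that account.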
