Commit

Implement multithreading for database checks with pgbackrest_auto (#18)
vitabaks authored Nov 30, 2023
1 parent 4e41f61 commit f0b9307
Showing 2 changed files with 93 additions and 49 deletions.
20 changes: 9 additions & 11 deletions README.md
@@ -78,14 +78,14 @@ Support three types of restore:
Important: Run on the nodes on which you want to restore the backup
Usage: /usr/bin/pgbackrest_auto --from=STANZANAME --to=DATA_DIRECTORY [ --datname=DATABASE [...] ] [ --recovery-type=( default | immediate | time ) ] [ --recovery-target=TIMELINE [ --backup-set=SET ] [ --backup-host=HOST ] [ --pgver= ] [ --checkdb ] [ --clear ] [ --report ] ]
Usage: /usr/bin/pgbackrest_auto --from=STANZANAME --to=DATA_DIRECTORY [ --datname=DATABASE [...] ] [ --recovery-type=( default | immediate | time ) ] [ --recovery-target=TIMELINE [ --backup-set=SET ] [ --pgver= ] [ --checkdb ] [ --clear ] [ --report ] ]
--from=STANZANAME
Stanza from which you need to restore from a backup
--to=DATA_DIRECTORY
PostgreSQL Data directory Path to restore from a backup
a PostgreSQL database cluster (PGDATA) will be automatically created (initdb) if it does not exist
a PostgreSQL database cluster (PGDATA) will be automatically created if it does not exist
Example: /bkpdata/rst/app-db
--datname=DATABASE [...]
@@ -112,16 +112,12 @@ Usage: /usr/bin/pgbackrest_auto --from=STANZANAME --to=DATA_DIRECTORY [ --datnam
incr backup: 20220611-000004F_20220614-000003D
This is the name of SET: 20220611-000004F_20220614-000003D
--backup-host=HOST
pgBacRest repository ip address (Use SSH Key-Based Authentication)
localhost [default]
--pgver=VERSION
PostgreSQL cluster (instance) version [ optional ]
by default, the PostgreSQL version will be determined from the pgbackrest info
--dummy-dump
Verify that data can be read out. Check with pg_dump >> /dev/null
Verify that data can be read out. Check with pg_dump
--checksums
Check data checksums
@@ -144,15 +140,17 @@ Usage: /usr/bin/pgbackrest_auto --from=STANZANAME --to=DATA_DIRECTORY [ --datnam
--config=/path/to/pgbackrest.conf
The path to the custom pgbackrest configuration file [ optional ]
--custom-options=""
--custom-options=
Custom options for pgBackRest [ optional ]
This includes all the options that may also be configured in pgbackrest.conf
Example: "--option1=value --option2=value --option3=value"
This includes all the options that may also be configured in pgbackrest.conf
Example: --option1=value --option2=value --option3=value
See all available options: https://pgbackrest.org/configuration.html
--process-max=
Max processes to use for restore and validate (default 1).
EXAMPLES:
( example stanza "app-db" , backup-host "localhost" )
( example stanza "app-db" , backup host "localhost" (default value) )
| Restore last backup:
122 changes: 84 additions & 38 deletions pgbackrest_auto
@@ -8,7 +8,7 @@
# for "--report": sendemail
# Run as user: postgres

ver="1.5.1"
ver="1.6.0"

# variables for function "sendmail()"
smtp_server="10.128.64.5:25"
@@ -106,6 +106,9 @@ while getopts ":-:" optchar; do
custom-options=* )
CUSTOMOTIONS=${OPTARG#*=}
;;
process-max=* )
PROCESS_MAX=${OPTARG#*=}
;;
esac
done
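
The hunk above adds a `process-max=*` branch to the script's long-option parser. As a minimal sketch of how that `getopts ":-:"` pattern works: the `-:` in the optstring makes getopts treat the second dash of `--process-max=4` as an option named `-` whose argument is `process-max=4`, and `${OPTARG#*=}` then strips everything up to the first `=`. The function name `parse_process_max` is hypothetical and is not part of pgbackrest_auto.

```shell
#!/usr/bin/env bash
# Sketch only: parse_process_max is a hypothetical helper, not part of pgbackrest_auto.
# It isolates the long-option trick used by the script's `getopts ":-:"` loop.
parse_process_max() {
  local OPTIND=1 OPTARG optchar process_max=""
  while getopts ":-:" optchar "$@"; do
    case "${optchar}" in
      -)
        # For "--process-max=4", optchar is "-" and OPTARG is "process-max=4".
        case "${OPTARG}" in
          process-max=*) process_max="${OPTARG#*=}" ;;  # keep the text after the first "="
        esac
        ;;
    esac
  done
  # Print the parsed value (an empty line if the option was not given).
  printf '%s\n' "${process_max}"
}
```

Calling `parse_process_max --from=app-db --process-max=8` prints `8`. Options the inner `case` does not recognize are ignored in this sketch.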

@@ -160,7 +163,7 @@ Usage: $0 --from=STANZANAME --to=DATA_DIRECTORY [ --datname=DATABASE [...] ] [ -
by default, the PostgreSQL version will be determined from the pgbackrest info
--dummy-dump
Verify that data can be read out. Check with pg_dump >> /dev/null
Verify that data can be read out. Check with pg_dump
--checksums
Check data checksums
@@ -189,6 +192,8 @@ Usage: $0 --from=STANZANAME --to=DATA_DIRECTORY [ --datname=DATABASE [...] ] [ -
Example: "--option1=value --option2=value --option3=value"
See all available options: https://pgbackrest.org/configuration.html
--process-max=
Max processes to use for restore and validate (default 1).
EXAMPLES:
( example stanza \"app-db\" , backup host \"localhost\" (default value) )
@@ -302,15 +307,6 @@ if ! command -v "${PG_BIN_DIR}"/pg_ctl &> /dev/null; then
exit
fi

# check if pg_checksums exists (for PostgreSQL version <= 11)
if [[ "$PGVER" -le "11" && "${CHECKSUMS}" = "yes" ]] || [[ "$PGVER" -le "11" && "${CHECKDB}" = "yes" ]]; then
if ! command -v "${PG_BIN_DIR}"/pg_checksums &> /dev/null
then
warnmsg "pg_checksums command not be found. Please install the postgresql-$PGVER-pg-checksums package"
exit
fi
fi

# check if a directory exists
if [[ ! -d "${PGDATA}" ]]; then
if ! mkdir -p "${PGDATA}"; then
@@ -354,6 +350,10 @@ elif [[ -n $DATNAME ]]; then
restore_type_msg="Partial PostgreSQL Restore"
fi

# process-max default
if [[ -z $PROCESS_MAX ]]; then
PROCESS_MAX=1
fi
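
The default above only covers the empty case. The committed code does not validate the value, so the sketch below is one possible hardening, not what the script does: it applies the same default of 1 and additionally rejects anything that is not a positive integer before the value reaches `pgbackrest --process-max`, `pg_dump -j` or `pg_amcheck --jobs`. The name `normalize_process_max` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: normalize_process_max is a hypothetical helper, not part of pgbackrest_auto.
normalize_process_max() {
  # Empty or missing argument falls back to 1, matching the script's default.
  local value="${1:-1}"
  # Assumption: only positive integers are meaningful for a process/job count.
  if [[ ! "${value}" =~ ^[1-9][0-9]*$ ]]; then
    echo "process-max must be a positive integer, got: ${value}" >&2
    return 1
  fi
  printf '%s\n' "${value}"
}
```

A caller could use it as `PROCESS_MAX=$(normalize_process_max "${PROCESS_MAX}") || exit 1`.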

function sigterm_handler(){
info "Received QUIT|TERM|INT signal"
@@ -513,8 +513,8 @@ function pgbackrest_exec(){
if [ -f "${detail_rst_log}" ]; then info "See detailed log in the file ${detail_rst_log}"; fi
info "Restore from backup started. Type: $restore_type_msg"
# execute pgbackrest
echo "pgbackrest --config=${pgbackrest_conf} --stanza=${FROM} --pg1-path=${TO} ${pgbackrest_opt} --delta restore --process-max=4 --log-level-console=error --log-level-file=detail --recovery-option=${recovery_opt} --tablespace-map-all=${TO}_remapped_tablespaces"
if bash -c "pgbackrest --config=${pgbackrest_conf} --stanza=${FROM} --pg1-path=${TO} ${pgbackrest_opt} --delta restore --process-max=4 --log-level-console=error --log-level-file=detail --recovery-option=${recovery_opt} --tablespace-map-all=${TO}_remapped_tablespaces"
echo "pgbackrest --config=${pgbackrest_conf} --stanza=${FROM} --pg1-path=${TO} ${pgbackrest_opt} --delta restore --process-max=${PROCESS_MAX} --log-level-console=error --log-level-file=detail --recovery-option=${recovery_opt} --tablespace-map-all=${TO}_remapped_tablespaces"
if bash -c "pgbackrest --config=${pgbackrest_conf} --stanza=${FROM} --pg1-path=${TO} ${pgbackrest_opt} --delta restore --process-max=${PROCESS_MAX} --log-level-console=error --log-level-file=detail --recovery-option=${recovery_opt} --tablespace-map-all=${TO}_remapped_tablespaces"
then
info "Restore from backup done"
sed -i 's/Restore_from_backup=0/Restore_from_backup=1/g' "${status_file}"
@@ -563,31 +563,68 @@ function dummy_dump(){
for db in $databases; do
info "Start data validation for database $db"
if pgisready 1> /dev/null; then
info " starting pg_dump -p ${PGPORT} -h 127.0.0.1 -d $db >> /dev/null"
if ! "${PG_BIN_DIR}"/pg_dump -p "${PGPORT}" -h 127.0.0.1 -d "$db" >> /dev/null
then
sed -i 's/Data_validation=1/Data_validation=0/g' "${status_file}"
error "Data validation in the database $db - Failed"
if [[ "$PROCESS_MAX" == '1' ]]; then
# single-threaded dump is slower, but requires less disk I/O due to redirection to /dev/null
info " starting pg_dump -p ${PGPORT} -h 127.0.0.1 -d $db >> /dev/null"
if ! "${PG_BIN_DIR}"/pg_dump -p "${PGPORT}" -h 127.0.0.1 -d "$db" >> /dev/null
then
sed -i 's/Data_validation=1/Data_validation=0/g' "${status_file}"
error "Data validation in the database $db - Failed"
else
info "Data validation in the database $db - Successful"
fi
else
info "Data validation in the database $db - Successful"
# Parallel dump speeds up the process for large databases,
# but requires writing data and more disk space for the dump.
mkdir -p "${PGDATA}"/dump/
rm -rf "${PGDATA}"/dump/"$db"
info " starting pg_dump -p ${PGPORT} -h 127.0.0.1 -d $db -F d -j ${PROCESS_MAX} -f ${PGDATA}/dump/$db"
if ! "${PG_BIN_DIR}"/pg_dump -p "${PGPORT}" -h 127.0.0.1 -d "$db" -F d -j "${PROCESS_MAX}" -f "${PGDATA}"/dump/"$db"
then
sed -i 's/Data_validation=1/Data_validation=0/g' "${status_file}"
error "Data validation in the database $db - Failed"
else
info "Data validation in the database $db - Successful"
fi
fi
fi
done
}
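
The two branches of `dummy_dump` differ only in the extra `pg_dump` arguments: with `PROCESS_MAX` equal to 1 the dump goes to stdout (which the script redirects to /dev/null), otherwise the directory format (`-F d`) is used, because `pg_dump` supports parallel jobs (`-j`) only with that format. A minimal sketch of the selection, printing the command instead of running it; `build_dump_cmd` and its parameters are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: build_dump_cmd is a hypothetical helper, not part of pgbackrest_auto.
# It prints the pg_dump command line that each branch of dummy_dump would run.
build_dump_cmd() {
  local port="$1" db="$2" jobs="$3" dump_root="$4"
  local cmd=(pg_dump -p "${port}" -h 127.0.0.1 -d "${db}")
  if [[ "${jobs}" != "1" ]]; then
    # Parallel dump requires the directory format and a target directory on disk.
    cmd+=(-F d -j "${jobs}" -f "${dump_root}/${db}")
  fi
  # In the single-job case the caller is expected to redirect stdout to /dev/null.
  echo "${cmd[*]}"
}
```

For example, `build_dump_cmd 5432 appdb 4 /bkpdata/rst/app-db/dump` prints the directory-format command with `-j 4`. The parallel branch writes a full dump under the data directory, so free space for it has to be available during validation.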

# checksums - check data checksums
function pg_checksums(){
if pgisready 1> /dev/null; then pg_stop cycle_simple pg_stop_check; fi
info "pg_checksums: starting data checksums validation"
sed -i 's/PG_checksums_validation=0/PG_checksums_validation=1/g' "${status_file}"
pg_checksums_result=$("${PG_BIN_DIR}"/pg_checksums -c -D "${PGDATA}" | grep "Bad checksums")
if [[ $pg_checksums_result != "Bad checksums: 0" ]]
then
warnmsg "pg_checksums: data checksums validation result: $pg_checksums_result"
sed -i 's/PG_checksums_validation=1/PG_checksums_validation=0/g' "${status_file}"
error "pg_checksums: data checksums validation - Failed"
local pg_checksums_command
local pg_checksums_result

# Determine the checksums command based on PostgreSQL version
if [[ "$PGVER" -le "11" ]]; then
if command -v "${PG_BIN_DIR}"/pg_checksums &> /dev/null; then
pg_checksums_command="pg_checksums"
elif command -v "${PG_BIN_DIR}"/pg_verify_checksums &> /dev/null; then
pg_checksums_command="pg_verify_checksums"
else
warnmsg "Checksum command not found. Please install the postgresql-$PGVER-pg-checksums package."
exit 1
fi
else
info "pg_checksums: data checksums validation - Successful"
pg_checksums_command="pg_checksums"
fi

if pgisready 1> /dev/null; then pg_stop cycle_simple pg_stop_check; fi
info "pg_checksums: starting data checksums validation"
sed -i 's/PG_checksums_validation=0/PG_checksums_validation=1/g' "${status_file}"
if [ "$pg_checksums_command" == "pg_verify_checksums" ]; then
pg_checksums_result=$("${PG_BIN_DIR}"/pg_verify_checksums -D "${PGDATA}" | grep "Bad checksums")
else
pg_checksums_result=$("${PG_BIN_DIR}"/pg_checksums -c -D "${PGDATA}" | grep "Bad checksums")
fi
if [[ $pg_checksums_result != "Bad checksums: 0" ]]
then
warnmsg "pg_checksums: data checksums validation result: $pg_checksums_result"
sed -i 's/PG_checksums_validation=1/PG_checksums_validation=0/g' "${status_file}"
error "pg_checksums: data checksums validation - Failed"
else
info "pg_checksums: data checksums validation - Successful"
fi
}
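
The function above decides success by comparing the whole grep result with the literal string `Bad checksums: 0`. The spacing of that line may differ between PostgreSQL versions or tools (an assumption worth verifying against the installed binaries), so a more tolerant variant is to extract the number and compare that. The sketch below reads the tool's output on stdin; `bad_checksum_count` is a hypothetical helper and the sample input in the usage note is illustrative, not captured output.

```shell
#!/usr/bin/env bash
# Sketch only: bad_checksum_count is a hypothetical helper, not part of pgbackrest_auto.
# Reads pg_checksums / pg_verify_checksums output on stdin and prints the
# number from the "Bad checksums" line, ignoring whitespace around it.
bad_checksum_count() {
  awk -F: '
    /Bad checksums/ { gsub(/[[:space:]]/, "", $2); print $2; found = 1 }
    END { if (!found) exit 1 }   # non-zero status when the line is missing
  '
}
```

Usage could look like `count=$("${PG_BIN_DIR}"/pg_checksums -c -D "${PGDATA}" | bad_checksum_count)` followed by a test for `"${count}" = "0"`. A missing line is then reported separately from a non-zero count.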

@@ -598,7 +635,7 @@ function amcheck_exists(){
else
extension='amcheck'
fi
if ! psql -v "ON_ERROR_STOP" -p "${PGPORT}" -h 127.0.0.1 -U postgres -d "$db_name" -tAXc "CREATE EXTENSION if not exists $extension" &> /dev/null
if ! psql -v "ON_ERROR_STOP" -p "${PGPORT}" -h 127.0.0.1 -U postgres -d "$db" -tAXc "CREATE EXTENSION if not exists $extension" &> /dev/null
then
error "CREATE EXTENSION $extension failed"
fi
@@ -609,19 +646,28 @@ function amcheck(){
if ! pgisready 1> /dev/null; then pg_start cycle_simple pgisready; fi
sed -i 's/Amcheck_validation=0/Amcheck_validation=1/g' "${status_file}"
databases=$(bash -c "psql -p ${PGPORT} -h 127.0.0.1 -tAXc \"select datname from pg_database where not datistemplate\"")
for db_name in $databases; do
for db in $databases; do
if pgisready 1> /dev/null; then
if amcheck_exists; then
info "amcheck: verify the logical consistency of the structure of indexes and heap relations in the database $db_name"
indexes=$(psql -p "${PGPORT}" -h 127.0.0.1 -d "$db_name" -tXAc "SELECT quote_ident(n.nspname)||'.'||quote_ident(c.relname) FROM pg_index i JOIN pg_opclass op ON i.indclass[0] = op.oid JOIN pg_am am ON op.opcmethod = am.oid JOIN pg_class c ON i.indexrelid = c.oid JOIN pg_namespace n ON c.relnamespace = n.oid WHERE am.amname = 'btree' AND n.nspname NOT IN ('pg_catalog', 'pg_toast') AND c.relpersistence != 't' AND c.relkind = 'i' AND i.indisready AND i.indisvalid")
for index in $indexes; do
# info "amcheck: verify the logical consistency of the structure of index ${index}"
if ! psql -v ON_ERROR_STOP=on -p "${PGPORT}" -h 127.0.0.1 -d "$db_name" -tAXc "select bt_index_parent_check('${index}', heapallindexed => true)" 1> /dev/null
info "amcheck: verify the logical consistency of the structure of indexes and heap relations in the database $db"
if [[ "$PGVER" -lt '14' ]]; then
# If the PostgreSQL version is less than 14, use the bt_index_parent_check function for each index in single-threaded mode.
indexes=$(psql -p "${PGPORT}" -h 127.0.0.1 -d "$db" -tXAc "SELECT quote_ident(n.nspname)||'.'||quote_ident(c.relname) FROM pg_index i JOIN pg_opclass op ON i.indclass[0] = op.oid JOIN pg_am am ON op.opcmethod = am.oid JOIN pg_class c ON i.indexrelid = c.oid JOIN pg_namespace n ON c.relnamespace = n.oid WHERE am.amname = 'btree' AND n.nspname NOT IN ('pg_catalog', 'pg_toast') AND c.relpersistence != 't' AND c.relkind = 'i' AND i.indisready AND i.indisvalid")
for index in $indexes; do
if ! psql -v ON_ERROR_STOP=on -p "${PGPORT}" -h 127.0.0.1 -d "$db" -tAXc "select bt_index_parent_check('${index}', heapallindexed => true)" 1> /dev/null
then
sed -i 's/Amcheck_validation=1/Amcheck_validation=0/g' "${status_file}"
error "amcheck: logical validation for index ${index} ( database $db ) - Failed"
fi
done
else
# Use pg_amcheck (added in PostgreSQL 14)
if ! "${PG_BIN_DIR}"/pg_amcheck -p "${PGPORT}" -h 127.0.0.1 -d "$db" --parent-check --heapallindexed --progress --jobs="${PROCESS_MAX}"
then
warnmsg "amcheck: logical validation for index ${index} ( database $db_name ) - Failed"
sed -i 's/Amcheck_validation=1/Amcheck_validation=0/g' "${status_file}"
error "amcheck: logical validation for database $db - Failed"
fi
done
fi
fi
fi
done
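
The amcheck hunk above switches on the PostgreSQL major version: below 14 it loops over B-tree indexes and calls `bt_index_parent_check` one index at a time, while from 14 on it delegates to `pg_amcheck`, which accepts `--jobs` and can therefore use `PROCESS_MAX`. A minimal sketch of that dispatch, printing a description of the chosen path instead of running it; `amcheck_strategy` is a hypothetical name, and the version is assumed to be an integer major version such as 13 or 16.

```shell
#!/usr/bin/env bash
# Sketch only: amcheck_strategy is a hypothetical helper, not part of pgbackrest_auto.
# Assumes pgver is an integer major version (for example 13 or 16).
amcheck_strategy() {
  local pgver="$1" jobs="$2"
  if [[ "${pgver}" -lt 14 ]]; then
    # No pg_amcheck before PostgreSQL 14: check indexes one by one via SQL.
    echo "bt_index_parent_check per index (single-threaded)"
  else
    # pg_amcheck can check relations in parallel.
    echo "pg_amcheck --parent-check --heapallindexed --jobs=${jobs}"
  fi
}
```

On versions before 14 the `--process-max` value therefore has no effect on the amcheck step.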