On top of bringing the latest PostgreSQL version at hand inside a virtual machine, vFabric Postgres 9.2.4 includes some new scripts that facilitate the integration of the replication features of the community version of PostgreSQL.
Here is a list of the new functionalities:
- Separate virtual disk for archives with a default size of 2G mounted on /var/vmware/vpostgres/9.2-archive.
- New default parameters in postgresql.conf to support replication by default (wal_level, max_wal_senders, archive_command, archive_mode, etc.).
- New scripts for the management of replication between vFabric Postgres nodes: slave creation, node promotion, replication monitoring. Those scripts are located in the folder /opt/vmware/vpostgres/current/share.
The replication configuration that can be achieved with those scripts is simple and convenient for most real-world applications:
- All the slaves use streaming replication to catch up with the master in asynchronous mode
- If a slave is disconnected from the cluster for too long and cannot fetch the necessary WAL files from the master's archives, a new base backup is needed. This keeps the archives small (the size is customizable by changing the archive disk size in the virtual machine settings) by retaining only the WAL files that slaves need to catch up with the master
- The recovery default is the latest timeline, ensuring that a slave reconnecting to a freshly promoted node catches up with the latest changes that occurred in the cluster
- Slave-to-slave connections are possible thanks to cascading replication
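As a reference point, the replication-friendly defaults mentioned above correspond to a postgresql.conf fragment along these lines. Apart from max_wal_senders = 3, which this post mentions as the default, the exact values are assumptions based on the behavior described here, not a dump of the actual shipped file:

```
wal_level = hot_standby        # assumption: WAL detailed enough for standbys
max_wal_senders = 3            # default number of concurrent WAL senders
archive_mode = on              # WAL archiving enabled out of the box
archive_command = '...'        # the archiving script described in this post
```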
Here are more details about the scripts implemented. They are intended for use only with vFabric Postgres nodes running version 9.2.4 or newer.
This script transforms the existing vFabric Postgres server into a slave by getting it in sync with a given root node (either a slave or the master), or reconnects it to an existing node in the cluster. When creating a new slave on a freshly installed vFabric Postgres 9.2.4, you need to use the option -b to take a new base backup and -W to specify the password used to connect to the remote node; the recommended user is “postgres”.
A single command based on this script is enough to set up a read-only slave as the replica of an existing root node, letting an application balance its read load across multiple vFabric Postgres virtual machines.
$ /opt/vmware/vpostgres/current/share/run_as_replica -h $IP_MASTER -b -W -U postgres
This script automatically registers the SSH authorization key of the slave node on the root node for archive transfer between nodes, so there is no need for additional settings in pg_hba.conf, recovery.conf, postgresql.conf or SSH on the root node side. Note that max_wal_senders is set to 3 by default, so you might want to increase this value on the root node when connecting more slaves to it.
This script can also be used to reconnect an existing slave to a new root node. In the case where the slave node cannot catch up with the root node because of missing WAL archive files, or because the slave node was ahead of the root node in terms of WAL replay, a new base backup is necessary.
A generated recovery.conf looks like this:
primary_conninfo = 'host=''$IP_ADDRESS'' user=''postgres'' application_name=''localhost.localdom'' password=''$PASSWORD'''
recovery_target_timeline = 'latest'
restore_command = 'scp $IP_ADDRESS:/var/vmware/vpostgres/9.2-archive/%f %p'
$IP_ADDRESS is the IP address used by the slave node to connect to a root node (either a slave or the master). $PASSWORD is the password specified when calling run_as_replica.
This script monitors the WAL replay activity of the slave nodes connected to the node where it is launched. The following query is used on the server:
with xl as (
    select pg_last_xlog_receive_location() as log_receive_position,
           pg_last_xlog_replay_location() as log_replay_position)
select xl.log_receive_position as log_receive_position,
       xl.log_replay_position as log_replay_position,
       pg_xlog_location_diff(xl.log_receive_position,
                             xl.log_replay_position) as replay_delta
from xl;
Compared to the default columns of pg_stat_replication (the system view that reports the activity of replicated nodes), a new column called replay_delta is used to monitor how far a given slave lags behind the master node when replaying WAL.
When monitoring replication activity, results similar to this are obtained:
sync_priority |       slave        | sync_state | log_receive_position | log_replay_position | receive_delta | replay_delta
            0 | localhost.localdom | async      | 0/8000000            | 0/8000000           |             0 |            0
The lower the values of receive_delta and replay_delta, the closer the slave node is to the master in terms of WAL replay (replication state).
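For reference, the receive_delta and replay_delta columns shown above could be derived on the master side from pg_stat_replication using the pg_xlog_location_diff() function introduced in PostgreSQL 9.2. This is a hypothetical sketch of such a query, not necessarily the script's actual implementation:

```
select sync_priority,
       application_name as slave,
       sync_state,
       pg_xlog_location_diff(pg_current_xlog_location(),
                             sent_location) as receive_delta,
       pg_xlog_location_diff(sent_location,
                             replay_location) as replay_delta
from pg_stat_replication;
```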
This script allows promoting a given slave to master using the default settings in recovery.conf.
Other slave nodes can reconnect to a new master using run_as_replica.
This script is used as the command for archiving WAL files (set via archive_command in postgresql.conf, and invoked by the vFabric Postgres server each time a WAL file is ready to be archived or archive_timeout is reached). It checks that the WAL file to be archived exists, and checks the size of the archive disk to ensure that only required WAL files are kept in the archives, based on the disk space available for them. WAL files are archived only if the node is part of a cluster as primary or replica.
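The checks this kind of archiving wrapper performs can be sketched as a small shell function. This is a minimal illustration, not the actual vFabric script: the function name and the ARCHIVE_DIR variable are assumptions, and the real script additionally trims old archives based on the size of the archive disk.

```shell
#!/bin/sh
# Hypothetical sketch of an archive_command wrapper.
# ARCHIVE_DIR stands in for the archive mount point,
# e.g. /var/vmware/vpostgres/9.2-archive on the appliance.
archive_wal() {
    wal_path=$1   # %p: full path of the WAL segment to archive
    wal_name=$2   # %f: file name of the WAL segment

    # The segment to archive must exist.
    [ -f "$wal_path" ] || return 1

    # Never overwrite a segment already present in the archives.
    if [ -e "$ARCHIVE_DIR/$wal_name" ]; then
        return 1
    fi

    cp "$wal_path" "$ARCHIVE_DIR/$wal_name"
}
```

In postgresql.conf, such a wrapper would be wired in as archive_command = 'archive_wal %p %f', with PostgreSQL substituting the segment path and name for %p and %f.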
This script can be used to create a new replication user on a master node. It generates a CREATE USER query using the password entered at the prompt.
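The generated query presumably looks something like the following; the user name and password here are placeholders, not values produced by the script:

```
CREATE USER replicator REPLICATION LOGIN PASSWORD 'secret';
```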
Having those scripts already in place inside the vFabric Postgres appliance and RPMs has several advantages, two of which are:
- A unique, consistent set of scripts for the management of a vFabric Postgres cluster, without deploying any additional home-made tools
- Simplified node replication/promotion operations: only a basic understanding of the database server's internal mechanics (failover, reconnection and the slave/master structure) is required
As a DBA or operator, such things make the management of clusters and applications much easier, for example in an ESX cluster of hundreds of vFabric Postgres servers.