We are living in the age of the Information Explosion. Data growth is accelerating at 50% year over year. In fact, more data will be generated in the next 4 years than in the history of the world. And people are addicted to information, staying connected for up to 12 hours a day and seeing more than 34 billion bits of information every day—the equivalent of reading two books.
Earlier this year, VMware CTO Richard McDougall stated in his 2013 predictions on big data that this year will be the year that 'delete' becomes forbidden. As a result, IT is challenged to scale a volume of data that is a moving target, with the only promise that it will continue to grow exponentially.
While vFabric Postgres 9.2 is already known to have a significant performance gains, the ability to create master-slave clusters adds more power to the architecture mix. Master-slave clusters provide extra assurances for data backups and failover, but also they can be used to distribute the workload for queries and to help applications scale.
Below, we will explain how set up and test a master-slave cluster in Postgres 9.2.
Setting Up VMs for a Master-Slave Cluster in vFabric Postgres
All the virtual machines used in this post have the following environment settings. It is only necessary to add those lines in
PATH indicates the folder where the vFabric Postgres binaries are located. PGDATA is a PostgreSQL-related environment variable that indicates the folder where server data is located.
Before continuing with this post, be sure that you have installed two virtual machines with vFabric Postgres 9.2. One will be the master node, and the second one the slave node. The slave will recover information using streaming replication and will be able to execute read-only queries.
Setting Up the vFabric Postgres Master
Once the VMs are installed, log in as user Postgres on the master and add the following parameters in
wal_level = hot_standby
max_wal_senders = 3
hot_standby = on
archive_mode = on
archive_command = 'cp -i %p /var/vmware/vpostgres/current/pgdata/archive/%f'
wal_level is set to standby to allow a slave to reconstitute transaction information in order to be able to run read-only queries. hot_standby is not necessary on the master, but setting it here avoids having to do it once again in the slave after taking a base backup.
archive_mode are parameters used to make the master save WAL files to archives in a folder defined by the user.
archive_command is kicked each time a WAL file is completed (16MB of data written to file or when timeout defined by
archive_timeout is reached). This archive allows a slave to catch up with a remote node even if it is disconnected for a long time. It is necessary to create the folder
$PGDATA/archive to store the WAL files archived.
Choose an archive folder designed for your convenience and need. You can, for example, create an extra virtual disk to safely store the archives directly. Then add this setting in
host replication postgres 0.0.0.0/0 trust
This setting allows a slave node to use replication protocol and connect to the master without any IP restriction. In general, you should avoid such unsecure settings in a production environment. You should reduce the range of possible IPs to the same segment where slaves and master are going to interact depending on the network environment.
Finally, log in as root on the master VM and use the following commands to restart the server. This is needed when
archive_mode value is changed.
service aurora_mon stop
service aurora_mon start
Now the master is ready to archive and receive connections from slaves.
Setting Up the vFabric Postgres Slave
Let's continue with the settings necessary to set up the slave node. Log in to it as user root and stop the vFabric Postgres server that is currently running.
service aurora_mon stop
Then log in as user postgres. Now, what you need to do is replace the existing data folder on the slave node by a base backup from the master node.
rm -rf $PGDATA
Perform a base backup taken from the master node as follows.
postgres@localhost:~> pg_basebackup -h $MASTER_IP -D $PGDATA
WARNING: skipping special file "./postgresql.conf"
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
$MASTER_IP by the IP address of the master virtual machine.
You need also to create a soft link to point to
postgresql.conf.auto in $PGDATA correctly.
cd $PGDATA && ln -s postgresql.conf.auto postgresql.conf
Now, we need to adjust some necessary settings and create a new file
$PGDATA/recovery.conf with the following parameters:
restore_command = 'scp $MASTER_IP:/archive/%f %p'
standby_mode = on
recovery_target_timeline = 'latest'
primary_conninfo = 'host=$MASTER_IP application_name=slave'
Don't forget to replace $MASTER_IP by the IP address of the master node. With the
recovery_target_timeline set to ‘latest,’ the slave node will perform a recovery to the latest timeline available. This is a good default, as slaves will always catch up with the latest point of master. However, you might also want to look at other possible settings in the PostgreSQL documentation.
The last thing remaining is to register the SSH key of the slave node on the master node to allow the slave to fetch archive WAL files from master with scp.
ssh-keygen -t dsa
ssh-copy-id -i ~/.ssh/id_dsa.pub postgres@$MASTER_IP
What remains to do is to restart the slave node and let the magic begin! In order to do that, first login as user root on slave node and restart the server.
service aurora_mon start
Testing the vFabric Postgres Master-Slave Cluster
Now the cluster is up.
So, it is time to check that your cluster is correctly running.
$ psql -h $MASTER_IP -U postgres postgres
Password for user postgres:
Type "help" for help.
postgres=# SELECT application_name, write_location, flush_location, sync_state FROM pg_stat_replication;
application_name | write_location | flush_location | sync_state
slave | 0/40001E8 | 0/40001E8 | async
Here, the value of
application_name is the same as the one specified
recovery.conf on slave. You can then create objects on the master and query them on slave as you would in your application.
Now you have a master-slave cluster working and ready for your application data! There are a few other customizations to this setup detailed below:
- Setting the slave as a synchronous slave is simple. Set
synchronous_standby_names = 'slave'in
postgresql.confof master node and reload parameters on master with command
- Setting a cascading slave chain by connecting a slave node to another slave node. The same steps as above are necessary, the only difference being to connect to a slave instead of a master and be sure to fetch the archive files from the correct node if necessary.
To Learn More:
- Read about the key performance and scale improvements in vFabric Postgres 9.2
- Learn about the dynamic memory management of vFabric Postgres
- Download the details of the new vFabric Reference Architecture
- Learn about some of the PostgreSQL open source contributions by the vFabric team
- Take vFabric Postgres for a test drive
- Get the product overview and find more resources
See how vFabric Data Director automates DBA activities on virtualized infrastructure for Postgres, Oracle, SQL Server, Hadoop, and more.
|About the Author: Michael Paquier is a member of PostgreSQL technical staff at VMware. He is involved for many years with community development of PostgreSQL and Postgres-XC, and has worked on multi-master database technologies. He has also interest in parallel query processing and concurrent SQL processing.|