In Part 1 of this blog series I talked about vPostgres, some of its features, and why it’s the database platform of choice for the vCenter Server Appliance. In this post, Part 2, I’ll dig a bit deeper into the gears of vPostgres. We’ll take a look at some of the key configuration settings and why they are important to the vCenter Server Appliance.
Before digging into the vPostgres configuration I do want to make a quick point. This blog post is educational in nature and aimed at allowing vSphere Administrators to be more comfortable with the vPostgres database and the vCenter Server Appliance. In normal circumstances the vPostgres configuration should not be modified in any way. If the vPostgres configuration is modified it could lead to undesirable results and a lack of support from VMware GSS. But I do think it is important to understand how VMware is tuning the vPostgres configuration from a vanilla PostgreSQL deployment in order to increase the comfort level with vPostgres.
vPostgres Configuration Files
Now that the housekeeping is out of the way, let’s get to it! The vPostgres configuration files are located in /storage/db/vpostgres/ as seen in Figure 1 above. The main config file is postgresql.conf and holds all the normal configuration you’d expect for a database – log locations & rotation settings, memory tuning, and the autovacuum settings just to name a few. There are several ways you can take a look at this configuration file. One option would be to use a text editor such as vi or Vim (both are natively available on the vCenter Server Appliance). If you are looking for a specific setting within the file such as the autovacuum configuration you could use the command:
less postgresql.conf | grep autovacuum
You could also use a utility like WinSCP to download the file to your workstation and then use your text editor of choice. I would recommend Notepad++ and avoid the built-in Windows editors (Notepad & Wordpad) as they can sometimes add extraneous hidden characters or just don’t deal with the formatting of a *nix-formatted text file very well. There are definitely other ways and depending on your familiarity with Linux you can probably do this dozens of other ways. Feel free to leave a comment on this post if you have a favorite method that I haven’t covered.
vPostgres Logging
As you might expect, once you get the postgresql.conf file opened up you can see that there are quite a few configuration settings. To get detailed information on any setting in particular you can refer to the PostgreSQL 9.3 documentation. The first settings I want to call attention to are several settings that are related to logging. We set the location, naming convention, rotation, and several other logging configurations. You’ll note that we set the log directory with:
log_directory = ‘pg_log’
The pg_log directory is located in the same directory as the conf files – /storage/db/vpostgres/. But pg_log is actually a symlink to /var/log/vmware/vpostgres which is then actually a symlink to /storage/log/vmware. If we take a look in the pg_log directory we can see the log files and verify that the rotation is working correctly. We can also verify the naming convention set in the conf file (log_filename = ‘postgresql-%d.log’ by default) matches. By default we’re rotating the logs to a new file every day (24 hours) and the name of that file is postgresql-%d where %d is the day of the month. So, for example, if today is May 4th then today’s log file will be postgresql-4.log.
There is also a setting called log_truncate_on_rotation which, if enabled, tells PostgreSQL to overwrite log files with the same name. Since we enable this setting on the vCenter Server Appliance, we’ll see that the June 4th log will have the same name as the May 4th log (postgresql-4.log) and it will be truncated (cleared). This results in a fresh log file for each day of the month. One caveat is that not all months have 31 days, right? So the postgresql-31.log file may stick around a while so you’ll need to be aware of that and the timestamp of the file if you’re using the logs to troubleshoot.
One additional logging parameter is logging_collector which is off by default in PostgreSQL. We enable this parameter on the vCenter Server Appliance to catch log messages sent to stderr by the top-level vPostgres process (“postgres“). For example, for the vCSA, this means that log messages generate by vPostgres backend and extension processes, commands invoked by vPostgres, and any shared libraries that vPostgres might use will all get captured in the log files through stderr. Since these types of messages would normally be missed by syslog having logging collector turned off could result in additional time to troubleshoot an issue. One note is that the logging collector is meant to always capture the logs. Therefore, it is possible that if the vCenter Server Appliance experiences high load that the logging collector could block other processes since it will take priority. However, this should not occur during normal operations and it is highly advisable to keep this setting enabled.
vPostgres Checkpoints
Moving on from the log settings we have checkpoints. Checkpoints, as defined in the PostgreSQL documentation are “points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint”. So, in other words, we write a checkpoint to the Write Ahead Log (WAL) every so often to show that the preceding transactions have been flushed to disk. In the event of a crash, the system looks at the most recent checkpoint to determine from where it needs to start replaying transactions.
In PostgreSQL, checkpoints can be triggered via three different methods. The first method is by issuing a command via the pgsql command line interface. The second way a checkpoint is triggered is time-based which is every 5 minutes by default. The last way a checkpoint can be triggered is by volume. By default, PostgreSQL performs a checkpoint after 48 MBs of data. For vPostgres & vCenter Server, we felt that this volume would create far too many checkpoints and create unnecessary I/O (load) on the vCenter Server Appliance. Therefore, we’ve stretched that out to 90% of the disk allocated for the pg_xlog partition. This is the VMDK where the WAL resides and by default is 5 GB in size. This reduces the I/O of the checkpointing operations. Note that there is tradeoff – a modest increase in recovery.
I’ve put together the following diagram to help illustrate the checkpoint concept for vPostgres. The below diagram represents the Write Ahead Log and shows how much of the log would be replayed in the event of some sort of issue. You can see that only the portion of the file since the last checkpoint needs to be replayed.
As mentioned earlier, each time a checkpoint is created in the WAL all of the preceding transactions are flushed to disk. As you might imagine, when this data is written there is the potential for quite a bit of I/O to occur on the disks where the PostgreSQL database is stored. The PostgreSQL engineers thought of this and have a parameter to help mitigate I/O storms called checkpoint_completion_target. By default, this is set to 0.5 which means that the load created by flushing the transactions to disk during the checkpoint process is spread out across 2.5 minutes (or half a checkpoint cycle). In order to further protect vCenter Server customers from this potential I/O spike, we’ve changed the checkpoint_completion_target parameter to 0.9 to further spread out the I/O.
vPostgres Health
In the final part of this blog post I want to cover something that is added by VMware to help show the current health and status of vPostgres. This is a simple process called the health_status_worker and it writes a file to the /etc/vmware-sca/health/ directory called vmware-postgres-health-status.xml. Currently, this XML file is consumed via the vSphere Web Client and is used to show the status of the vmware-vpostgres service in the Nodes view under Administration > System Configuration in the vSphere Web Client.
While simple, this could lead to some additional capabilities down the road to make it easier to monitor the health of the vPostgres service.
That concludes this second part of the blog series on getting comfortable with vPostgres and the vCenter Server Appliance. We focused on the configuration and reviewed some of the important changes vPostgres has over a vanilla PostgreSQL installation. In the next post we’ll take a look at some of the tools that are available to monitor and manage the vCenter Server Appliance and vPostgres.
Acknowledgement
I just want to thank Michael Paquier & Nikhil Deshpande for continuing to help out on this subject matter. Thank you!