Welcome to the third and final part of this introductory series. Rest assured there will be plenty more content to come on the vCenter Server Appliance and vPostgres in the coming months. To review, Part 1 of this blog series talked about vPostgres, some of its features, and why it’s the database platform of choice for the vCenter Server Appliance. Part 2 talked about the vPostgres configuration including logging, the WAL, and health status. In Part 3, I want to wrap up this series by talking about some of the nice utilities we have built into the appliance for monitoring and troubleshooting.
A frequently asked question about the appliance and the vPostgres database is, “How do I monitor it?” Luckily, we have two important tools we can run directly from the vCenter Server Appliance. Both are similar to a well-known tool called esxtop (you can find some great information in KB 1008205) in that they provide interesting data regarding utilization. The first one is vimtop, which is just like esxtop but for the appliance. William has a fantastic post on vimtop here. To highlight a few things, you can easily see processes, disks, and networks. vimtop also shows overall resource utilization of the appliance, and it is easy to see if there’s a process hogging all of the CPU cycles. There is also a method to “capture” the statistics that vimtop provides so you can review the data over a period of time. This is useful if you need to review data overnight or over a specific period of time without sitting in front of vimtop. You’ve got better things to do, right?
Along those same lines, pgtop provides statistics about the vPostgres database. pgtop doesn’t have the vast number of options that vimtop has but it does provide some really cool views and ways to determine if there’s an issue with the vCenter Database.
As you can see from the screenshot above, there is a limited set of options for pgtop. Running pgtop with no parameters is probably how most will use it, although I often find the -I argument useful because idle processes will not be shown. This gives us a more focused view of what’s taking up CPU time in the database. Idle processes can also be hidden from within pgtop by hitting the ‘i’ key. It is also important to note that pgtop is run from the Appliance Shell (appliancesh). If you happen to be in the bash shell, you can switch back by just typing appliancesh. You’ll know you’re in the Appliance Shell if you see the Command> prompt. If your prompt looks like mgmt01vc01:~ # then you’re in bash.
Let’s take a look at the pgtop interface. The below screenshot is broken down into the following four areas:
- Current time
- Overall CPU & memory stats
- Database stats
- Detailed process information
In the first highlighted area we have the current system time of the appliance. I like that the current time is displayed in the upper right-hand corner when using pgtop. It makes it convenient to make sure that the time on the appliance is the expected time as well as having that timestamp when you take screenshots. It’s a minor detail that can be quite valuable.
In section two we can see the usual suspects for CPU and memory statistics. The iowait stat is particularly interesting, as any time we see it above 0% is bad. We don’t want IO waiting on a DB server, so make sure this is always 0.0%. If it is > 0%, then we can look at some of the processes and see what is taking up the CPU and causing the queue. For memory, again, it’s the usual counters. Like some other database platforms, we typically don’t see a ton of free memory, as the DB is designed to use what’s available. Based on my experience, we should see around 5-10% of memory free, and if we see < 5% free we should probably start monitoring things a bit more closely.
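To make those rules of thumb concrete, here is a minimal Python sketch that flags the two conditions described above. The function name `check_host_stats` and its parameters are hypothetical, and the values are assumed to be percentages you have read off the pgtop screen yourself; this is not something pgtop provides.

```python
def check_host_stats(iowait_pct, mem_free_pct):
    """Return warnings based on the rules of thumb from this post.

    Both arguments are percentages (0-100), assumed to be read
    manually from the CPU and memory area of pgtop.
    """
    warnings = []
    # Any iowait above 0% is worth a look on a DB server.
    if iowait_pct > 0.0:
        warnings.append("iowait above 0%: check which processes are waiting on IO")
    # Around 5-10% free is typical; below 5% deserves closer monitoring.
    if mem_free_pct < 5.0:
        warnings.append("less than 5% memory free: monitor more closely")
    return warnings


print(check_host_stats(iowait_pct=0.0, mem_free_pct=8.0))
print(check_host_stats(iowait_pct=2.5, mem_free_pct=3.0))
```

The thresholds are simply my experience-based numbers from the paragraph above, so adjust them for your own environment.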
In section 3 we have some database-specific counters to peruse. Some of the important ones, in my opinion, are:
- DB activity
  - tps (transactions per second)
  - hit% (buffer hit percentage)
- DB I/O
  - reads/s (DB reads per second), writes/s (DB writes per second)
- DB disk
  - total / free / used
Your mileage may vary for all of these stats but we want to see a hit% as close to 100% as possible. We can also monitor the DB I/O metrics and even run pgtop in batch mode to periodically collect these stats so we can analyze the counters over time. It would be great if we could get some integration with vR Ops and have it track these stats for us. Perhaps in a future release we’ll gain that ability.
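For a feel of what a buffer hit percentage means, here is a small Python sketch. In stock PostgreSQL, the pg_stat_database view exposes blks_hit and blks_read counters, and a common way to express a hit percentage is hits divided by hits plus reads. Whether pgtop derives its hit% exactly this way is an assumption on my part, and the function name `buffer_hit_pct` is hypothetical.

```python
def buffer_hit_pct(blks_hit, blks_read):
    """Buffer hit percentage from two block counters.

    blks_hit:  blocks found in the buffer cache
    blks_read: blocks that had to be read from disk
    Returns None when there has been no activity at all.
    """
    total = blks_hit + blks_read
    if total == 0:
        return None  # nothing to measure yet
    return 100.0 * blks_hit / total


# Example with made-up counter values: 990 hits and 10 disk reads.
print(buffer_hit_pct(990, 10))
```

The closer the result is to 100, the more often the database is finding what it needs in memory instead of going to disk.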
Using vimtop and pgtop, we are able to gain a significant amount of visibility into both the vCenter Server Appliance and the vPostgres database. Generally, these tools should help administrators identify performance issues and point them to the cause. As a reminder from Part 2, there is also a wealth of logs for vPostgres located in the /storage/db/vpostgres/pg_log directory. And of course, GSS is always standing by to take a look at the support bundle and help find the root cause of any issues. I hope this has been a worthwhile 3-part series and has helped you get more comfortable with vPostgres and the vCenter Server Appliance!