(To mark the end of the year we are posting every day through January 1 with lighter vSphere and VMware topics. We hope you enjoy them as much as we do. See them all via the “2019 Wrap Up” tag!)
In our post “The Lightest Sides of vSphere” we talked a lot about retrospection and introspection, and we asked people what they thought the best feature of vSphere was. Continuing the astronomical theme with longest days and longest nights, today we ask about the dark sides of vSphere. What is the most underrated vSphere feature, a feature you wish more people knew about?
“File-Based Backup & Restore on the vCenter Server appliances.” – David S.
How are you backing up your vCenter Server appliance? Are you just taking a snapshot of it or backing it up as an image somehow? vSphere 6.5 introduced File-Based Backup and Restore (FBBR) as a method to protect your vSphere environment from failures. In short, it exports configuration information as a file to a remote file share or system. If you have issues you can restore a vCenter Server appliance using the installer and that file. Pretty slick!
Helpful tip: if you’re already backing up an image of vCenter Server it’s a good idea to set this up as well. Make sure you’re not copying it into the infrastructure you’re protecting, though! It’s very easy to forget that the Windows file server you’re using is actually a VM inside your infrastructure. Copying it to a DR site (and the DR site back to the primary site) is often a good approach. There are a number of blog posts on the vSphere Blog about FBBR, too.
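To make the "don't back up into what you're protecting" point concrete, here is a minimal sketch of a sanity check. It assumes you can export a list of VM hostnames from your inventory by some means. The function name, the hostnames, and the simple name comparison are all hypothetical illustrations, not part of any VMware tool.

```python
def backup_target_is_independent(target_host, protected_vm_hostnames):
    """Return True when the backup target is not one of the protected VMs.

    Hypothetical helper: compares hostnames case-insensitively and nothing more.
    """
    normalized = {name.strip().lower() for name in protected_vm_hostnames}
    return target_host.strip().lower() not in normalized


# Illustrative inventory; in practice this list would come from your own export.
protected = ["vcsa01.example.com", "FILESRV01.example.com", "app01.example.com"]

print(backup_target_is_independent("filesrv01.example.com", protected))  # False
print(backup_target_is_independent("backup.dr.example.com", protected))  # True
```

A name comparison like this will not catch DNS aliases or targets addressed by IP, so treat it as a reminder to ask the question rather than as proof that the target is safe.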
“The ESXi scheduler. Its efficiency and performance do not get the attention it deserves.” – Kevin H.
This would be my vote for the #1 underrated feature of vSphere. ESXi consistently demonstrates that it’s the most efficient hypervisor on the market. This has a big effect on customers because it means less hardware is needed, even in small deployments. Less hardware then means less additional infrastructure. The indirect gains of fewer network & storage switches, less power consumption, less consumed rack space, and less heat in a data center add up very quickly, especially with “soft costs” like staff time. Payroll is often the largest line item in an enterprise IT budget and enabling IT to do more with the staff they have is a massive win. Similarly, if you’re security-minded, fewer devices means less to patch & audit.
Helpful tip: The VMware VROOM! Blog has a nice post about why containers running as part of Project Pacific under ESXi run 8% faster than bare metal. No joke! Go read it. When you’re done, add that blog to your feed reader. It’s written by the VMware performance experts, and they have lots of interesting things to say, both on their blog and in VMworld sessions.
“Snapshots. Not as backups but as ‘I’m going to do something that if it blows up I’m kinda [in trouble] and it would be great to revert with the click of a button.’” – Féidhlim O.
We take a lot of things for granted when it comes to vSphere. Features like vMotion, DRS, HA, and snapshots were game-changing when they first shipped, changing the realities and possibilities of our workloads and our data centers. Speaking from my own experience, snapshots are massively underrated. A vSphere Admin, at an academic level, knows and understands snapshots. But does a vSphere Admin practice using them all the time? And does that same vSphere Admin advocate the use of snapshots to the admins of the workloads that run atop vSphere? Often the answer to both of these is no. We can do better!
Helpful tip: vSphere Admins, like all sysadmins, suffer from the problem where “no news is good news.” This means that the rest of your company only gets to see you when something bad is happening. Instead, let’s make this next year the year when your organization sees you for all the wonderful work you do. Start an internal email newsletter with tips that workload admins might not know: snapshots, backup strategies, DRS affinity rules, VMware Tools tricks, VM encryption, etc. Give a brown bag presentation over lunch about what vSphere is and how it’s set up in your company, and include tips & tricks for reducing risk during application upgrades and OS patching. Download and install the trial of vRealize Operations Manager and use that as a starting point for capacity conversations, remembering that rightsizing goes both ways (up and down). Telling a workload admin that it looks like they could use a few more CPUs to improve performance, or that they could mitigate their upgrade risks with a snapshot, is a very positive conversation starter!
Helpful tip #2: Snapshots are great but be sure that you delete them when you’re done with them. A big snapshot can be a problem, too! 🙂 vRealize Operations Manager has reports & alarms for that, and community efforts like vCheck are also incredibly helpful for catching possible problems before they become operational issues. VMware also has a Knowledge Base article on snapshot best practices.
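If you don’t have reporting in place yet, the idea behind those reports is simple enough to sketch. The example below assumes you have already exported snapshot details (VM name, snapshot name, creation time, size) by whatever means you prefer. The `SnapshotRecord` type, the sample data, and the age and size thresholds are all hypothetical placeholders, not VMware recommendations or API objects.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SnapshotRecord:
    """Hypothetical export record; not a vSphere API object."""
    vm_name: str
    snapshot_name: str
    created: datetime
    size_gb: float


def find_stale_snapshots(records, now, max_age_days=3, max_size_gb=20.0):
    """Return records that are older or larger than the given thresholds."""
    cutoff = now - timedelta(days=max_age_days)
    return [r for r in records if r.created < cutoff or r.size_gb > max_size_gb]


# Illustrative data only.
now = datetime(2019, 12, 22, 9, 0)
inventory = [
    SnapshotRecord("app01", "pre-upgrade", datetime(2019, 12, 21, 18, 0), 1.5),
    SnapshotRecord("db01", "before-patching", datetime(2019, 11, 2, 8, 0), 4.0),
    SnapshotRecord("web01", "test", datetime(2019, 12, 22, 7, 0), 35.0),
]

stale = find_stale_snapshots(inventory, now)
for r in stale:
    print(f"{r.vm_name}: '{r.snapshot_name}' ({r.size_gb} GB, created {r.created:%Y-%m-%d})")
```

With this sample data the sketch flags db01 (too old) and web01 (too large) and leaves app01 alone. Pick thresholds that match your own policy and the guidance in the Knowledge Base article.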
“Nested ESXi for testing stuff. It works well and allows you to test and play without affecting production or buying new HW.” – Féidhlim O.
I once heard someone say that everybody has a test environment, and some people are lucky enough to have a separate production environment, too. Test environments are great ideas but they’re almost never representative of production. They use different and often ancient hardware. They don’t run the same workloads. And if we break something while testing we have to go mess with consoles and things, which is so 1980s. Why do we keep them around? Let’s take a hint from the VMware Hands-On Labs and move our testing to nested vSphere. You can install ESXi inside a VM and deploy a test vCenter Server to manage it. When it’s all set up you can snapshot or clone it, and then when you need to you can simply revert to a known good state again. Easy, and if you use PowerCLI to script the snapshots and reverts it’s good automation practice, too.
Helpful tip: There is a tremendous community around nested virtualization, led by one of VMware’s own, William Lam. The Nested Virtualization page on his blog even has OVA images preconfigured with vSAN, and it serves as a reference for many of us inside of VMware, too.
“Autodeploy, watching 100s of hosts spinning up in minutes from bare metal to a full setup ready for workloads is a kind of magic.” – Nigel V.
vSphere Auto Deploy is a feature that allows a server to boot via PXE/iPXE and configure itself to automatically join a cluster and run workloads. It’s a wonderful time saver for organizations that have numerous hosts, and it’s flexible enough to allow for things like local caching of ESXi, which helps eliminate a dependency that could be a problem during an incident.
Helpful tip: Even if you aren’t fully interested in Auto Deploy, the PXE-based network installation methods for ESXi can be a real time saver if you have multiple hosts to deploy. It’s all in the documentation!
“VADP. It did nothing short of revolutionize the way we think and act when it comes to workload recovery. A crash consistent backup of a workload without application knowledge is mind boggling for the fact that it works every single time. I don’t know how many times either a precautionary ‘I did a VM backup even if the $owner says we don’t need one’ saved the day.” – Dominik Z.
The VMware vStorage APIs for Data Protection, or VADP, enable backup & replication products like vSphere Replication, VMware Site Recovery Manager, and partner products to safely, securely, and efficiently back up and restore a VM. As with snapshots and clones, these backups are often called “crash-consistent.” This means that, when restored, it’ll look like the virtual machine simply lost power. You might think that sounds awful, but the truth is that losing power like that is a situation that’s very well understood by software. Filesystems know what to do when it looks like a crash occurred. Databases generally know what to do, too. More complicated backup schemes are possible, in which applications are cued to quiesce, or pause their activity, so that a cleaner backup can happen, but those schemes add complexity and can be unreliable at times, too. Testing is always important, especially of restores!
Helpful tip: You can clone a running VM. The process is called making a “hot clone.” Where possible, though, it’s always recommended to power a VM down so that the filesystems and applications are in a 100% known state: off! Also remember that crash-consistency might not work for distributed applications (such as multi-tier ones), where the VMs that make up an application should all be in sync with each other. Always ask your application vendors and backup vendors for their recommendations on how to safely back up and restore your workloads.
Have another thought on an underrated feature? Please leave it in the comments!
(Come back tomorrow for a lighthearted look into all of our wonderful users! For more posts in this series visit the “2019 Wrap Up” tag.)