vSphere memory management has evolved over the years, taking advantage of new technologies and techniques such as large memory pages, memory compression, and the use of solid state devices (SSDs) for swap caching. This evolution has changed the way we need to look at memory usage and memory over-commitment in vSphere virtualized environments. To understand memory management and usage in vSphere, we need to understand the metrics available to us, what they mean, and how to use them.
Today, VMware's CTO, Steve Herrod, addressed the 20,000+ attendees at VMworld and announced that VMware has outdone itself again by providing a virtualization platform more than capable of supporting even the most demanding and I/O-intensive mission-critical applications. One impressive performance study that Steve highlighted in his keynote was the 1 Million IOPS from a Single VM study recently completed by the performance engineering team here at VMware.
VMworld is just around the corner! As always, it is going to be an exciting, action-packed week. If you are going to be at the show, make sure you check out some of the VMware performance sessions being delivered.
vSphere Performance New Features Overview (you never know, there might be some new things)
INF-VSP1622 – Performance New Features and Best Practices for vSphere
Monday, Aug 27, 4:00 PM – 5:00 PM – Moscone West, Level 3, Room 3002
Wednesday, Aug 29, 3:30 PM – 4:30 PM – Moscone West, Level 2, Room 2016
Some other Performance Sessions that are definitely worth checking out…
Storage queues: what are they, and do I need to change them?
We have all had to wait in a line or two in our lives; whether it is the dreaded TSA checkpoint at the airport or the equally dreaded DMV registration line, waiting in line is just a fact of life. This is true in the storage world too: storage I/Os have plenty of lines they have to wait in. In this article, we examine the various queues in the virtualized storage stack and discuss the when, how, and why of modifying them.
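Before we dig in, here is a quick, hedged sketch of where you can actually see these queues on an ESXi host (tool output varies by release, so treat this as a pointer rather than a recipe):

esxcli storage core device list    (lists each storage device and, on recent ESXi releases, its maximum queue depth)
esxtop, then press u    (disk device view: the DQLEN, ACTV, and QUED columns show the configured device queue depth and the active and queued commands)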
Something you might have missed: a little-known feature of vSphere 5.0 called VMX swap helps reduce the memory footprint of the VMX process by up to 80%. The vSphere kernel has several internal processes, agents, and data structures that require physical memory. vSphere reserves physical memory on the host to ensure that these internal vSphere processes always have the physical memory they need. Sometimes these reservations are larger than what the process is actually using. The VMX process manages the data structures needed to bootstrap and support the guest, and it has a memory reservation of approximately 50MB, which can vary depending on virtual machine size and configuration.
To allow for better utilization of physical host memory and higher consolidation levels, vSphere 5.0 introduced the ability to create a swap location for each virtual machine's VMX process. With a swap location backing the VMX process's memory, vSphere can reduce the physical memory consumed by the VMX process by allowing parts of that memory to be swapped out when the host is under memory pressure.
This feature is enabled by default in vSphere 5.0, and vSphere automatically creates the VMX swap file for each virtual machine when it is powered on. By default, the VMX swap file is created in the virtual machine's working directory, but a different location can be set with sched.swap.vmxSwapDir. Although not recommended, the VMX swap feature can be disabled for a virtual machine by setting sched.swap.vmxSwapEnabled to FALSE.
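For example (the datastore path below is purely illustrative), these settings are ordinary per-VM advanced configuration (.vmx) entries:

sched.swap.vmxSwapDir = "/vmfs/volumes/datastore1/vmxswap"    (optional alternate location for the VMX swap file)
sched.swap.vmxSwapEnabled = "FALSE"    (disables VMX swap for this VM; not recommended)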
This feature can reduce the VMX memory reservation from about 50MB or more per virtual machine to about 10MB per virtual machine with no noticeable performance impact. Although a savings of roughly 40MB might not seem like a huge amount, with today's hosts supporting higher levels of consolidation and running more and more virtual machines, every little bit helps, and it can add up to some big savings. This is especially true for environments where a very large number of virtual machines run on a single host, for instance VMware View / VDI environments where potentially hundreds of virtual machines, each with its own VMX process, run on a single host.
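To put a rough number on that: at roughly 40MB saved per virtual machine, a View host running 500 desktop VMs would get back on the order of 20GB of host memory from this feature alone.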
For more details on the VMX swap feature and other performance features and best practices for vSphere 5.0 that you might have missed, see the vSphere 5.0 Performance Best Practices white paper: http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf
While presenting the storage performance talks, I frequently get asked about Solid State Device (SSD) performance in a virtualized environment. Obviously, SSDs, or EFDs (Enterprise Flash Disks), are great for performance, especially for storage-intensive workloads. As seen in the previous post in this series, SSDs can provide significantly more IOPS and significantly lower latencies. But the two big questions when using SSDs in a virtualized environment are "how much of a gain might I expect" and "how much SSD storage do I need to achieve that gain?"
Poor storage performance is generally the result of high I/O latency, but what can cause high I/O latency, and how do we address it? There are a lot of things that can cause poor storage performance:
- Undersized storage arrays/devices unable to provide the needed performance
- I/O stack queue congestion
- I/O bandwidth saturation, link/pipe saturation
- Host CPU saturation
- Guest-level driver and queuing interactions
- Incorrectly tuned applications
- Undersized storage arrays (did I say that twice?)
I frequently present at the various VMware User Group (VMUG) meetings, VMworld, and partner conferences. If you have ever attended one of my talks, you know it is like trying to drink from a fire hose; it is hard to cover everything in just a 45-minute session. So I will take the time here to write a few blog posts that go over the concepts discussed in these talks in more detail (or at least more slowly). One of the most popular, yet very fast-paced, talks I present is Troubleshooting Storage Performance in vSphere. I'll slow things down a bit and discuss each topic here. This might be just a review for some of you, but hopefully, as we get into more detail, there will be some new nuggets of VMware-specific information that can help even the more advanced storage folks.
Today's post is just the basics: what is bad storage performance, and where do I measure it?
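As a preview of where those measurements come from (a sketch, not the full story), esxtop on the ESXi host is a good first stop:

esxtop, then press d (adapter view) or u (device view)
DAVG/cmd — average latency at the device/array
KAVG/cmd — average latency spent in the VMkernel, mostly queuing
GAVG/cmd — total latency as the guest sees it (roughly DAVG + KAVG)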
This week I'm out at EMC World 2012 and wanted to let you know about an exciting new announcement from EMC. EMC announced this week that it will provide advanced storage analytics for its VNX storage arrays using VMware's vCenter Operations product. The tighter integration between the storage hardware and the vCenter Operations monitoring suite means EMC storage customers will get best-in-breed analytics and monitoring software, delivering the in-depth storage statistics and operations monitoring that storage administrators need to optimize storage performance and validate that storage-level SLAs are met. The new product announced this week, the VNX Storage Analytics Suite and VNX Connector for VMware vCenter Operations, will be generally available later this year.
vSphere is pretty smart when it comes to managing physical memory and determining where best to place a virtual machine's memory, given how busy each NUMA node in the physical server is. If a VM is running on a busy NUMA node, the ESXi kernel will automatically migrate it to a less busy NUMA node in the server to get better performance.
There are two main reasons why ESXi might decide to migrate a VM from one NUMA node to another. ESXi might migrate a VM to reduce CPU contention within a NUMA node; this type of migration is called a "balance migration". The other type is called a "locality migration". Locality migrations occur when ESXi detects that most of the VM's memory is on a remote node; in that case it is generally better to move the virtual machine to run on the NUMA node where most of its memory is, as long as doing so doesn't cause CPU contention on that node.
ESXi keeps a constant eye on your virtual machines and their NUMA placement, and it performs NUMA migrations, for either balance or locality reasons, to optimize overall system performance. ESXi generally does a great job at this, but there can be hiccups. If you notice performance problems caused by either unbalanced NUMA nodes (high ready time) or poor virtual machine NUMA locality (high memory latency), there are a few things you can try.
First and most importantly, try to size your VMs so that they fit evenly within your physical server's NUMA nodes. For instance, if your physical server has 6 cores per NUMA node, size your VMs with 2, 3, or 6 vCPUs. This KB article explains the potential impact of mismatched VM and NUMA node sizes in more detail: http://kb.vmware.com/kb/1026063
You might also try tweaking the advanced NUMA settings for the VM. For instance, you can manually set the NUMA node affinity for the VM on the Resources tab of the virtual machine's Edit Settings screen in vCenter. Be cautious about manually setting VM NUMA affinity, though, as manually balancing NUMA resources can become difficult to manage as your environment grows. Another setting that might be worth looking at is numa.vcpu.maxPerMachineNode. By default, ESXi tries to place a VM in as few NUMA nodes as possible; this generally provides the best performance because it gives the best memory locality and reduces memory latency. But some applications are memory bandwidth sensitive rather than memory latency sensitive, and these applications may benefit from the increased memory bandwidth that comes from using more NUMA nodes and thus more paths to memory. For these applications, you may want to modify the VM advanced parameter numa.vcpu.maxPerMachineNode; setting it to a lower value will split the VM's vCPUs across more NUMA nodes.
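As a rough sketch (the values are illustrative, not recommendations): for an 8-vCPU VM on a host with 8 cores per NUMA node, adding the following advanced configuration parameter would split the vCPUs across two NUMA nodes instead of one, trading some memory locality for additional memory bandwidth:

numa.vcpu.maxPerMachineNode = "4"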
New in vSphere 5.0 is the vNUMA feature, which presents the physical NUMA topology to the guest operating system. vNUMA is enabled by default on VMs larger than 8-way, but if you have VMs that are 8-way or smaller yet still larger than your physical server's NUMA node size, then you might want to enable vNUMA on those VMs as well. To enable vNUMA on 8-way or smaller VMs, modify the numa.vcpu.min setting. See "Advanced Virtual NUMA Attributes" in the vSphere 5.0 Resource Management guide for more details.
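For example (the value is illustrative): an 8-vCPU VM on a host with only 4 cores per NUMA node spans two nodes but, being 8-way, does not get vNUMA by default. Lowering the threshold in that VM's advanced configuration exposes the virtual NUMA topology to the guest:

numa.vcpu.min = "8"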
Lastly, if you are noticing a significant and unexplained performance problem, it might be best to call in the experts and contact VMware Support. They will be able to take a detailed look at the NUMA client stats for your VM. The NUMA client stats include detailed per-VM NUMA counters, such as the number of balance migrations and locality migrations a VM has had; if these values are roughly the same, it might indicate NUMA migration thrashing.
Bottom line: ESXi does its best to place VMs optimally for NUMA performance, but there are things you can do to help it, the most important of which is sizing and configuring your VMs with NUMA in mind.