Many organizations today have a remote office/branch office (ROBO) environment with local IT infrastructure. These remote locations range from a few servers running a handful of workloads for local needs to numerous servers spanning a large-scale datacenter. The distributed and remote nature of this infrastructure makes it hard to manage, difficult to protect, and costly to maintain. It also makes important VM and host operations more challenging to perform.
vSphere is designed to address these ROBO use cases, including IT infrastructure located in remote, distributed sites. VMware vCenter Server provides a centralized way to control and monitor the virtual infrastructure, including ESXi hosts, virtual machines, storage, and networking resources. It has been widely deployed in ROBO environments to manage ESXi hosts that are spread across large geographical distances and connected by a wide range of networks with different characteristics, including low or high bandwidth, network latency, and packet error rates. In the paper, we test:
LAN with high-bandwidth and low-latency links.
WAN with low-bandwidth and high-latency links.
Various networks in between; for example, DSL, T1, 4G, 5G, …
We demonstrate that vCenter Server performs well in the ROBO environment in terms of both network bandwidth use and virtual machine and ESXi host task execution times. We observe that network latency, rather than bandwidth restriction, has the bigger impact on overall performance: as the network latency between vCenter Server and ESXi hosts increases, the average operation latency also increases. The experimental results also show how efficiently vCenter Server executes VM operations in high-latency networks: when network latency increases by several times, the average VM operation execution time increases much more slowly.
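One way to build intuition for why operation time grows more slowly than network latency is a toy model in which each operation costs a fixed amount of server-side work plus a few sequential round trips. The model, the function names, and the numbers in it are assumptions for illustration only; they are not vCenter Server internals or measurements from the paper.

```python
# Toy model (an assumption for illustration, not measured vCenter behavior):
# an operation costs fixed server-side work plus a few sequential round trips.


def op_time_ms(work_ms: float, round_trips: int, rtt_ms: float) -> float:
    """Estimated end-to-end time of one operation under the toy model."""
    return work_ms + round_trips * rtt_ms


def slowdown(work_ms: float, round_trips: int, rtt_lo: float, rtt_hi: float) -> float:
    """Factor by which operation time grows when RTT rises from rtt_lo to rtt_hi."""
    return op_time_ms(work_ms, round_trips, rtt_hi) / op_time_ms(work_ms, round_trips, rtt_lo)


if __name__ == "__main__":
    # Hypothetical numbers: 2000 ms of server-side work, 5 sequential round trips.
    work, trips = 2000.0, 5
    for lo, hi in [(10.0, 40.0), (50.0, 200.0)]:
        print(f"RTT x{hi / lo:.0f}: operation time x{slowdown(work, trips, lo, hi):.2f}")
```

With these made-up inputs, a 4x increase in round-trip time raises the operation time by only about 1.07x or 1.33x, because the fixed work dominates; the more round trips an operation needs, the closer its growth gets to the latency increase itself.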
vSphere 5.1 introduced an inventory tagging feature that has been available in all later versions of vSphere, including vSphere 6.7. Tags let datacenter administrators organize different vSphere objects like datastores, virtual machines, hosts, and so on. This makes it easier to sort and search for objects that share a tag, among other things. For example, you might use tags to track a group of VMs that all have the same operating system.
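Conceptually, tagging amounts to keeping two lookups in sync: tag to objects, and object to tags. The sketch below is a hypothetical in-memory model of that idea, not the vSphere Tagging Service API or PowerCLI; the class name, method names, and the `category-value` tag strings are assumptions for illustration.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory model of tag assignments (not the vSphere
    Tagging Service API): maps each tag to its objects and vice versa."""

    def __init__(self) -> None:
        self._objects_by_tag: defaultdict[str, set[str]] = defaultdict(set)
        self._tags_by_object: defaultdict[str, set[str]] = defaultdict(set)

    def attach(self, tag: str, obj_id: str) -> None:
        # Record the assignment in both directions so either lookup is cheap.
        self._objects_by_tag[tag].add(obj_id)
        self._tags_by_object[obj_id].add(tag)

    def detach(self, tag: str, obj_id: str) -> None:
        self._objects_by_tag[tag].discard(obj_id)
        self._tags_by_object[obj_id].discard(tag)

    def objects_with(self, tag: str) -> list[str]:
        """All objects sharing a tag, e.g. every VM with the same guest OS."""
        return sorted(self._objects_by_tag.get(tag, set()))

    def tags_of(self, obj_id: str) -> list[str]:
        return sorted(self._tags_by_object.get(obj_id, set()))


if __name__ == "__main__":
    idx = TagIndex()
    idx.attach("os-linux", "vm-101")
    idx.attach("os-linux", "vm-102")
    idx.attach("os-windows", "vm-103")
    print(idx.objects_with("os-linux"))
```

Keeping both directions means "which VMs have this tag?" and "which tags does this VM have?" are each a single lookup rather than a scan over every object.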
Writing code to use tags can be challenging in large-scale environments: a straightforward use of VMware PowerCLI cmdlets may result in poor performance, and while direct Tagging Service APIs are faster, the documentation can be difficult to understand. In this blog, we show some practical examples of using PowerCLI and Tagging Service APIs to perform tag-related operations. We include some simple measurements to show the performance improvements when using the Tagging Service vs. cmdlets. The sample performance numbers are for illustrative purposes only. We describe the test setup in the Appendix.
Ever wondered how DRS distributes resources to VMs? What resources your VMs are entitled to? How reservations, limits, and shares (RLS) affect your VMs’ resource availability? Our new fling, DRS Entitlement Viewer, answers these questions.
DRS Entitlement Viewer is installed as a plugin to the vSphere Client. It is currently supported only for the HTML5-based vSphere Client. Once installed, it gives a hierarchical view of the vCenter DRS cluster inventory, with the entitled CPU and memory resources for each resource pool and VM in the cluster.
Entitled resources can change with the VMs’ resource demand and with the RLS settings of the VMs and resource pools. DRS Entitlement Viewer therefore shows users the current entitlements based on the VMs’ current demands and the RLS settings of the VMs and resource pools.
DRS Entitlement Viewer also provides three different what-if scenarios:
Changing RLS settings of a VM and/or resource pool
What if all the VMs’ resource demand is at 100%
Both of the above scenarios happen together
Users can pick one of the three scenarios and can get new entitlements without actually changing RLS settings on the cluster.
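To make the what-if idea concrete, here is a deliberately simplified proportional-share model for a single resource in a flat cluster. It is an assumption for illustration, not the algorithm DRS or DRS Entitlement Viewer uses (real DRS also handles resource pool hierarchies, overheads, and more). In this model each VM first gets its reservation, spare capacity is then split in proportion to shares, and no VM receives more than min(demand, limit); treating "demand at 100%" as "demand equals the VM's configured size" is also an assumption, and all names and numbers are hypothetical.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VM:
    """Hypothetical per-VM inputs for one resource (e.g. CPU in MHz)."""
    reservation: float
    limit: float   # math.inf means unlimited
    shares: float  # must be positive
    demand: float  # current demand
    size: float    # configured size, used for the 100%-demand what-if


def entitlements(vms: dict[str, VM], capacity: float) -> dict[str, float]:
    """Simplified flat-cluster model: reservations first, then spare capacity
    split by shares, never exceeding a VM's demand-bounded cap."""
    if any(vm.shares <= 0 for vm in vms.values()):
        raise ValueError("shares must be positive")
    alloc = {n: vm.reservation for n, vm in vms.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed capacity")
    # A VM is never entitled to more than min(demand, limit), but always
    # keeps at least its reservation.
    cap = {n: max(vm.reservation, min(vm.demand, vm.limit)) for n, vm in vms.items()}
    active = {n for n in vms if alloc[n] < cap[n]}
    while remaining > 1e-9 and active:
        total_shares = sum(vms[n].shares for n in active)
        # Offer each active VM a slice of the spare capacity proportional to its shares.
        offers = {n: remaining * vms[n].shares / total_shares for n in active}
        saturated = {n for n in active if alloc[n] + offers[n] >= cap[n]}
        if not saturated:
            for n in active:
                alloc[n] += offers[n]
            break
        # Saturated VMs take only what they need; the rest is re-split next round.
        for n in saturated:
            remaining -= cap[n] - alloc[n]
            alloc[n] = cap[n]
        active -= saturated
    return alloc


def what_if_full_demand(vms: dict[str, VM]) -> dict[str, VM]:
    """What-if: every VM demands its full configured size."""
    return {n: replace(vm, demand=vm.size) for n, vm in vms.items()}


if __name__ == "__main__":
    # Hypothetical cluster with 10,000 MHz of CPU and three VMs.
    cluster = {
        "web": VM(reservation=1000, limit=math.inf, shares=2000, demand=3000, size=4000),
        "db": VM(reservation=2000, limit=6000, shares=4000, demand=5000, size=8000),
        "batch": VM(reservation=0, limit=math.inf, shares=1000, demand=6000, size=6000),
    }
    print("current:", entitlements(cluster, 10_000))
    print("all demand at 100%:", entitlements(what_if_full_demand(cluster), 10_000))
```

With these made-up inputs, the current entitlements come out as web 3000, db 5000, and batch 2000 MHz; in the 100%-demand what-if, db rises to its 6000 MHz limit while web and batch drop to 3000 and 1000 MHz. Neither run touches the input settings, which mirrors the idea of evaluating a scenario without changing the live cluster.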
Finally, DRS Entitlement Viewer also provides an option to export the new RLS values from a what-if scenario as a vSphere PowerCLI command that customers can execute against their vCenter to apply the new settings.
vSphere Distributed Resource Scheduler (DRS) provides a simple way to manage your cluster resources. DRS works well out of the box for most vSphere installations.
For cases where more flexibility is desired in how the cluster is managed, DRS provides many options in the form of cluster rules, settings and advanced options.
The impact of using rules in a DRS cluster is often not well understood, and the settings and advanced options are not well documented. Imagine being able to experiment with rules in your cluster before actually applying them, or to change the DRS migration threshold without changing the setting in your live cluster, and still visualize the impact of those actions.
Introducing DRS Dump Insight, which helps answer simple queries about DRS behavior, like the following:
What if I dropped all the affinity rules in my cluster?
What if I set the cluster advanced option “AggressiveCPUActive”?
What if I changed the DRS migration threshold from 3 to 5?
In an effort to provide a more insightful user experience and to help understand how vSphere DRS works, we recently released a fling: DRS Dump Insight.
DRS Dump Insight is a service portal where users can upload drmdump files to get a summary of the DRS run, including a breakdown of all the possible moves and the changes in ESX host resource consumption before and after the DRS run.
Users can get answers to questions like:
Why did DRS make a certain recommendation?
Why is DRS not making any recommendations to balance my cluster?
What recommendations did DRS drop due to cost/benefit analysis?
DRS Lens provides an alternative UI for a DRS-enabled cluster. It gives a simple yet powerful interface to monitor the cluster in real time and provides useful analyses to users. The UI consists of different dashboards, in the form of tabs, for each cluster being monitored.
At the VMworld 2016 Barcelona keynote, CTO Ray O’Farrell proudly presented the performance improvements in vCenter 6.5. He showed the following slide:
Slide from Ray O’Farrell’s keynote at VMworld 2016 Barcelona, showing 2x improvement in scale from 6.0 to 6.5 and 6x improvement in throughput from 5.5 to 6.5.
As a senior performance engineer who focuses on vCenter, and as one of the presenters of VMworld Session INF8108 (listed in the top-right corner of the slide above), I have received a number of questions regarding the “6x” and “2x scale” labels on that slide. This blog is an attempt to explain these numbers by describing, at a high level, the performance improvements for vCenter in 6.5. I will focus specifically on the vCenter Appliance in this post.
High availability (aka HA) services are important in any platform, and VMware vCenter Server® is no exception. As the main administrative and management tool of vSphere, it is a critical element that requires HA. vCenter Server HA (aka VCHA) delivers protection against software and hardware failures with excellent performance for common customer scenarios, as shown in this paper.
Much work has gone into the high availability feature of VMware vCenter Server® 6.5 to ensure that this service and its operations minimally affect the performance of your vCenter Server and vSphere hosts. We thoroughly tested VCHA with a benchmark that simulates common vCenter Server activities in both regular and worst-case scenarios. The result is solid data and a comprehensive performance characterization in terms of:
Performance of VCHA failover/recovery time objective (RTO): In case of a failure, VCHA provides a failover/RTO that lets users continue their work in less than 2 minutes through API clients and less than 4 minutes through UI clients. While failover/RTO depends on the vCenter Server configuration and the inventory size, in our tests it stays within the target limit of 5 minutes.
Performance of enabling VCHA: We observed that enabling VCHA takes around 4 to 9 minutes, depending on the vCenter Server configuration and the inventory size.
VCHA overhead: When VCHA is enabled, there is no significant impact for vCenter Server under typical load conditions. We observed a noticeable but small impact of VCHA when the vCenter Server was under extreme load; however, it is unlikely for customers to generate that much load on the vCenter Server for extended time periods.
Performance impact of vCenter Server statistics level: As the statistics level increases, vCenter Server produces less throughput, as expected. When VCHA is enabled at the various statistics levels, we observe a noticeable but small throughput impact of 3% to 9%.
Performance impact of a private network: VCHA is designed to support LAN networks with up to 10 ms latency between VCHA nodes. However, higher latency on this network comes with a performance penalty. We study the performance impact of the private network in detail and provide further guidelines about how to configure VCHA for the best performance.
External Platform Services Controller (PSC) vs Embedded PSC: We study VCHA performance comparing these two deployment modes and observe a minimal difference between them.
Throughout the paper, our findings show that vCenter Server HA performs well under a variety of circumstances. In addition to the performance study results, the paper describes the VCHA architecture and includes some useful performance best practices for getting the most from VCHA.
VMware recently released a white paper on performance and best practices for vCenter Server in VMware vSphere 4.1. This paper addresses four common areas of concern regarding vCenter Server:
Performance improvements in vSphere 4.1 compared to vSphere 4.0
vCenter Server sizing guidelines and software requirements
Best practices in performance monitoring, tuning and troubleshooting
Case studies demonstrating performance improvements in vSphere 4.1
The vCenter Server in vSphere 4.1 supports a larger inventory in a vSphere environment than vSphere 4.0 did, both at the vCenter Server level and at the single cluster level. A table is provided in the white paper to show the new supported limits. In addition, in vSphere 4.1, vCenter Server can handle a larger number of concurrent tasks than in vSphere 4.0.
Significant performance improvements have been made in vSphere 4.1 compared to vSphere 4.0. The following list highlights some of the most important performance improvements:
Improved performance at higher vCenter Server inventory limits – up to 7 times higher operational throughput and up to 75% reduced operational latency
Improved performance at higher cluster inventory limits – up to 3 times higher operational throughput and up to 60% reduced operational latency
Faster vCenter Server startup – around 5 minutes for maximum vCenter Server inventory size
Better vSphere Client responsiveness, quicker user interaction, and faster user login
Faster host operations and VM operations on standalone hosts – up to 60% reduction in latency
Lower resource usage by vCenter agents by up to 40%
Reduced VM group power-on latency by up to 25%
Faster VM recovery with HA – up to 60% reduction in total recovery time for 1.6 times more VMs
Better load balancing with improved DRS/DPM algorithm