vSphere Distributed Resource Scheduler (DRS) provides a simple and easy way to manage your cluster resources. DRS works well, out of the box for most vSphere installations.
For cases where more flexibility is desired in how the cluster is managed, DRS provides many options in the form of cluster rules, settings and advanced options.
Often the impact of using rules in a DRS cluster is not very well understood. The settings and advanced options are not very well documented. Imagine if it was possible to play around with rules in your cluster before actually applying them, or changing the DRS migration threshold in your cluster without changing the setting in your live cluster – and yet, be able to visualize the impact of those actions in your cluster?
Introducing – DRS Dump Insight – to help with simple queries regarding DRS behavior, like the following.
- What if I dropped all the affinity rules in my cluster?
- What if I set cluster advanced option “AggressiveCPUActive”?
- What if I changed the DRS migration threshold from 3 to 5?
In an effort to provide a more insightful user experience and to help understand how vSphere DRS works, we recently released a fling: DRS Dump Insight.
DRS Dump Insight is a service portal where users can upload drmdump files and it provides a summary of the DRS run, with a breakup of all the possible moves along with the changes in ESX hosts resource consumption before and after DRS run.
Users can get answers to questions like:
- Why did DRS make a certain recommendation?
- Why is DRS not making any recommendations to balance my cluster?
- What recommendations did DRS drop due to cost/benefit analysis?
- Can I get all the recommendations made by DRS?
DRS Lens provides an alternative UI for a DRS enabled cluster. It gives a simple, yet powerful interface to monitor the cluster real time and provide useful analyses to the users. The UI is comprised of different dashboards in the form of tabs for each cluster being monitored.
At the VMworld 2016 Barcelona keynote, CTO Ray O’Farrell proudly presented the performance improvements in vCenter 6.5. He showed the following slide:
Slide from Ray O’Farrell’s keynote at VMworld 2016 Barcelona, showing 2x improvement in scale from 6.0 to 6.5 and 6x improvement in throughput from 5.5 to 6.5.
As a senior performance engineer who focuses on vCenter, and as one of the presenters of VMworld Session INF8108 (listed in the top-right corner of the slide above), I have received a number of questions regarding the “6x” and “2x scale” labels in the slide above. This blog is an attempt to explain these numbers by describing (at a high level) the performance improvements for vCenter in 6.5. I will focus specifically on the vCenter Appliance in this post.
High availability (aka HA) services are important in any platform, and VMware vCenter Server® is no exception. As the main administrative and management tool of vSphere, it is a critical element that requires HA. vCenter Server HA (aka VCHA) delivers protection against software and hardware failures with excellent performance for common customer scenarios, as shown in this paper.
Much work has gone into the high availability feature of VMware vCenter Server® 6.5 to ensure that this service and its operations minimally affect the performance of your vCenter Server and vSphere hosts. We thoroughly tested VCHA with a benchmark that simulates common vCenter Server activities in both regular and worst case scenarios. The result is solid data and a comprehensive performance characterization in terms of:
- Performance of VCHA failover/recovery time objective (RTO): In case of a failure, vCenter Server HA (VCHA) provides failover/RTO such that users can continue with their work in less than 2 minutes through API clients and less than 4 minutes through UI clients. While failover/RTO depends on the vCenter Server configuration and the inventory size, in our tests it is within the target limit, which is 5 minutes.
- Performance of enabling VCHA: We observed that enabling VCHA would take around 4 – 9 minutes depending on the vCenter Server configuration and the inventory size.
- VCHA overhead: When VCHA is enabled, there is no significant impact for vCenter Server under typical load conditions. We observed a noticeable but small impact of VCHA when the vCenter Server was under extreme load; however, it is unlikely for customers to generate that much load on the vCenter Server for extended time periods.
- Performance impact of vCenter Server statistics level: With an increasing statistics level, vCenter Server produces less throughput, as expected. When VCHA is enabled for various statistics levels, we observe a noticeable but small impact of 3% to 9% on throughput.
- Performance impact of a private network: VCHA is designed to support LAN networks with up to 10 ms latency between VCHA nodes. However, this comes with a performance penalty. We study the performance impact of the private network in detail and provide further guidelines about how to configure VCHA for the best performance.
- External Platform Services Controller (PSC) vs Embedded PSC: We study VCHA performance comparing these two deployment modes and observe a minimal difference between them.
Throughout the paper, our findings show that vCenter Server HA performs well under a variety of circumstances. In addition to the performance study results, the paper describes the VCHA architecture and includes some useful performance best practices for getting the most from VCHA.
For the full paper, see VMware vCenter Server High Availability Performance and Best Practices.
VMware recently released a white paper on
performance and best practices for vCenter Server in VMware vSphere 4.1. This
paper addresses four common areas of concerns regarding vCenter Server
- Performance improvements in vSphere 4.1 compared to vSphere 4.0
- vCenter Server sizing guidelines and software requirements
- Best practices in performance monitoring, tuning and troubleshooting
- Case studies demonstrating performance improvements in vSphere 4.1.
The vCenter Server in vSphere 4.1 supports a larger
inventory in a vSphere environment when compared with that supported in vSphere
4.0, both at the vCenter Server level and at the single cluster level. A table
is provided in the white paper to show the new supported limits. In addition,
in vSphere 4.1, vCenter Server can handle a larger number of concurrent tasks
at a time when compared with vSphere 4.0.
Significant performance improvements have been made in vSphere
4.1 compared to vSphere 4.0. The following list highlights some of the most
important performance improvements:
- Improved performance at higher vCenter Server inventory limits – up to 7 times higher operational throughput and up to 75% reduced operational
- Improved performance at higher cluster inventory limits – up to 3
times higher operational throughput and up to 60% reduced operational latency
- Faster vCenter Server startup – around 5 minutes for maximum
vCenter Server inventory size
- Better vSphere Client responsiveness, quicker user interaction,
and faster user login
- Faster host operations and VM operations on standalone hosts – up
to 60% reduction in latency
- Lower resource usage by vCenter agents by up to 40%
- Reduced VM group power-on latency by up to 25%
- Faster VM recovery with HA – up to 60% reduction in total
recovery time for 1.6 times more VMs
- Better load balancing with improved DRS/DPM algorithm
For more information, please read the full paper: VMware vCenter
Server Performance and Best Practices for vSphere 4.1.