We have published a white paper, ESX IP Storage Troubleshooting Best Practice, in which we recommend that vSphere customers deploying ESX IP storage over 10G networks include 10G packet capture systems as a best practice to ensure network visibility.
The white paper explores the challenges and alternatives for packet capture in a vSphere environment with IP storage (NFS, iSCSI) datastores over a 10G network, and explains why traditional techniques for capturing packet traces on 1G networks suffer from severe limitations (capture drops and inaccurate timestamps) when used on 10G networks. Although commercial 10G packet capture systems are commonly available, they may be beyond the budget of some vSphere customers. We present the design of a self-assembled 10G packet capture solution that can be built relatively inexpensively from commercial components. The self-assembled solution is optimized for common troubleshooting scenarios in which short-duration packet captures satisfy most analysis requirements.
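To give a rough sense of why 1G capture techniques struggle at 10G, and why short-duration captures keep a self-assembled system affordable, the following back-of-the-envelope sketch estimates data volume and worst-case frame rate. It is not taken from the white paper: the function names, the 10-second window, and the assumption of a fully utilized link in one direction are illustrative only. The 20 bytes of per-frame overhead are the standard Ethernet preamble, start delimiter, and inter-frame gap.

```python
# Back-of-the-envelope sizing for a short packet capture (illustrative only).

def capture_bytes(link_gbps: float, seconds: float, utilization: float = 1.0) -> float:
    """Bytes arriving in one direction of a link over the capture window."""
    bits_per_second = link_gbps * 1e9 * utilization
    return bits_per_second / 8 * seconds


def max_frames_per_second(link_gbps: float, frame_bytes: int = 64) -> float:
    """Worst-case Ethernet frame rate for a given frame size.

    Each frame also occupies 20 bytes on the wire: 7-byte preamble,
    1-byte start-of-frame delimiter, and 12-byte inter-frame gap.
    """
    return link_gbps * 1e9 / ((frame_bytes + 20) * 8)


if __name__ == "__main__":
    # Hypothetical 10-second capture window at full line rate.
    for gbps in (1, 10):
        gigabytes = capture_bytes(gbps, seconds=10) / 1e9
        mpps = max_frames_per_second(gbps) / 1e6
        print(f"{gbps:>2}G: {gigabytes:.2f} GB per 10 s, up to {mpps:.2f} Mpps")
```

Under these assumptions a capture host must absorb ten times the data and ten times the frame rate of a 1G link, which is consistent with the capture drops and timestamp inaccuracy described above when tooling sized for 1G is reused.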
Our experience troubleshooting a large number of IP storage issues has shown that the ability to capture and analyze packet traces in an ESX IP storage environment can significantly reduce the mean time to resolution for serious functional and performance issues. When reporting an IP storage problem to VMware or to a storage array vendor, an accompanying packet trace file is a great piece of evidence that can significantly reduce the time required by the responsible engineering teams to identify the problem.
The hugely popular Performance Troubleshooting for VMware vSphere 4 guide is now updated for vSphere 4.1. This document provides step-by-step approaches for troubleshooting the most common performance problems in vSphere-based virtual environments. The steps discussed in the document use performance data and charts readily available in the vSphere Client and esxtop to aid the troubleshooting flows. Each performance troubleshooting flow has two parts:
- How to identify the problem using specific performance counters.
- Possible causes of the problem and ways to resolve it.
New sections that were added to the document include troubleshooting performance problems in resource pools on standalone hosts and DRS clusters, additional troubleshooting steps for environments experiencing memory pressure (hosts with compressed and swapped memory), high CPU ready time in hosts that are not CPU saturated, environments sharing resources such as storage and network, and environments using snapshots.
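As an illustration of the "identify the problem using specific performance counters" step, the sketch below converts a CPU ready summation value, which the vSphere Client performance charts report in milliseconds, into a percentage of the sample interval. It is a minimal sketch and not the guide's own procedure: the function name is hypothetical, the 20-second default matches the real-time chart interval, and dividing by the vCPU count to get a per-vCPU average is an assumption about how the value will be interpreted. The guide's thresholds are not reproduced here.

```python
def cpu_ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms) into a per-vCPU percentage.

    ready_ms   -- ready time accumulated over one sample interval, in ms
    interval_s -- length of the sample interval in seconds (20 s is the
                  real-time chart interval; other chart ranges differ)
    vcpus      -- number of vCPUs the summation covers (assumption: the
                  caller wants an average per vCPU)
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms / (interval_s * 1000.0 * vcpus) * 100.0


if __name__ == "__main__":
    # Hypothetical sample: 1000 ms of ready time in one 20 s real-time sample.
    print(f"{cpu_ready_percent(1000):.1f}% ready")
```

A percentage is easier to compare across chart intervals than the raw millisecond value, and it lines up with the ready-time percentage that esxtop displays.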
The Troubleshooting guide can be found here. Readers are encouraged to provide their feedback and comments in the performance community site at this link.
VMware recently released a white paper on
performance and best practices for vCenter Server in VMware vSphere 4.1. This
paper addresses four common areas of concern regarding vCenter Server:
- Performance improvements in vSphere 4.1 compared to vSphere 4.0
- vCenter Server sizing guidelines and software requirements
- Best practices in performance monitoring, tuning and troubleshooting
- Case studies demonstrating performance improvements in vSphere 4.1
vCenter Server in vSphere 4.1 supports a larger
inventory than vCenter Server in vSphere 4.0, both at the vCenter Server
level and at the single-cluster level. A table in the white paper shows
the new supported limits. In addition, vCenter Server in vSphere 4.1 can
handle a larger number of concurrent tasks than in vSphere 4.0.
Significant performance improvements have been made in vSphere
4.1 compared to vSphere 4.0. The following list highlights some of the most
important performance improvements:
- Improved performance at higher vCenter Server inventory limits – up to 7 times higher operational throughput and up to 75% reduced operational latency
- Improved performance at higher cluster inventory limits – up to 3
times higher operational throughput and up to 60% reduced operational latency
- Faster vCenter Server startup – around 5 minutes for maximum
vCenter Server inventory size
- Better vSphere Client responsiveness, quicker user interaction,
and faster user login
- Faster host operations and VM operations on standalone hosts – up
to 60% reduction in latency
- Lower resource usage by vCenter agents by up to 40%
- Reduced VM group power-on latency by up to 25%
- Faster VM recovery with HA – up to 60% reduction in total
recovery time for 1.6 times more VMs
- Better load balancing with improved DRS/DPM algorithm
For more information, please read the full paper: VMware vCenter
Server Performance and Best Practices for vSphere 4.1.