VMware

July 31, 2008

Top Tips for Deploying VI

The following “top tips” highlight some issues that can arise in a VI deployment. They cover things which are sometimes hard to diagnose, or which might result in a problem weeks or months after some seemingly innocuous action. They are meant to shed some light on “latent” issues, that is, those which don’t result in immediate warnings or errors when the root cause event occurs. These tips have been collected from customer experience gathered over time by the VI team, and will be posted in two parts. We welcome your comments on these and any other “gotchas” that you might have encountered.

  1. Make sure DNS is fully configured. This includes ensuring proper, consistent configuration for all of the following: short name, fully-qualified name, forward lookup, and reverse lookup. Otherwise, you'll see ESX hosts intermittently disconnect from VirtualCenter, and HA might not work properly.

  2. Don’t use Virtual SMP for applications which don’t need it. Most applications are single-threaded and therefore cannot benefit from more than one virtual CPU. Assigning just a single virtual CPU to such VMs maximizes physical CPU utilization across all of your cores and avoids underutilized cores. If your applications were converted from dual-CPU physical servers, don’t assume they need two CPUs: they might have been running on the smallest practical server configuration available. Start with a single vCPU, and then monitor performance to see whether increasing the number of virtual CPUs actually makes a difference.

  3. Make sure you monitor the "% ready" metric. There's one new, key metric in managing virtualization environments that doesn't exist in physical environments: ready time. Ready time measures, for a given VM, the amount of time that the VM is ready to run on a physical CPU but processor cycles are unavailable. In a properly loaded system, ready time should remain near zero, although percentages below five present no significant problem. As ready time climbs to double-digit percentages, the applications are not getting a significant portion of the CPU cycles they are requesting. This usually happens as a result of overly aggressive consolidation, and can be solved in various ways (reducing the number of VMs running, reducing the use of Virtual SMP, adding memory or other resources in case swapping is occurring, etc.). For more information see this performance study: Ready Time Observations.

  4. Watch your snapshot space growth. Because snapshots live on your disk and grow over time, you want to be careful that you have enough spare capacity on your disk. Every snapshot consists of a “REDO” file; for the most recent snapshot, all new disk writes associated with the VM are recorded to this file. In the extreme, a REDO file can grow to the size of the original disk, and the REDO file of every snapshot that you maintain continues to occupy disk space. You want to make sure that you have enough "headroom" on your datastore to handle such growth over time. Operations that might dramatically increase the size of your snapshots include the following: an OS service pack update, an application reinstall, or a disk defrag inside the VM.

  5. Make sure the SQL Server Agent is up and running on the VirtualCenter DB. VirtualCenter depends on Microsoft SQL Server Agent to perform stats rollups. However, VirtualCenter does not have the ability to ensure this service is running on the DB server. If the user has it disabled, or the service is shut down at some point, the VI Client will not show expected stats (weekly, monthly…).  In addition, since daily data is not rolled up, it accumulates in the database, thus degrading performance and consuming more and more space.

  6. Team your management NICs if using VMware High Availability (VMware HA). This will help you avoid false alarms (i.e., false VMware HA failovers of VMs) in situations where you temporarily lose connectivity between your ESX hosts (e.g., during a momentary network outage, or even during a network switch maintenance operation).
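
The DNS checks in tip 1 lend themselves to a quick script. The following is a minimal sketch, not a VMware tool: it assumes Python's standard `socket` module is run from a machine that uses the same DNS servers as the ESX hosts, and the function name `check_dns` and its parameters are made up for illustration. The resolver functions are passed in as arguments so the logic can be exercised without a live network.

```python
import socket


def check_dns(short_name, fqdn,
              forward=socket.gethostbyname,
              reverse=socket.gethostbyaddr):
    """Return a list of problems; an empty list means the records agree.

    `forward` maps a name to an IPv4 address string; `reverse` maps an
    address to a (hostname, aliases, addresses) tuple, matching the
    signatures of the socket functions used as defaults.
    """
    problems = []
    addresses = {}

    # Forward lookup for both the short name and the fully-qualified name.
    for name in (short_name, fqdn):
        try:
            addresses[name] = forward(name)
        except OSError as exc:  # socket.gaierror/herror are OSError subclasses
            problems.append(f"forward lookup failed for {name}: {exc}")

    # Both names should resolve to the same address.
    if len(addresses) == 2 and addresses[short_name] != addresses[fqdn]:
        problems.append(
            f"{short_name} -> {addresses[short_name]} but "
            f"{fqdn} -> {addresses[fqdn]}"
        )

    # Reverse lookup of each distinct address should return the FQDN.
    for addr in sorted(set(addresses.values())):
        try:
            reverse_name = reverse(addr)[0]
        except OSError as exc:
            problems.append(f"reverse lookup failed for {addr}: {exc}")
            continue
        if reverse_name.lower().rstrip(".") != fqdn.lower().rstrip("."):
            problems.append(
                f"reverse lookup of {addr} returned {reverse_name}, "
                f"expected {fqdn}"
            )

    return problems
```

Calling `check_dns("esx01", "esx01.example.com")` (hypothetical host names) for each ESX host and for the VirtualCenter server, and treating any non-empty result as something to fix before relying on HA, would cover the four items listed in the tip.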
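
The ready-time thresholds in tip 3 can be turned into a simple check. This sketch assumes the ready counter is exported as milliseconds accumulated over a sampling interval; that representation, the function names, and the intermediate "watch" band between 5% and 10% are assumptions made for illustration, since the tip itself only names the "% ready" metric and the two thresholds.

```python
def ready_percent(ready_ms, interval_s):
    """Convert accumulated ready time (ms) over an interval (s) to a percentage."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms * 100.0 / (interval_s * 1000.0)


def classify_ready(percent):
    """Label a ready-time percentage using the thresholds from tip 3.

    Below 5%: no significant problem. 10% and above (double digits):
    the VM is missing a significant share of the CPU it asks for.
    The band in between is labeled "watch" (an assumption, not from the tip).
    """
    if percent < 5.0:
        return "ok"
    if percent < 10.0:
        return "watch"
    return "cpu-starved"
```

For example, a VM that accumulated 2,000 ms of ready time during a 20-second sample works out to 10%, which falls in the double-digit range the tip warns about.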

Part 2 coming next week. [Update: see Top Tips for Deploying VI, Part 2]

--The VI Team

Comments

I think we could make a list of "anti-patterns" out of these kinds of tips. So the above list (and the others we can write) might be written instead as:

Anti-pattern: DNS incorrectly configured
Description: Inconsistent configuration for all of the following: short name, fully-qualified name, forward lookup, and reverse lookup
Impact: ESX Server intermittent connection with VC, HA faults
Remedy: Ensure all ESX Servers point to common, well-managed DNS servers that have all short name, FQDN, and forward/reverse lookup records.

Waiting for part 2,
