VMware

October 12, 2008

The virtual datacenter operating system from VMware

A few weeks ago, 14,000 of VMware’s customers and partners attended VMworld 2008 in Las Vegas. It was an incredible opportunity for VMware to define what we do and to outline the exciting new products coming in 2009, and we did. We defined a bold, clear vision, one that we’ve been marching toward for a while but have only recently articulated precisely. In the process, we introduced new concepts, new definitions and new names. A quick overview is available at the link below:

http://www.vmware.com/technology/virtual-datacenter-os/

I’m going to use this blog to explain the thought process behind these new announcements.

What we announced: The roadmap for 2009

The vdc-os

We announced that we are extending our product, VMware Infrastructure, into a virtual datacenter operating system (vdc-os).

In 2009, we will add a whole range of new capabilities. Combined with the products and capabilities we have today, they give you a computing platform that does two main things:

  1. Abstracts away the complexity of hardware across servers, storage and network to create a unified, uniform platform that provides resources to applications on an as-needed basis
  2. Abstracts away the complexity of applications and provides OS-agnostic services to all applications

The first set of enabling capabilities is called Infrastructure vServices; the second set is called Application vServices. In a single-server environment, these capabilities would be delivered by a traditional, single-server operating system such as Windows or Linux.

In an environment with many servers, you need a computing platform that works across all its disparate piece parts. You need an operating system for your entire datacenter – one that removes complexity and fundamentally simplifies IT. We call this the virtual datacenter operating system.

This is an operating system that virtualizes servers, storage and network into a giant shared resource and then precisely allocates this resource to applications. Along the way, the distributed nature of this operating system enables point-and-click availability, scalability and security services for ALL applications. Applications that run on standard x86 servers simply run on the vdc-os with no modification and can avail of all these services.

The virtual datacenter operating system fundamentally changes the way datacenters need to be managed, for three main reasons:

-         Resources are, by definition, shared

-         Applications are no longer tied to a single piece of hardware

-         Service level requirements of the applications are parameters that are easily set by app owners and delivered automatically by the infrastructure

Management vServices deliver comprehensive management that is aware of these properties of a vdc-os. Management vServices also allow for easy integration with existing systems management frameworks, so that virtual environments retain the benefits associated with virtualization and also work well in the larger framework of non-x86 environments.

So what?

Think about this in the context of some larger trends in the industry. Next year, volume servers will ship with 8 cores in a single socket, which means 16 cores per 2-way server. Very few applications can truly take advantage of so many cores. Virtualization is inevitable.

Now think about whether you want your individual virtualized servers to be islands unto themselves, or whether you want your IT setup to be managed at a datacenter level, with an OS that unifies many industry standard parts.

Think about whether you want to set up and customize availability and scalability for every application, or whether you want to enable it through point and click as part of the provisioning process. As applications get componentized, they will consist of many tiers. Guaranteeing service levels for each tier will be difficult unless service level guarantees are available in an application- and OS-independent way.

The VMware approach not only makes you less susceptible to failures of each individual component, it also gives you additional flexibility that you could not have had with a single server approach, such as the capability to scale any type of application quickly, the ability to assure a high priority application additional resources when needed or the ability to provide high availability in an application/OS independent way.

In times such as these, lowering costs and running datacenters at the highest efficiency is going to be hugely important as all of us try to deliver more with less.

VMware is truly delivering what the industry calls utility computing. It is bringing the most efficient model of computing to every datacenter.

Want more?

The webcast at this link gives you a quick overview of the virtual datacenter operating system from VMware.

http://info.vmware.com/content/NextGenVDC_webcast_5059

For a deep dive on some of the new features, such as the future vNetwork Distributed Switch and VMware Fault Tolerance, see the links below to two live webinars:

Next-Generation Virtual Networking for VMware Infrastructure – 10/16, 9am PT

http://www.vmware.com/a/webcasts/details/157?src=BLOGS_08Q4_VMW_OTHER_WEBCASTSERIES_OCT16_LJ&ossrc=BLOGS_08Q4_VMW_OTHER_WEBCASTSERIES_OCT16_LJ 

Technical Track: Fault Tolerance for VMware Virtual Machines – 11/5, 9am PT

http://www.vmware.com/a/webcasts/details/152?src=BLOGS_08Q4_VMW_OTHER_WEBCASTSERIES_NOV5_LJ&ossrc=BLOGS_08Q4_VMW_OTHER_WEBCASTSERIES_NOV5_LJ

And as always, do send in any questions or comments to me at ljoshi@vmware.com


August 08, 2008

Top Tips for Deploying VI, part 2

 1. If you have an active/passive FC storage array (most mid-range arrays fall into this bucket), be careful about setup. Firstly, be sure to have redundant paths from FC switches to your arrays’ storage processors. Secondly, be sure to use “MRU” (the default) for the path-selection policy and not “fixed”.

The best way to explain the first issue is with a picture.  What’s wrong with the following configuration?

[Diagram: the configuration in question]

Although you might believe that you have full redundancy between the hosts and the switches, and specifically that you can survive one HBA failure on each host, the reality is that you don’t have enough redundancy.  Here’s one failure scenario that won’t be handled properly:

[Diagram: failure scenario with one failed HBA on each host]

The reason is that, with active/passive storage arrays, a given LUN can be presented on only one storage processor at a time. The LUN can shift from one storage processor to the other, but such a shift takes many seconds (potentially up to 30 seconds). If one HBA has failed on each host (as in the diagram above), the two ESX hosts can no longer reach the LUN through the same storage processor. Host 1 attempts to access the LUN on storage processor 1; host 2 attempts to access the same LUN on storage processor 2; and you end up with a ping-pong, or “path thrashing”, effect as the active/passive array shifts the LUN back and forth between the two storage processors. Performance of VMs on both hosts will be erratic and degraded.

The solution is simple: create redundant connections from the FC switches to the array storage processors, as shown below.

[Diagram: redundant connections from the FC switches to the array storage processors]

There is a second noteworthy issue with active/passive arrays related to this same path thrashing effect: make sure that you use the “MRU” path selection policy (the default) rather than the “fixed” path selection policy. If you use “fixed”, you may make the mistake of forcing the use of a particular storage processor for one host… but a different storage processor for another host… and thus end up in a similar LUN ping-pong or path thrashing situation.

For more details about path thrashing, see the SAN Configuration Guide.
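The ping-pong effect can be illustrated with a toy simulation. This is a minimal sketch under simplifying assumptions, not how ESX or any array actually implements path selection: the names `Host`, `choose_sp`, `simulate` and the storage processor labels `SPA`/`SPB` are hypothetical; a "fixed" host always sends I/O to its preferred storage processor when it can reach it; an "MRU" host follows whichever storage processor currently owns the LUN when it has a path to it; and any I/O sent to the non-owning storage processor makes the LUN shift (trespass).

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    policy: str          # "fixed" or "mru" (simplified models of the two policies)
    reachable: set       # storage processors this host still has a working path to
    preferred: str = ""  # only used by the "fixed" policy


def choose_sp(host: Host, owner: str) -> str:
    """Pick the storage processor this host sends its next I/O to."""
    if host.policy == "fixed" and host.preferred in host.reachable:
        return host.preferred
    if owner in host.reachable:
        # MRU (or fixed with its preferred path down): stay with the current owner.
        return owner
    # No path to the owning SP: the host can only use what it can reach.
    return sorted(host.reachable)[0]


def simulate(hosts, rounds=3, owner="SPA"):
    """Count how many times the LUN shifts between storage processors."""
    trespasses = 0
    for _ in range(rounds):
        for host in hosts:
            sp = choose_sp(host, owner)
            if sp != owner:
                owner = sp       # active/passive: the LUN moves to the requesting SP
                trespasses += 1
    return trespasses


if __name__ == "__main__":
    both = {"SPA", "SPB"}
    fixed_hosts = [Host("esx1", "fixed", both, "SPA"), Host("esx2", "fixed", both, "SPB")]
    mru_hosts = [Host("esx1", "mru", both), Host("esx2", "mru", both)]
    # One failed HBA per host and no redundant switch-to-SP paths: each host sees one SP.
    degraded = [Host("esx1", "mru", {"SPA"}), Host("esx2", "mru", {"SPB"})]
    print(simulate(fixed_hosts), simulate(mru_hosts), simulate(degraded))  # 5 0 5
```

In this model both fixes matter: MRU with full redundancy never trespasses, while "fixed" with conflicting preferences, or MRU once each host can reach only one storage processor, shifts the LUN on almost every I/O round.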

2. When configuring your VI environment for VMotion, make sure that your physical network switches are configured properly; in particular, make sure that each port has the right network (e.g. VLAN) visibility. 

VMotion requires that the destination ESX host have network connectivity similar to that of the source ESX host (so that, for example, the VM can continue to access its assigned VLAN after the VMotion). VirtualCenter checks for correct virtual switch configuration on the source and destination ESX hosts; however, VirtualCenter does not check the configuration of the physical network switches. In a larger VI deployment where many network switch ports are involved, a single misconfigured physical switch port can be hard to detect. The symptom is as follows: when a VM that relies on a particular VLAN ID is migrated with VMotion to the ESX host with the misconfigured switch port, the VM loses all network connectivity. Solution: when adding new ESX hosts to a network, take the time to double-check your network switch port configurations to make absolutely sure that all the VLANs are correctly configured.
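Because VirtualCenter won't catch this, a simple cross-check of the VLANs your port groups need against the VLANs actually carried by each host's physical switch ports can help. This is a minimal sketch: the function `find_vlan_gaps` and the inventory dictionaries are hypothetical and would have to be filled in by hand (or by your own tooling) from the vSwitch port group settings and the physical switch configurations; no VMware or switch API is used. Each uplink port is checked separately, since NIC team failover can move traffic onto any of them.

```python
def find_vlan_gaps(required_vlans, allowed_by_port):
    """Return {switch_port: sorted VLAN IDs that port groups need but the port does not carry}."""
    required = set(required_vlans)
    gaps = {}
    for port, allowed in allowed_by_port.items():
        missing = required - set(allowed)
        if missing:
            gaps[port] = sorted(missing)
    return gaps


if __name__ == "__main__":
    # Hypothetical inventory: VLANs used by VM port groups across the cluster,
    # and the VLANs trunked to the physical switch port behind each host uplink.
    portgroup_vlans = {10, 20, 30}
    switch_ports = {
        "esx01:vmnic0": {10, 20, 30},
        "esx01:vmnic1": {10, 20, 30},
        "esx02:vmnic0": {10, 20, 30},
        "esx02:vmnic1": {10, 20},   # VLAN 30 was never added to this trunk port
    }
    print(find_vlan_gaps(portgroup_vlans, switch_ports))  # {'esx02:vmnic1': [30]}
```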

3. When using VMware HA, take note of how memory reservations are specified and used to reserve cluster failover capacity.  Using more consistent reservations or disabling admission control are both appropriate workarounds if the calculations are overly conservative in your environment.

How VMware HA works: If a VMware ESX host fails, VMware HA will restart the VMs affected by that failure on alternate hosts in the cluster.  In order to do so, HA must reserve failover capacity within the cluster.  HA currently achieves this by implementing an “admission control” policy that prevents (or warns against) the powering on of VMs that would encroach upon the failover capacity being reserved.  In some cases, however, the admission control calculations may be too conservative.

Example scenario: Suppose you have 19 VMs, each with a 300 MB memory reservation. To power on all of these VMs, you need 5.7 GB of RAM (= 19 × 0.3) in total within the cluster, after allocating space for potential host failures and not accounting for memory sharing in ESX. Since all reservations are equivalent, HA defines an average VM to require 300 MB of memory.

Now, let's suppose you power on a 20th VM with a 2 GB memory reservation. Instead of calculating memory requirements as 7.7 GB (= 19 × 0.3 + 1 × 2), HA takes a more conservative approach and redefines the average VM to be the biggest reservation observed. With the higher reservation specified, HA will cautiously assume that every VM needs 2 GB of memory, and will ask for 40 GB (= 20 × 2) of RAM to be set aside for total runtime and failover capacity within the cluster. These calculations are intended to be conservative to ensure that sufficient spare capacity is available, without fragmentation across hosts within a cluster.
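The arithmetic above can be reproduced in a few lines. This is a minimal sketch of the two calculations as described in this tip, not VMware HA's actual admission-control code: the function names are hypothetical, reservations are given in GB, and per-host failover capacity and memory sharing are ignored.

```python
import math


def naive_requirement_gb(reservations_gb):
    """Sum of the individual memory reservations."""
    return math.fsum(reservations_gb)


def conservative_requirement_gb(reservations_gb):
    """Treat every VM as if it had the largest reservation observed."""
    if not reservations_gb:
        return 0.0
    return len(reservations_gb) * max(reservations_gb)


if __name__ == "__main__":
    before = [0.3] * 19          # 19 VMs with 300 MB reservations
    after = before + [2.0]       # power on a 20th VM with a 2 GB reservation
    print(round(naive_requirement_gb(before), 1))   # 5.7
    print(round(naive_requirement_gb(after), 1))    # 7.7
    print(conservative_requirement_gb(after))       # 40.0
```

The gap between 7.7 GB and 40 GB comes entirely from the single large reservation, which is why lowering or removing outlier reservations (Approach 1 below) is effective.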

In many cases (such as clusters with widely varying sizes of hosts and VMs), however, these calculations can be more conservative than desirable, and can lead to “insufficient failover capacity” warnings when powering on more VMs.

Two potential approaches are recommended if you are observing these warnings, or want to avoid them within a heterogeneous cluster configuration:

Approach 1: Either lower the reservations on your most demanding VMs, or remove the reservations skewing the calculations and rely upon “shares” instead. See the Resource Management Guide for the differences between reservations and shares.

Approach 2:  Alternatively, configure HA to disable strict admission control.  Host failures will still be detected and acted upon, but VMware HA will not prevent the starting of new VMs due to insufficient failover capacity.

4. When sizing your LUNs, a medium-sized LUN (~500GB) seems best for most situations.   

Small LUNs (and VMFS volumes) can add SAN management complexity (too many LUNs to manage). Very large LUNs can cause performance issues and give too coarse a granularity for troubleshooting, performance tuning, and failure/error isolation. The chart below summarizes some of the considerations. Details are provided on page 72 of the VI 3 SAN Design Guide.

| Consideration | Smaller LUN / VMFS volume (100GB) | Medium-sized LUN / VMFS volume (500GB) | Larger LUN / VMFS volume (3TB) |
|---|---|---|---|
| VMFS metadata overhead | Some overhead (0.5%) | Negligible overhead (<0.1%) | Negligible overhead (<0.1%) |
| Impact of a failure or error; difficulty of troubleshooting | Affects a few VMs | Affects 20-30 VMs | Affects many VMs |
| Ease of SAN management | Hard (many LUNs to manage) | Medium | Easy (just 1 LUN to manage) |
| Ease of tuning performance (**) | High (tunable per the few VMs on a LUN) | Medium (tunable for 20-30 VMs at a time) | Low (one setting for many, many VMs) |
| Flexibility in specifying value-added services (***) | High (different LUNs can have different policies or settings) | Medium (tunable for 20-30 VMs at a time) | Low (many VMs share the same policies or settings) |

(*) File creation in VMFS grabs a SCSI lock on the LUN. Excessive concurrent file creation in VMFS can cause lock contention, which can hurt performance. This can become apparent when multiple users are concurrently creating VMs (and therefore VMFS files), or when a VCB-based backup process is concurrently backing up multiple VMs (and is therefore concurrently creating multiple VMFS REDO files).
(**) e.g. RAID-level, array caches, queue depths, path selection/path dedication
(***) e.g. Backup, other data protection features such as replication, mirroring, etc., capacity optimization features such as de-dupe, thin-provisioning, etc., security and encryption features

See also Top Tips for Deploying VI, part 1

--The VI Team


August 01, 2008

Interesting items in Update 2 for VMware Infrastructure 3.5

VMware Infrastructure 3.5 U2 now available

After all the dramatic news from VMware over the last month or so, it may feel like the availability of VMware Infrastructure 3.5 Update 2 is not particularly newsworthy, but a few things are quietly being delivered that merit a good deal of attention.

Enhanced VMotion Compatibility (EVC)

Previous releases of VMware Infrastructure restricted VMotion between processors belonging to different generations, even if they were from the same manufacturer. These restrictions were put in place to ensure that a consistent CPU feature set was always exposed to software.

KB articles 1991 (Intel) and 1992 (AMD) describe the current compatibility groups for VMotion. KB article 1993 describes methods for masking select features to relax the compatibility requirements.

I think VMware users should care about Enhanced VMotion Compatibility because it radically simplifies the process of determining VMotion compatibility.

VMware worked closely with AMD and Intel on the specification for AMD-V Extended Migration and Intel FlexMigration, technologies that make newer generation CPUs backward compatible with older CPU generations.

With EVC, it is now much easier to add newer generation hardware into your existing VMware infrastructure while maintaining VMotion compatibility between the new and the older hardware. This makes adding new ESX hosts and retiring older hosts easier since you no longer need to worry much about CPU VMotion compatibility. Best of all, it’s really simple. No complicated compatibility matrices, and no CPU masks! Woohoo!

 

Processors included in Enhanced VMotion Compatibility for Intel are:

  • Quad-Core Intel® Xeon® processor 7300 series
  • Intel Xeon processor 5100/5200/5300/5400 series (dual-core and quad-core), based on the Intel® Core™ microarchitecture
  • Future Xeon processors based on Enhanced Intel® Core™ Microarchitecture.

For those familiar with Intel code names, these are Intel Core 2 (Merom)-based processors and Intel Core 2 Duo (Penryn)-based processors. All future processors from Intel with Intel VT FlexMigration will be VMotion compatible as well.

Processors included in Enhanced VMotion Compatibility for AMD are:

  • First-Generation AMD Opteron™ Rev. E
  • Second-Generation AMD Opteron™
  • Third-Generation AMD Opteron™ as well as future AMD Opteron™ processors.

* This is documented on page 239 of the Basic Systems Administration Guide for VMware ESX 3.5 U2. We're still trying to get the processor marketing names into this doc. Please comment on this blog if you use marketing names to identify your processor and think this is a worthwhile exercise.

Although EVC makes great strides in enabling VMotion across multiple CPU generations from the same vendor, it is not possible to VMotion from AMD processors to Intel processors or vice versa. Will it ever happen? Anyone want to place bets?
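Conceptually, EVC makes hosts from different CPU generations look alike by exposing a common feature baseline to VMs. The sketch below models that idea with plain sets: the baseline is the intersection of the hosts' CPU feature flags, and a new host can join if it comes from the same vendor and offers at least the baseline. This is a simplified illustration under those assumptions, not how EVC is configured in practice (there you select a baseline mode for the cluster, backed by AMD-V Extended Migration or Intel FlexMigration); the functions `evc_baseline` and `can_join`, the host names and the flag lists are hypothetical.

```python
def evc_baseline(hosts):
    """Return (vendor, common feature flags) for a cluster.

    hosts: {host_name: (vendor, set of CPU feature flags)}
    """
    if not hosts:
        raise ValueError("cluster has no hosts")
    vendors = {vendor for vendor, _ in hosts.values()}
    if len(vendors) != 1:
        # EVC spans generations from one vendor only; Intel <-> AMD is not possible.
        raise ValueError(f"mixed CPU vendors: {sorted(vendors)}")
    common = set.intersection(*(set(flags) for _, flags in hosts.values()))
    return vendors.pop(), common


def can_join(new_host, baseline):
    """A new host fits if it is the same vendor and supports every baseline flag."""
    vendor, flags = new_host
    base_vendor, base_flags = baseline
    return vendor == base_vendor and base_flags <= set(flags)


if __name__ == "__main__":
    cluster = {
        "esx-old": ("Intel", {"sse2", "sse3"}),
        "esx-new": ("Intel", {"sse2", "sse3", "ssse3", "sse4_1"}),
    }
    baseline = evc_baseline(cluster)
    print(sorted(baseline[1]))                                       # ['sse2', 'sse3']
    print(can_join(("Intel", {"sse2", "sse3", "ssse3"}), baseline))  # True
    print(can_join(("AMD", {"sse2", "sse3"}), baseline))             # False
```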

 

 

VSS Quiescing for Windows Applications

VCB is possibly the least understood component of VMware Infrastructure, so one might wonder why I’m making a big deal about VCB 1.5.

VCB 1.5 now uses new components inside the updated VMware Tools package to provide application-level quiescing (Windows 2003 VMs) in addition to file-system-level quiescing. This means that if you are using backup products integrated with VCB to back up virtual machines, the snapshots of those virtual machines will now be application-consistent, provided the applications running inside support VSS. Significant performance optimizations also make the backup process much faster with this version of VCB.

Monitoring and availability enhancements

With Update 2, VMware ESXi (yes, the free one) has been enriched to provide better hardware health information for QLogic, Emulex and LSI components in your server, as well as additional asset information used by HP Insight Manager. Overall, this augments the manageability of ESXi as we continue to deepen the information it can gather about the underlying hardware. Some of our larger customers are now standardizing on ESXi because they love its small footprint and low maintenance (they’re standardizing on ESXi as the hypervisor and still using VMware Infrastructure to provide VMotion, DRS, HA, etc.). Now the enhanced manageability has taken away all barriers for them!

With VMware Infrastructure 3.5, we introduced virtual machine failure monitoring with VMware HA as an experimental feature. This feature uses VMware Tools to monitor the operating system inside the virtual machine and can be configured to restart the VM in the event of failures or crashes of the OS. With Update 2, we now fully support this feature!
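One way to picture this kind of monitoring is as a heartbeat timeout: the guest tools report in regularly, and a VM that has been silent for longer than a failure interval is a restart candidate. The sketch below models only that timeout logic and is an assumption for illustration, not VMware HA's implementation; the class `HeartbeatMonitor`, the 30-second interval and the VM names are hypothetical, and real HA settings and behavior may differ.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    failure_interval_s: float                    # hypothetical threshold, e.g. 30 seconds
    last_seen: dict = field(default_factory=dict)

    def heartbeat(self, vm: str, now: float) -> None:
        """Record a heartbeat received from the VM's guest tools at time `now`."""
        self.last_seen[vm] = now

    def vms_to_restart(self, now: float) -> list:
        """VMs whose most recent heartbeat is older than the failure interval."""
        return sorted(
            vm for vm, t in self.last_seen.items()
            if now - t > self.failure_interval_s
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(failure_interval_s=30)
    monitor.heartbeat("web01", now=0)
    monitor.heartbeat("db01", now=50)
    print(monitor.vms_to_restart(now=60))  # ['web01']
```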


Why am I picking out this one as a notable feature? Well, we got a lot of bad press about releasing experimental features with 3.5, and one by one, we’re moving them to full support. Slowly but surely.

VMware Infrastructure 3.5 Update 2 has many more interesting features, such as live cloning of virtual machines, guided consolidation enhancements and the much-awaited support for Windows Server 2008 editions. Read the release notes to find out more!


July 31, 2008

Top Tips for Deploying VI

The following “top tips” highlight some issues that can arise in a VI deployment. They cover things that are sometimes hard to diagnose, or that might result in a problem weeks or months after some seemingly innocuous action. They are meant to shed some light on “latent” issues, that is, those that don’t produce immediate warnings or errors when the root-cause event occurs. These have been collected by the VI team from customer experience over time, and will be posted in two parts. We welcome your comments on these and any other “gotchas” that you might have encountered.

  1. Make sure DNS is fully configured. This includes ensuring proper, consistent configuration for all of the following: short name, fully-qualified name, forward lookup, and reverse lookup. Otherwise, you'll see ESX hosts intermittently disconnect from VirtualCenter, and HA might not work properly.

  2. Don’t use Virtual SMP for applications that don’t need it. Most applications are single-threaded and therefore cannot benefit from more than one virtual CPU. Assigning just a single vCPU to VMs maximizes the physical CPU utilization of ALL of your cores and avoids underutilized cores. If your applications were converted from physical servers with two CPUs, don’t assume they need two virtual CPUs; they might have been running on the smallest practical server configuration available. Start with a single vCPU, and then monitor performance to see whether increasing the number of virtual CPUs actually makes a difference.

  3. Make sure you monitor the "% ready" metric. There's one new, key metric in managing virtualization environments that doesn't exist in physical environments: ready time. Ready time measures, for a given VM, the amount of time that the VM is ready to run on a physical CPU but processor cycles are unavailable. In a properly loaded system, ready time should remain near zero, although percentages less than five present no significant problem. As ready time climbs to double-digit percentages, the applications are missing a significant portion of the CPU cycles they are requesting. This usually happens as a result of overly aggressive consolidation, and can be solved in various ways (reducing the number of VMs running, reducing the use of virtual SMP, adding memory or other resources in case swapping is occurring, etc.). For more information, see this performance study: Ready Time Observations.

  4. Watch your snapshot space growth.  Because snapshots live on your disk and grow over time, you want to be careful that you have enough spare capacity on your disk. Every snapshot consists of a “REDO” file; for the most recent snapshot, all new disk writes associated with the VM are recorded to this file. A REDO file has the potential in the extreme to grow to be the size of the original disk, and the REDO file of every snapshot that you maintain continues to occupy disk space. You want to make sure that you have enough "headroom" on your datastore to handle such growth over time.  Operations that might dramatically increase the size of your snapshots include the following: an OS service pack update, application reinstall, or a disk defrag inside the VM.

  5. Make sure the SQL Server Agent is up and running on the VirtualCenter DB. VirtualCenter depends on Microsoft SQL Server Agent to perform stats rollups. However, VirtualCenter does not have the ability to ensure this service is running on the DB server. If the user has it disabled, or the service is shut down at some point, the VI Client will not show expected stats (weekly, monthly…).  In addition, since daily data is not rolled up, it accumulates in the database, thus degrading performance and consuming more and more space.

  6. Team your management NICs if using VMware High Availability (VMware HA). This will help you avoid false alarms (i.e. false VMware HA failovers of VM's) in situations when you temporarily lose connectivity between your ESX hosts (e.g. when there's a momentary network outage, or even during a network switch maintenance operation).
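For tip 3, the "% ready" figure is easy to derive when a tool reports CPU ready as milliseconds accumulated over a sampling interval. This minimal sketch assumes a 20-second sample (the interval used by VirtualCenter's real-time charts) and assumes the summation covers all of a VM's virtual CPUs, so it divides by the vCPU count. The under-5% and double-digit thresholds come from the tip; the intermediate "watch" band is an assumption, not an official value, and the function names are hypothetical.

```python
def ready_percent(ready_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a CPU ready summation (ms per sample) into a percentage per vCPU."""
    if interval_s <= 0 or vcpus < 1:
        raise ValueError("interval must be positive and vcpus at least 1")
    return 100.0 * ready_ms / (interval_s * 1000.0 * vcpus)


def assess(pct: float) -> str:
    if pct < 5:
        return "ok"          # under five percent: no significant problem
    if pct < 10:
        return "watch"       # assumed intermediate band
    return "contention"      # double digits: the VM is missing many CPU cycles


if __name__ == "__main__":
    for ready_ms, vcpus in [(400, 1), (2500, 1), (2500, 2)]:
        pct = ready_percent(ready_ms, vcpus=vcpus)
        print(ready_ms, vcpus, pct, assess(pct))
    # 400 1 2.0 ok
    # 2500 1 12.5 contention
    # 2500 2 6.25 watch
```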

Part 2 coming next week. [Update: see Top Tips for Deploying VI, Part 2]

--The VI Team


July 11, 2008

VMware is Storage Protocol Agnostic

Which storage protocol to choose?

The most common storage related questions we are being asked today are:

  • What is the best choice for running VI3 on shared storage?
  • Should we use Fibre Channel (FC), iSCSI or NFS?

The answer to these questions depends on a number of variables, so the answer will not be the same for every environment. VMware currently supports deployment of VI3 on all three of those storage protocols, as well as on local ESX Server storage, and is focused on enabling customers to successfully leverage the benefits each of those choices offers for the virtualization environment. Although there are differences in which VMware features and functions are available on each, our current approach is to remove as many of those differences as possible so that customers have more choices available to them.

There are many industry perceptions when comparing FC to Ethernet-based storage (iSCSI and NFS), and these generally apply to both physical and virtual deployments. Even though virtualization does not resolve the differences that exist between these protocols, VMware is focused on providing as level a playing field as possible across the different storage protocols.

What is supported?

Today some VMware features, functions and products are available on FC but are not an option when using NFS or iSCSI.

Summary of current feature support across protocols:

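| Type | Boot VM | Boot ESX Server | VMotion, HA/DRS | RDM | MSCS Cluster | VCB | SRM | SVM (Storage VMotion) | Lab Mgr | Stage Mgr | Life Cycle Mgr |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Fibre Channel | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| iSCSI | Yes | Yes* | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
| NAS | Yes | No | Yes | No | No | Yes | No | Not Yet | Yes | Yes | Yes |
| Local Storage | Yes | Yes | No | Yes | No | No | No | Not Yet | Yes | Yes | Yes |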

*  Boot from hardware iSCSI only (not supported from software iSCSI)

VMware is working to remove as many of those differences as possible, so that the criteria for choosing a storage protocol are based on the differences that apply to both the physical and virtual worlds rather than on which VMware features and products are available on one protocol but not another.

VMware will address these differences in an order determined by the proportion of our customer deployments on each storage protocol. We have a great deal of work to do to complete this vision, but that is where we are headed.

Are there performance differences?

Performance is one of many considerations when choosing a storage protocol for the virtualization environment. However, performance in a VI3 deployment is a multidimensional measurement that varies based on many factors: the number of ESX servers in the VMware cluster, the number of VMs sharing a common pool of storage, block size, random vs. sequential ratio, and read vs. write ratio. Today most benchmarks and comparisons tend to measure only one dimension and do not represent a typical infrastructure of multiple ESX servers with multiple VMs running on each. The performance differences found in the physical world when comparing protocols are consistent with what is found in the world of virtualization. A VMware paper that provides more details on storage protocol performance comparison is posted on our website. This paper shows that all network storage connection options available to ESX Server are capable of reaching a level of performance limited only by the media and storage devices. When compared in terms of CPU costs, Fibre Channel and hardware iSCSI are the most CPU-efficient because they offload the processing to the HBA, but in cases in which CPU consumption is not a concern, software iSCSI and NFS can also be part of a high-performance solution.

Conclusions

The question to consider is not which protocol is the best choice for deploying virtualization, but which virtualization solution provides the best support for multiple protocols. VMware not only supports many storage protocol choices, but also provides a means to move VMs from a datastore on one protocol to a datastore on a different protocol while the VM remains up and running. With Storage VMotion, if you select one protocol for a VI3 deployment, you can later switch to another protocol with much less disruption than with any other virtualization solution available today.

While Storage VMotion is fully supported with FC today, we are headed in the direction of having it supported across all VI3 storage connectivity options (FC, iSCSI, NAS and local storage) so customers can seamlessly move from one type of storage to another.

So, to close out the question of which protocol is the best choice: the answer is all of the protocols that VMware VI3 supports. And if you change your mind, or the first choice was not the right one, VMware provides the ability to move virtual machines to another storage protocol without a huge hassle. In short, VMware is all about being storage protocol agnostic.

If there is something you think I need to cover, let me know; please email me at: pmanning@vmware.com


June 24, 2008

Storage VMotion and 10Gb Ethernet support for iSCSI SANs

What is the new news?

In VMware Infrastructure version 3.5 we introduced Storage VMotion, which performs a live migration of virtual machine disk files from one storage location to another without any disruption or downtime to virtual machines and applications. Although Storage VMotion is designed to work with any type of storage, it was initially supported only with Fibre Channel SANs. As of Update 1, Storage VMotion is supported with iSCSI SANs for moving virtual machine disk files in the following scenarios:

- From iSCSI SANs to other iSCSI SANs

- From iSCSI SANs to Fibre Channel SANs

- From Fibre Channel SANs to iSCSI SANs
 
In addition, we now support the use of 10Gb Ethernet for iSCSI in a VMware Infrastructure environment.

What does this mean for users?

It means that you can optimize and update your storage environment whenever you need to, without downtime for your applications. You can make sure that your virtual machines are always on the storage that meets your needs: regardless of how you first deployed your virtual machines, you can change the configuration and the type of your storage (whether Fibre Channel or iSCSI) as needed without disrupting your running virtual machines and applications. With Storage VMotion you can do any of the following without downtime:

  • Move virtual machines to a new array when you upgrade or refresh an array
  • Move virtual machine disks to a different tier of storage as their needs change
  • Troubleshoot and solve performance problems caused by storage misconfiguration or too much load on a particular LUN

Adding support for 10Gb Ethernet for iSCSI means that you can ensure that your iSCSI SAN deployment continues to meet your performance requirements even as your virtual environment scales up and your demand for storage networking bandwidth grows.


June 17, 2008

Networking for VI admins

The world of networking is often unfamiliar territory for server and VMware Infrastructure (VI) admins. In most enterprises, networking is the responsibility of a dedicated team of networking experts who may be unfamiliar with the world of server virtualization. So how does a VI admin approach networking? How does he or she say to the networking folks, “I want to roll out VI across these hosts and I need them all connected to the production network.”

First, involve the network team early and provide them with some information on how VI networking works. A good place to start is the VMware Virtual Networking Concepts paper at http://www.vmware.com/resources/techresources/997. This paper covers the essentials of VI networking: What is a vSwitch? How do I use VLANs? What are my options for NIC teaming? A network admin will find the concepts presented quite familiar; vSwitches (virtual switches) are very similar to L2 physical switches. Having the network folks read this paper will help put you all on the same page.

Next, talk about the application requirements. In almost all cases, you will want to separate traffic by type. 802.1Q VLAN tagging at the vSwitch (Virtual Switch Tagging) is the most common and recommended practice for scaling and separating traffic without being limited by the number of physical NICs. Management/Service Console (SC), VMotion, iSCSI, NFS, and VM traffic should in most cases be separated through the use of VLANs and appropriate NIC teaming policies. The network folks will look after and allocate the VLAN numbering (and may talk about avoiding VLAN 1, use of native VLANs, etc.).

NIC teaming gives you the two-fold benefit of maximizing the use of the host NICs while protecting against NIC, switch or link failures. The most common method (and the best practice for guest VMs) is to select “originating virtual port-id” in the port group definition. This spreads the load over the available links in the NIC team: each guest VM hashes to one of the physical NICs (vmnics). Should a link, NIC or switch failure occur, traffic is forwarded over the remaining links in the team. Spreading the links over two physical switches protects the host against network failure, thereby maximizing availability. All that is required from a network admin standpoint is L2 continuity between the switches.
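To make the "originating virtual port-id" idea concrete, here is a toy model: each virtual port is pinned to one active uplink, and when an uplink fails only the ports that were using it move to the survivors. The modulo assignment rule and the names `assign_uplinks`, `fail_over`, `vmnic0`/`vmnic1` are assumptions for illustration; ESX's actual selection function is not described here.

```python
def assign_uplinks(port_ids, uplinks):
    """Pin each virtual port to one uplink (toy rule: port ID modulo team size)."""
    if not uplinks:
        raise ValueError("NIC team has no active uplinks")
    return {port: uplinks[port % len(uplinks)] for port in port_ids}


def fail_over(assignment, failed, uplinks):
    """Move only the ports that used the failed uplink; other ports stay put."""
    survivors = [u for u in uplinks if u != failed]
    if not survivors:
        raise RuntimeError("all uplinks in the team have failed")
    return {
        port: (survivors[port % len(survivors)] if uplink == failed else uplink)
        for port, uplink in assignment.items()
    }


if __name__ == "__main__":
    team = ["vmnic0", "vmnic1"]
    pinned = assign_uplinks([0, 1, 2, 3], team)
    print(pinned)                             # {0: 'vmnic0', 1: 'vmnic1', 2: 'vmnic0', 3: 'vmnic1'}
    print(fail_over(pinned, "vmnic1", team))  # every port now on vmnic0
```

Keeping unaffected ports where they are mirrors the point in the text: a failure only disturbs the traffic that was on the failed link.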

But what about spanning tree protocol (STP), I hear someone say? Doesn’t this create a loop? No: vSwitches do not participate in STP. They don’t consume, forward or produce BPDUs. Moreover, any frame received on a vmnic uplink is never forwarded out another uplink. This means we can connect our uplinks to different switches without causing any loops, and without STP’s wasteful behavior of blocking links to prevent loops. You do not need to disable STP on the adjacent physical switch ports, but you should enable trunkfast or portfast on them so that they bypass the STP listening and learning phases when the link becomes active.
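The forwarding rules that make loop-free uplinks possible can be expressed as a small decision function. This is a simplified Python model written for illustration, not VMware code. It assumes the vSwitch knows the MAC address of every local VM, and it uses uplinks[0] to stand in for whichever uplink the teaming policy assigns to VM-originated traffic.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


def egress_ports(ingress, dst_mac, vm_ports, uplinks, is_bpdu=False):
    """Return the set of ports a frame leaves on (simplified vSwitch model).

    ingress:  port the frame arrived on (a VM's virtual port or a vmnic)
    dst_mac:  destination MAC address of the frame
    vm_ports: mapping of local VM MAC address -> virtual port
    uplinks:  vmnics in the team; uplinks[0] stands in for the uplink the
              teaming policy would pick for VM-originated traffic
    """
    # No STP participation: BPDUs are not consumed, forwarded or answered.
    if is_bpdu:
        return set()
    from_uplink = ingress in uplinks
    if dst_mac in vm_ports:
        # Destination is a local VM: deliver to its virtual port only.
        target = vm_ports[dst_mac]
        return set() if target == ingress else {target}
    out = set()
    if dst_mac == BROADCAST:
        # Broadcasts reach every local VM except the sender.
        out |= {port for port in vm_ports.values() if port != ingress}
    # Key rule: a frame received on an uplink never leaves on another
    # uplink, so uplinks cabled to two physical switches cannot form a loop.
    if not from_uplink:
        out.add(uplinks[0])
    return out


VMS = {"00:50:56:00:00:01": "vm1", "00:50:56:00:00:02": "vm2"}
UPLINKS = ["vmnic0", "vmnic1"]
```

For example, a broadcast arriving on vmnic0 is delivered only to vm1 and vm2; it never goes back out on vmnic1 towards the second physical switch, which is why no loop forms even though STP is not running on the vSwitch.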

In summary, the concepts aren’t that complex, but you do need to understand how the pieces fit together.

There are many other networking topics to cover and we have quite a rich list of content coming out to help you with understanding and deploying VI networking. Look out for the VI Networking blog at blogs.vmware.com/networking starting this week.

If there is something you think I need to cover, let me know. Email me at guy@vmware.com


April 16, 2008

VMware ESX 3.5 Update 1 and VMware VirtualCenter 2.5 Update 1

Last week, VMware released an update to its VMware Infrastructure suite of products, and there are a few changes worth noting:

- Enhanced hardware monitoring for VMware ESXi

- Support for VMware High Availability (HA) for VMware ESXi

- Support for Microsoft Cluster Services (MSCS) for both VMware ESX and VMware ESXi, including support for Cluster Continuous Replication (CCR) with Exchange 2007

The enhanced monitoring and alerting for VMware ESXi includes new Common Information Model (CIM) providers for CPU, system memory, fan, power supply, and other sensors.  This hardware health information can be consumed by VMware VirtualCenter and by third-party management products.

This further addresses a common question about VMware ESXi: “How can I manage ESXi without a Service Console?”  The answer: standards-based protocols and open interfaces.  Migrating to these remote management tools enables us to completely free the hypervisor from the operating system, leaving behind a small, 32 MB kernel.  This eliminates common security vulnerabilities found in general purpose operating systems while making the hypervisor extremely easy to deploy.

These remote management tools include:

· VMware VirtualCenter for centralized management of VMware ESXi and its virtual machines.

· CIM for key hardware health status information.

· The Remote Command Line Interface (RCLI) for developing custom management scripts and GUI-free administration.

· The VMware Infrastructure API for third-party management integration, replacing the function of Service Console agents. These tools use the same interfaces as VirtualCenter to monitor and interact with VMware ESXi.

· VMware Consolidated Backup (VCB) for integrating with third-party backup agents to provide LAN-free backup from a centralized proxy server.

While some of these tools, like VirtualCenter and VCB, are optional purchases, we find that most customers deploy them when managing multiple hosts in their environment.

Separately, with support for both VMware HA and Microsoft Cluster Services (MSCS), customers can deploy complementary tools for providing resilience in the face of hardware and application failures. VMware HA provides a simple, application-independent solution for recovering quickly from server failures. MSCS complements VMware HA with an application-specific solution that provides continuous availability in the case of application or server failure. The latest ESX update provides support for MSCS, including both 32-bit and 64-bit Windows guests, boot from SAN for VMs using MSCS, and Exchange 2007 Cluster Continuous Replication (CCR). According to Microsoft, CCR “is a high availability feature of Microsoft Exchange Server 2007 that combines the asynchronous log shipping and replay technology built into Exchange 2007 with the failover and management features provided by the Cluster service.” CCR support gives customers another availability tool for virtualizing their Exchange 2007 deployments.

While this is just a minor update to VMware Infrastructure, these small improvements provide further support for building a robust and resilient virtual infrastructure.


March 07, 2008

VMware virtualization perspectives

Hi VMware Infrastructure Users and Friends! Welcome to the VMware Infrastructure Team Blog. This blog will be a bi-monthly posting from the VMware datacenter virtualization product team. Here’s where you get to hear about the latest on our products as well as new ideas/concepts from VMware.

Today we begin with an overview of how we think about the entire virtualization stack. We want to clarify our viewpoint because the market tends to speak of two categories: 1. the hypervisor and 2. an amorphous glob called “management”. While it is relatively clear what the hypervisor is, it is far from clear what is being lumped under “management” in the virtualization context.

In our minds, there are two distinct layers of “management” on top of the hypervisor – the virtual infrastructure capabilities, and the automation capabilities.

But let’s not get ahead of ourselves; let’s “peel the onion” layer by layer.

The hypervisor:

One question that we get asked all the time is “So, you sell ESX Server, right?” Actually, ESX Server is just one of the components in our suite for virtualizing the datacenter. It installs on each physical server and partitions it into multiple VMs. It’s the best hypervisor on the market, and it is rock solid in its reliability and in its ability to manage physical system resources among many virtual machines, but that is not all you need for a complete solution.

Virtual Infrastructure Capabilities:

The VMware virtualization platform provides many systems infrastructure services such as high availability, data protection and security that were previously provided in the hardware, OS or applications layers. We provide these services in a consistent, uniform way across the datacenter, independent of the application, OS or hardware, making us a “great standardizer” for the datacenter.

We call this layer of distributed systems infrastructure services provided in the virtualization platform, virtual infrastructure capabilities.

What makes a virtual infrastructure capability different from hypervisor functionality?

If a capability acts on a single physical server, it belongs to the hypervisor. Capabilities that span multiple physical servers are virtual infrastructure capabilities.

VMotion is an example of a virtual infrastructure capability. In the physical world, moving running applications from one physical server to another is practically impossible without downtime. Some sort of workaround is achieved by clustering servers together and failing over to the backup server in order to perform regularly scheduled maintenance activities. But this is complex, expensive and entirely too much trouble to take for a regular workload. In a VMware environment, moving running VMs from one server to another without downtime is a point and click operation!

VMware has a range of these capabilities: zero-downtime mobility of workloads with VMotion and Storage VMotion, dynamic load balancing with Distributed Resource Scheduler (DRS), and automated failover of virtual machines in the case of physical hardware failure with HA. These virtual infrastructure capabilities are what bring your virtual datacenter closer to a real-time infrastructure that is resilient to downtime and capable of optimizing and protecting itself.
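As an illustration of the HA case, the sketch below plans where to restart the VMs of a failed host. The post does not describe how VMware HA actually chooses target hosts; the greedy rule used here (largest VM first, onto the surviving host with the most free memory) is an assumption made only to show the idea, and the host and VM names are hypothetical.

```python
def restart_plan(failed_host, placement, capacity_mb, vm_mem_mb):
    """Plan where to restart the VMs of a failed host (simplified sketch).

    placement:   dict of host name -> list of VM names running on it
    capacity_mb: dict of host name -> memory capacity in MB
    vm_mem_mb:   dict of VM name -> memory the VM needs in MB
    Returns (plan, unplaced): plan maps each restarted VM to its new host;
    unplaced lists VMs that no surviving host has room for.
    """
    # Free memory left on every surviving host.
    free = {
        host: capacity_mb[host] - sum(vm_mem_mb[vm] for vm in vms)
        for host, vms in placement.items()
        if host != failed_host
    }
    plan, unplaced = {}, []
    # Place the largest VMs first so they are less likely to be stranded.
    victims = sorted(placement.get(failed_host, []), key=lambda vm: -vm_mem_mb[vm])
    for vm in victims:
        fits = [host for host in free if free[host] >= vm_mem_mb[vm]]
        if not fits:
            unplaced.append(vm)
            continue
        # Assumed greedy rule: the surviving host with the most free memory.
        target = max(fits, key=lambda host: free[host])
        free[target] -= vm_mem_mb[vm]
        plan[vm] = target
    return plan, unplaced


PLACEMENT = {"esx1": ["web", "db"], "esx2": ["app"], "esx3": []}
CAPACITY_MB = {"esx1": 8192, "esx2": 8192, "esx3": 4096}
VM_MEM_MB = {"web": 2048, "db": 4096, "app": 4096}
```

The point of the sketch is the one made in the text: once the policy is configured, recovery needs no administrator, since the plan is computed and acted on as soon as the host failure is detected.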

What we call VMware Infrastructure (VI) is the aggregation of these virtual infrastructure capabilities with our hypervisor. VMware Infrastructure is centrally managed, and its capabilities are administered through VirtualCenter, our one-stop shop for virtual infrastructure management.

Management & Automation:

We don’t stop at just creating a virtual infrastructure that is more reliable and solid while at the same time flexible, optimized and highly available. Once a datacenter is running this virtual infrastructure, IT management processes become easier to execute.

This is where we draw the line between the virtual infrastructure and the management and automation capabilities:

  • VI is about infrastructure services that are turned on at the click of a button and, once configured, are executed automatically without much human intervention. For example, if you turn on HA, it will automatically restart VMs without waiting for an admin to do anything.

  • Automation capabilities are about stringing together an entire workflow of multiple steps that do involve human intervention – for approvals, determining service level agreements etc.
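To make the distinction concrete, here is a minimal Python sketch of a multi-step workflow with a human approval gate. The Step class, the step names and the approve callback are hypothetical illustrations, not the interface of any VMware automation product. A VI capability such as HA, by contrast, would run its recovery with no approval step at all.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    needs_approval: bool = False  # True if a person must sign off first


def run_workflow(steps, approve):
    """Run workflow steps in order, pausing for human sign-off where required.

    approve: callable taking a step name and returning True or False; it
             stands in for a human approver (hypothetical, for illustration).
    Returns the names of the steps that ran; stops at the first rejection.
    """
    completed = []
    for step in steps:
        if step.needs_approval and not approve(step.name):
            break  # rejected: the rest of the workflow does not run
        completed.append(step.name)
    return completed


# Hypothetical VM-provisioning workflow with one approval gate.
PROVISIONING = [
    Step("submit VM request"),
    Step("approve request and agree SLA", needs_approval=True),
    Step("provision VM"),
    Step("apply backup policy"),
]
```

Calling run_workflow(PROVISIONING, approve=lambda name: True) runs all four steps; a rejecting approver stops the workflow after the request is submitted.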

And this is accomplished with our automation products that bring reliability, repeatability, consistency and control to IT processes.

These automation products are unique to VMware, and they create end-to-end control and predictability in virtual infrastructure environments. We’re very excited to have launched two of these recently at VMworld Europe!

So, to recap – here’s what our product portfolio encompasses:

[Figure: VMware virtualization layers (Virtlayers_2)]

See how our products fit into the above at http://www.vmware.com/products/server_virtualization.html.