Monthly Archives: May 2010

Can you help us improve SRM?

Hello everyone,

We are very interested in learning how you use SRM.  This information will help us understand how our customers use SRM and decide how to improve it in ways that customers find useful.  If you would like to help us by answering a short survey, we would certainly appreciate it.  For the first 1000 people who answer the survey, we will make a donation to charity.

SRM Customer Survey

Remember, only customers who have used SRM should answer the survey!

Thanks very much!

Michael

Hardware Health Monitoring via CIM, part 3

In our last post, we looked at some SMBIOS based asset information.  In this post we'll start to explore what information is available from IPMI.

As with our previous posts, let's explore the relevant technology a little before we dive into code.  IPMI, the Intelligent Platform Management Interface, is a specification that defines how you can monitor and manage the health and hardware assets of a system using what is called a Baseboard Management Controller, or BMC.  The BMC is a secondary processor/controller that runs in the system and typically has its own power tap on the power supply, so it's live even when the main system is powered off (as long as the power supply is plugged in).  Most modern server-class x86 systems come with a BMC that supports IPMI.  The BMC is hooked up to various sensors and components on the system and can monitor their state independently of the primary CPU and the running operating system.  Some examples of the things BMCs may monitor are chassis intrusion sensors, temperature sensors, presence sensors for devices like hot-plug disk drives, and power supplies.

Often the BMC is available on the network through a secondary IP address and can be used for remote management of the system, even if it is powered off (say, to power it on, a capability that vCenter's Distributed Power Management feature leverages).  We're not going to focus on remote IPMI in this post, but instead look at the types of information you can retrieve from the BMC via the CIM implementation within ESX.  The IPMI specification gives system manufacturers a lot of flexibility in the type and number of sensors they implement.  As a result, the data you see on one system may not match the data you see on another, particularly if those systems are from different manufacturers.

Changed Block Tracking mismatch?

I came across this blog last week regarding vSphere's changed block tracking (CBT) capability and was a bit concerned, since it explicitly called out VMware Data Recovery (VDR) in the list of products affected.  As background, CBT is used by VMware products and capabilities like VDR and Storage vMotion to efficiently determine which disk blocks have changed.  I did some investigation on the topic to ascertain the extent of the problem.  While I do agree that it exists, I am not convinced that customers will see it often in their production environments.

In summary, you have to have the following ingredients:

  • Using VDR (or any backup product that leverages CBT)
  • Backing up a HW7 virtual machine

And then you have to do the following steps in sequence:

  1. Perform a backup of the VM using VDR
  2. Perform a manual snapshot of the VM
  3. Perform another backup of the VM using VDR
  4. Revert the manual snapshot
  5. Perform another backup of the VM using VDR

What could end up happening is that the snapshot revert in Step 4 causes the CBT change ID to decrement and become smaller, while CBT assumes that change IDs will always increment to a larger number.  This leads the incremental backup in Step 5 to not receive the correct list of changed blocks from CBT, which is the point that the blog was trying to convey.
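
The sequence above can be replayed in a toy model. This is only a sketch: the change IDs are made-up integers, and `plan_backup` is a hypothetical helper showing one defensive check a backup tool could make (fall back to a full backup when the ID goes backwards). It is not how VDR or any particular product actually behaves.

```python
def plan_backup(last_change_id, current_change_id):
    """Decide what kind of backup to take next.

    Both IDs are plain integers in this toy model (an assumption).
    """
    if last_change_id is None:
        # No previous backup to diff against.
        return "full"
    if current_change_id < last_change_id:
        # The ID went backwards, e.g. after a snapshot revert, so a
        # "changes since last_change_id" query cannot be trusted.
        return "full"
    return "incremental"


# (step, change ID recorded at the previous backup, change ID now)
history = [
    ("step 1: first backup", None, 10),
    ("step 3: backup after manual snapshot", 10, 12),
    ("step 5: backup after snapshot revert", 12, 11),
]

decisions = [plan_backup(last, current) for _, last, current in history]
for (step, _, _), decision in zip(history, decisions):
    print(f"{step}: {decision}")
```

A tool without the second check would ask for an incremental in step 5, which is the mismatch described above.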

Can this be reproduced in a lab?  Absolutely.  

Does it happen frequently enough to matter to customers?  To really see this occur in the real world, there generally need to be a few mini-steps between steps 4 and 5: things like more snapshot creates or deletes, multiple VM power-off and power-on cycles, or VMotion operations.  These operations will generally decrement the CBT change ID and increase the likelihood of the mismatch occurring.

Should you be concerned?  You should at least be informed!  Some backup products already have workarounds implemented, so the possibility of running into this scenario is eliminated.  The VMware team also created a KB article that summarizes the issues and potential workarounds.  As is standard VMware practice, the KB will be updated as needed in the future with more information.

vSphere loves 10GigE

It didn’t seem so long ago that 10GigE was just “one of those things we’ll look at soon”. A show of hands at our recent internal TechSummit conference suggests almost every customer is either implementing 10GigE in production or kicking the 10GigE tires in their pre-production labs.

Why 10GigE?

10GigE has a couple of obvious advantages over 1GigE:

  • Bandwidth: it’s 10x the bandwidth. To ensure packets from a particular flow are not reordered, all the teaming policies hash to one vmnic (physical NIC) within the team between a single source and destination. 10GigE provides more bandwidth for any traffic type or flow that would have been restricted to 1GigE, e.g. NFS, iSCSI, FT logging, vMotion, and individual VMs.
  • Management: 2x 10GigE links have to be easier to manage and deploy than 6, 8, 10, or more 1GigE links.
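
To see why a single flow never gets more than one link's worth of bandwidth, here is a minimal sketch of hash-based uplink selection. The XOR-and-modulo hash is purely illustrative and is an assumption, not the algorithm ESX actually uses; `pick_uplink` and the addresses are hypothetical.

```python
import ipaddress


def pick_uplink(src_ip, dst_ip, uplinks):
    """Map a source/destination pair onto exactly one uplink in the team.

    Illustrative hash only: XOR the two addresses and take the result
    modulo the number of uplinks.
    """
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]


team = ["vmnic0", "vmnic1"]

# The same pair always lands on the same vmnic, so packets stay in order...
first = pick_uplink("10.0.0.5", "10.0.0.20", team)
again = pick_uplink("10.0.0.5", "10.0.0.20", team)

# ...while a different destination may land on the other vmnic.
other = pick_uplink("10.0.0.5", "10.0.0.21", team)

print(first, again, other)
```

Because the pair is pinned to one vmnic, a single NFS or iSCSI session on a 1GigE team tops out at 1GigE no matter how many links are in the team.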

2x 10GigE as the typical Deployment Scenario

At this point in time, almost all 10GigE deployments will use two 10GigE interfaces linking to a pair of physical switches (top-of-rack or end-of-row) with L2 continuity over all access VLANs between the two switches (so you’re not exposed to a single switch or linecard failure and can fail over to the other switch).

Converged Traffic?

In the not too distant past, VMware had guided customers to dedicate vmnics to each of the various traffic types. In the world of 10GigE, there is no need to continue with this methodology. VLANs provide logical separation and 10GigE interfaces provide sufficient bandwidth and a better performing way to handle the vmkernel and VM traffic loads from ESX and ESXi hosts.  

Switches

If you want to use teaming and have some protection against single points of failure then both NICs must be on a single vswitch (vSS, vDS, or Nexus 1000V). If you’re using VLANs for traffic separation (of course you are!), there really is no need for multiple vswitches anyway.

Traffic types and Teaming Policies

There is no one right way to deploy 10GigE; it will depend upon your environment. Some things to consider:

  • IP storage (NFS and/or iSCSI): are you using these? How much bandwidth do you need, and how much will it consume if given 10GigE?
  • vMotion: a single vMotion can consume ~3.6Gbps, with a maximum of two running concurrently.
  • Service console or management interface: it doesn’t use much bandwidth at all, but must be available.
  • FT logging: requires a lot of bandwidth and low latency (10GigE helps FT a lot) as it replicates the read I/O traffic and ingress data traffic to the secondary FT VM. In the current implementation, FT can consume up to ~4Gbps but can consume much less if the FT workloads are low.
  • VM traffic: how much of it do you have? Are your VMs particularly bursty or heavy consumers of bandwidth? Note that in a 1GigE environment, each VM (assuming a single vnic) was capped at 1GigE ingress/egress.

You could just apply “Originating Virtual Port ID” to the teaming policy on all Port Groups and dvPort Groups and it would work just fine. But, I prefer more deterministic control over the traffic flows. I like the method shown in the diagram below. This is applicable to vSS and vDS switches. The details are as follows:

  • VST (Virtual Switch Trunking) mode: trunk the required VLANs into the ESX/ESXi hosts over both 10GigE interfaces and ensure there is L2 continuity between the two switches on each of those VLANs.
  • VM portgroups (or dvPortgroups): active on one vmnic and standby on the other (vmnic0/vmnic1 in my example).
  • vmkernel portgroups (or dvPortgroups): active on one vmnic and standby on the other, in reverse to that for the VMs (i.e. vmnic1/vmnic0 in my example).

With both NICs active, this means that all VM traffic will use vmnic0, and all the vmkernel ports will use vmnic1. If there is a failure, then all traffic will converge onto the remaining vmnic. (Note that when using a vDS, dvPortgroup teaming policies apply to the dvUplinks, which then map to the vmnics on each host.)
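
The active/standby layout can be written down as data and checked with a few lines of code. This is a sketch only: the `PORTGROUPS` table and both function names are hypothetical, and the port group names are simply the traffic types listed earlier.

```python
# Failover order per port group: first entry is active, second is standby.
PORTGROUPS = {
    "VM traffic": ["vmnic0", "vmnic1"],
    "vMotion": ["vmnic1", "vmnic0"],
    "IP storage": ["vmnic1", "vmnic0"],
    "Management": ["vmnic1", "vmnic0"],
    "FT logging": ["vmnic1", "vmnic0"],
}


def uplink_in_use(failover_order, failed):
    """Return the first vmnic in the failover order that has not failed."""
    for nic in failover_order:
        if nic not in failed:
            return nic
    return None  # every uplink is down


def placement(failed=frozenset()):
    """Map each port group to the vmnic it would use right now."""
    return {name: uplink_in_use(order, failed)
            for name, order in PORTGROUPS.items()}


print(placement())            # VMs on vmnic0, vmkernel ports on vmnic1
print(placement({"vmnic1"}))  # everything converges onto vmnic0
```

Listing the order explicitly is what makes the behaviour deterministic: you know in advance which vmnic each traffic type uses, both normally and after a failure.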

Traffic Shaping?

As an additional option, you can employ the traffic shaper to control or limit the traffic on any one port. Moving all vmkernel ports to one vmnic means you can control and apply this more effectively. You only need to use this if you are concerned about one traffic type dominating others. Since vMotion, management traffic, and FT logging are effectively capped, this really only concerns iSCSI and NFS. So you may wish to apply the shaper to one or the other of these to limit its effect on vMotion, FT, and management.

The traffic shaper is configured on the port group (or dvPortgroup). On the vSS, the shaper only applies to ingress traffic (relative to vswitch from VM or vmkernel port). In other words, it works in the southbound direction in my diagram. The vDS has bi-directional traffic shaping. You only need to apply it on the ingress (southbound) side.

Try it in your lab. In most cases I doubt you would need it, but it’s there. Do the math on the competing traffic types to work out your average and maximum bandwidth levels. 4Gbps is probably a good start as an average for iSCSI or NFS (which is way more than the 1Gbps you had before).
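
As a starting point for that math, here is a small sketch using only the figures quoted in this post. It treats management traffic as negligible (an assumption based on the remark that it doesn't use much bandwidth), and the helper name `headroom_mbps` is hypothetical.

```python
LINK_MBPS = 10_000  # one 10GigE uplink

# Peak figures quoted in the post, in Mbps.
CAPPED_PEAKS_MBPS = {
    "vMotion (two concurrent at ~3.6Gbps each)": 2 * 3_600,
    "FT logging": 4_000,
}


def headroom_mbps(link_mbps, demands_mbps):
    """Bandwidth left after the given demands; negative means oversubscribed."""
    return link_mbps - sum(demands_mbps.values())


# Worst case: every capped traffic type peaks at the same moment.
worst_case = headroom_mbps(LINK_MBPS, CAPPED_PEAKS_MBPS)

# Candidate shaper average for IP storage, as suggested above.
storage_average = {"iSCSI or NFS shaper average": 4_000}
left_for_others = headroom_mbps(LINK_MBPS, storage_average)

print(worst_case)       # -1200
print(left_for_others)  # 6000
```

The negative worst case only says that simultaneous peaks of vMotion and FT logging would exceed a single link. Whether that ever happens in your environment is what the lab test should tell you.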

Note that you should not configure an average or maximum traffic shaping limit above 4Gbps (2^32). It’s a 32-bit field, so it will apply a 4G modulus to whatever value you enter. For example, 5Gbps will end up as 1Gbps.
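
The wraparound is easy to reproduce. This sketch assumes the field is an unsigned 32-bit count of bits per second, which the post implies but does not state outright; `effective_limit` is a hypothetical name.

```python
FIELD_MODULUS = 2 ** 32  # a 32-bit field wraps at 4,294,967,296


def effective_limit(requested_bps):
    """Value left in a 32-bit field after the requested limit wraps around."""
    return requested_bps % FIELD_MODULUS


GIBI = 2 ** 30

# Binary units: 5G wraps to exactly 1G, matching the example above.
print(effective_limit(5 * GIBI) == 1 * GIBI)  # True

# Decimal units: 5,000,000,000 bps wraps to roughly 0.7Gbps.
print(effective_limit(5_000_000_000))         # 705032704

# Anything below the modulus is stored unchanged.
print(effective_limit(4_000_000_000))         # 4000000000
```

Either way, the configured limit ends up far lower than intended, so keep shaper values below the 2^32 boundary.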

 

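[Diagram: VM port groups active on vmnic0 with vmnic1 as standby; vmkernel port groups active on vmnic1 with vmnic0 as standby]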

What if I’m using the Nexus 1000V?

There is absolutely no issue in using 10GigE with the Cisco Nexus 1000V. It’s a good thing for all the same reasons I mention above. When using the Nexus 1000V with Nexus 5000 top-of-rack switches, you should consider using vPC mode. This logically aggregates the n5k switches. If desired, you can also apply traffic shaping to individual ports via the port profile configuration.

Additional Guidance

We are preparing some documents that expand on what is outlined in this blog entry. We will go into a bit more detail around the configuration of the virtual and physical switches. You can expect to see these in the next few weeks.

Summary

My summary is quite short. If you have 10GigE, go ahead and use it, and enjoy the extra performance and management benefits.

Hardware Health Monitoring via CIM, part 2

In our last post, we looked at some of the basics of CIM technology and a very basic script that can dump out the version of ESX the system is running.  Now let's take a deeper dive into some of the hardware asset information that's available.

Before we look at the code, let's talk a little about where this data comes from.  The data we'll be looking at in this installment comes from SMBIOS, which is an industry standard extension to BIOS that exposes basic system management information.  Just about every system you can buy these days has some amount of SMBIOS information available.  Some examples of SMBIOS information are CPU details, memory modules, and chassis information.  This information maps into a number of SMASH profiles, such as DSP1022 – CPU profile, DSP1026 – System Memory Profile, and DSP1011 – Physical Asset Profile.

Practice Makes Perfect!

Hello everyone,

I read an interesting article today at Chuck's Blog that struck a chord with me.  The article was about a company that had an outage with their email server, which was hosted on an EMC product.  Some of what Chuck talked about was the big picture and how vendors will never be able to protect customers completely.  Products will always improve, but customers will always need to do their part themselves.  The part of Chuck's blog that I want to share with you is this quote:

I heard one very experienced operations manager put it this way — "it
ain't protected unless you can prove it, and the only way you can prove
it is to do it". 

And of course this is true like nowhere else in the DR world.  I learned in the military life of my youth that practice makes perfect.  Traditional DR is a lot of scripts and checklists, and it is hard to practice.  Sometimes you can practice parts of it, but often you cannot practice all of it.  So no practice means no perfection, and no proof.

Our great product, Site Recovery Manager, will allow you to practice.  It will support you practicing as hard and as often as you want.  It will help you test as realistically as you want, and of course, this all means you can prove your DR plans work.  And that is key.

Michael