
Monthly Archives: February 2011

Fault Tolerance and Isolation Response

Most of us are familiar with how the HA Isolation Response works.  If a host in a cluster loses connectivity with all the other nodes in the cluster, it is deemed isolated.  When this happens, the cluster's "Isolation Response" dictates how the host will react.

The options are:
(1)    Leave powered on
(2)    Power off
(3)    Shut down

The "Leave powered on" option protects against a false positive, where the host thinks it's isolated when it really isn't; this typically occurs in response to a network problem outside of vSphere, or when there is insufficient network redundancy.  The "Power off" and "Shut down" options halt the VMs, releasing the VMFS disk locks and enabling the non-isolated nodes in the cluster to restart the VMs.  The difference is that "Shut down" attempts a graceful shutdown from inside the guest OS, whereas "Power off" does not try to shut down the OS and will just power off the VM.

However, something that many of us probably aren't aware of is that the isolation response doesn't apply to FT-protected VMs.  When FT is enabled on a VM it gets excluded from HA actions.  As such, if a host that is running an FT primary gets isolated, the FT-protected VM will continue to run on the isolated host regardless of the cluster's isolation response.

This is an important point to remember when running FT in your HA/DRS clusters.  To avoid a situation where an FT primary VM may get stuck on an isolated host, it's important to have adequate network redundancy for both your management and logging networks.
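As an illustration, one way to provide that redundancy on a standard vSwitch is to attach a second uplink to the management network. This is a minimal sketch from the Tech Support Mode console; the vSwitch and vmnic names here are assumptions and will differ per host:

# Attach a second uplink to the management vSwitch (names are examples)
esxcfg-vswitch -L vmnic1 vSwitch0
# List the vSwitches to verify both uplinks are now attached
esxcfg-vswitch -l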

Regards,
-Kyle

Ops changes part 4 – Injecting or installing drivers

Two weeks ago I was presented with a problem regarding the use of drivers that were not part of the standard ESXi ISO. I started looking into it and discovered there are two main solutions for a problem like this.

  1. Installing drivers during the installation
  2. Injecting drivers into the ISO

Both solutions have their advantages and disadvantages, but I guess the main arguments for why I prefer to install drivers during the installation rather than injecting them into the ISO are flexibility and support. When a new driver is released I want to be able to change the driver bundle instantly, without the need to repackage the ISO, and I want the installation mechanism to be fully supported. However, there are always corner cases where it might be required to inject the driver into the ISO. In this article we will describe both methods.

Installing drivers during the installation

The steps required to install drivers during a scripted installation are straightforward. First the driver package needs to be downloaded and, preferably, hosted on an internal web server. I would recommend using the same location where the ks.cfg for your scripted install is stored (if it is HTTP-based, of course).

With ESXi there are two sections where you can define post-installation actions: %post and %firstboot. The difference between the two is that all actions defined in %post run before the installer reboots the host, whereas all actions defined in %firstboot run during the first boot. In the case of %firstboot this is done by a script located in /etc/vmware/init/init.d/ with a prefix of 999, which means it is the last script to be executed during boot.
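To show where these sections live, here is a minimal kickstart skeleton. This is an illustrative sketch only; all values are placeholders, and the exact section flags (in 4.1 %firstboot required --unsupported) may vary per build, so verify against your own install script:

accepteula
rootpw mypassword
autopart --firstdisk --overwritevmfs
install url http://<ip-address>/esxi/

%post --interpreter=busybox
# Runs before the installer reboots the host
echo "post-install actions go here"

%firstboot --unsupported --interpreter=busybox
# Runs once during the first boot of the installed host
echo "first-boot actions go here"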

Depending on the driver that needs to be installed you will need to use either the %post or the %firstboot section. The %post section is mostly used when a network driver needs to be downloaded before the restart; see my article on yellow-bricks.com for an example. In most cases, however, placing all commands in %firstboot should suffice. Our example deals with the most common scenario: the download and installation of a driver in the %firstboot section.

Edit the install script and include the following in the %firstboot section, where <ip-address> needs to be replaced with the IP address of the web server and <driverbundle> with the name of the actual bundle:

vim-cmd hostsvc/maintenance_mode_enter
esxupdate --bundle=http://<ip-address>/<driverbundle>.zip update
vim-cmd hostsvc/maintenance_mode_exit

This script will place the host in maintenance mode, install the bundle, and exit maintenance mode again. This is a very simple and flexible method to install drivers during the installation. Please note that it might be necessary to reboot the server after the installation of the patches. This can of course also be scripted; just add a "reboot" at the end of the %firstboot section if required, as shown below.
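Putting it together, the complete %firstboot section with the optional reboot would look something like this (again a sketch; the --unsupported flag is an assumption based on the 4.1 kickstart syntax):

%firstboot --unsupported --interpreter=busybox
# Enter maintenance mode, install the driver bundle, exit maintenance mode
vim-cmd hostsvc/maintenance_mode_enter
esxupdate --bundle=http://<ip-address>/<driverbundle>.zip update
vim-cmd hostsvc/maintenance_mode_exit
# Reboot only if the installed bundle requires it
reboot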

Inject drivers into the ISO

The second option, which enables you to use drivers that are not part of the standard ISO, is to inject them into the ISO. This is a more complex and less flexible solution, as every update to the drivers will require the ISO to be repackaged. Also note that VMware currently does not support the injection of drivers into the ISO; this procedure is provided "as is" without warranty or guarantee.

The following tools are used to alter the ISO:

  1. Auto Deploy appliance
  2. VMware Workstation (VMware Player or ESXi will also work)

The first step is to import the Auto Deploy appliance into VMware Workstation. When it has been imported you will need to power it on and make the ESXi ISO available. (If the power-on fails with a kernel panic you will need to change the SCSI device to "lsilogic" and the OS to "Redhat 5 – 64Bit".) When the ISO is available the following steps will need to be taken to inject a driver:

  1. Mount the ISO image
    sudo mount -o loop VMware-VMvisor-Installer-4.1.0.update1-348481.x86_64.iso /mnt/iso/
  2. Create a new folder which will be used for the content of the mounted ISO
    mkdir /tmp/custom-iso
  3. Copy the content of the ISO to the folder
    cp -R /mnt/iso/* /tmp/custom-iso/
  4. Go to the tmp folder
    cd /tmp
  5. Unzip the image to enable the injection of a bundle
    sudo bunzip2 -c /tmp/custom-iso/imagedd.bz2 > imagedd
  6. Inject the bundle (-o) or a driver (-v) into the image. For this exercise we will insert a single .vib file; please note that <filename.vib> will need to be replaced with the actual filename
    sudo vibddi -i imagedd -v <filename.vib> -n
  7. Validate that the bundle has been injected
    sudo vibddi -i imagedd -q
  8. Zip the image
    bzip2 imagedd
  9. Create a new checksum
    md5sum imagedd.bz2
  10. Copy the checksum and replace the checksum in the following file using your favorite editor (I used "vi")
    /tmp/custom-iso/imagedd.md5
  11. Copy the file to the custom ISO folder (might need to "sudo" depending on the permissions)
    cp imagedd.bz2 /tmp/custom-iso/
  12. Go to the custom ISO folder
    cd /tmp/custom-iso
  13. Recreate the ESXi ISO
    mkisofs -l -J -R -r -T -input-charset UTF-8 -o /tmp/VMware-Custom-VMvisor-Installer-4.1.0.update1-348481.x86_64.iso -b isolinux.bin -c boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table /tmp/custom-iso

Now the ISO should be ready for use! Again, please note that VMware does not support the "vibddi" injection mechanism. However, I decided to document this procedure because in some cases it is very difficult to install ESXi without having the drivers packaged inside the ISO, and I wanted to show that it is possible.
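For repeatability, the thirteen steps above can be collected into a single script. The following is a sketch under the same assumptions (same paths and ISO build, <filename.vib> as the driver placeholder) and carries the same unsupported caveat; in particular, verify the format of the original imagedd.md5 before overwriting it:

#!/bin/sh
# Unsupported, illustrative wrapper around the manual steps above
ISO=VMware-VMvisor-Installer-4.1.0.update1-348481.x86_64.iso
VIB=<filename.vib>   # replace with the actual driver VIB
sudo mount -o loop $ISO /mnt/iso/
mkdir /tmp/custom-iso
cp -R /mnt/iso/* /tmp/custom-iso/
cd /tmp
sudo bunzip2 -c /tmp/custom-iso/imagedd.bz2 > imagedd
sudo vibddi -i imagedd -v $VIB -n     # inject the driver
sudo vibddi -i imagedd -q             # verify the injection
bzip2 imagedd
md5sum imagedd.bz2 > /tmp/custom-iso/imagedd.md5   # check against the original file's format
cp imagedd.bz2 /tmp/custom-iso/
cd /tmp/custom-iso
mkisofs -l -J -R -r -T -input-charset UTF-8 -o /tmp/VMware-Custom-VMvisor-Installer-4.1.0.update1-348481.x86_64.iso -b isolinux.bin -c boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table /tmp/custom-iso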

Hopefully this will ease your migration to ESXi.

vMotion – what’s going on under the covers?

It's funny how you can go months without any questions about a feature and then all of a sudden you get a flurry of questions.  Such is the case with vMotion.  Over the past few weeks I've had several folks ask me about vMotion.  These haven't been the typical "how do I configure vMotion" questions, but rather "what's going on under the covers" type questions.  In response to one such question, Gabriel Tarasuk-Levin from the VMware engineering team gave an excellent overview of how vMotion works, and I thought it would be good to share.

Paraphrasing from his email:

vMotion steps, at a high level (vSphere 4.1):

1.    Shadow VM created on the destination host.
2.    Copy each memory page from the source to the destination via the vMotion network.  This is known as preCopy.
3.    Perform another pass over the VM’s memory, copying any pages that changed during the last preCopy iteration.
4.    Continue this iterative memory copying until no changed pages (outstanding to-be-copied pages) remain.
5.    Stun the VM on the source and resume it on the destination.

In most circumstances the iterative copy works very well, assuming the host is able to transmit memory pages over the vMotion network faster than the VM can dirty new pages.

However, in the rare event that the VM is dirtying memory pages faster than vMotion can send them, it is possible to get into a situation where the preCopy won't be able to converge.
 
When the preCopy cannot converge, vMotion needs to decide whether to fail the vMotion or to proceed with the switchover to the destination anyway.  It makes this decision by estimating the time required to transmit all the remaining outstanding pages.  By default, if this time is below 100 seconds, vMotion will proceed with the switchover.  If it will take more than 100 seconds, the vMotion will fail (time out) with no impact on the VM.
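As a back-of-the-envelope illustration of that estimate (the numbers here are hypothetical, not from Gabriel's overview):

# ~2 GB of outstanding pages at ~900 Mb/s of observed vMotion throughput
outstanding_mb=2048
throughput_mbps=900
echo $(( outstanding_mb * 8 / throughput_mbps ))   # ~18 seconds, well under the 100-second cutoff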
 
In the event the VM passes the 100-second check, vMotion will stun the source and start running on the destination.  While the destination runs, the source will transmit the remaining pages to the destination using the "quick resume" capability introduced with vSphere 4.1.

A few key takeaways I’d like to emphasize: 

  • You should be able to vMotion any workload as long as it is dirtying memory pages at a rate that is less than your vMotion network transmit rate.
  • vMotion will only transfer the VM to the destination if it is certain that it can complete the memory copy.
  • If vMotion cannot complete the memory copy it will fail with no impact to the running VM.

Also, when it comes to troubleshooting vMotion, a good place to start is performing a few vMotions on the host and consulting the vmkernel logs to find your approximate network throughput.  Then look at the VM's memory change rate to identify whether the vMotion network throughput is sufficient.
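A quick way to pull the relevant entries for inspection (a sketch; depending on your build, vMotion messages may land in /var/log/vmkernel or /var/log/messages):

# Show recent vMotion-related vmkernel log entries
grep -i vmotion /var/log/vmkernel* /var/log/messages 2>/dev/null | tail -50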

 Regards,
-Kyle

Ops changes part 3 – Local disk vs USB vs BFS

When I was planning this series of articles, one of the things at the top of my list was to "compare" the types of install destinations that can be used, as I guess the variety of options is one of the strengths of ESXi. The reason ESXi can be installed on such a large range of devices, as hopefully all of you know by now, is the size of the system image. The system image is not only tiny, but ESXi is loaded into memory and, once booted, only writes its configuration roughly once every 10 minutes.

Today I want to discuss the variety of options, which I have grouped into three major categories; the possibilities are listed below:

  • Local disk (including SSD)
  • Removable Media
    • USB
    • SD
  • Boot From San
    • FC
    • iSCSI

As you can see there is a large variety of options, which of course leaves you with a decision to make. We cannot make this decision for you, but we can inform you about some of the challenges you might face, as some of the options have technical restrictions or, for instance, restrictions from a support perspective.

Local Disk

Local disk, together with removable devices, is one of the most commonly used options. Local disk installations provide two huge advantages over removable devices: resiliency and the level of automation.

Resiliency refers to the ability to run two local disks in RAID-1. As mentioned, although ESXi is loaded into memory, it still needs to write its configuration once every 10 minutes, and of course in the case of a single disk failure a reboot can still complete successfully without the need to immediately replace a component. On top of that, local disk enables you to implement a scripted installation mechanism. This is also supported for Boot From SAN (iSCSI/FC) but currently not supported for removable devices like USB and SD media.

Requirements:

  • Minimum disk space required is 5GB

Constraints/Impact:

  • Expensive compared to removable devices like USB

Removable Devices

Removable devices like USB and SD have always been among the favorite ESXi install destinations for many. The main reasons are flexibility and, of course, the low cost of a USB/SD device. Flexibility meaning that USB/SD devices are usually easy to swap when an issue arises. As mentioned in the Local Disk section, a removable device offers less resiliency; however, some hardware vendors have found a solution for that by offering a so-called "Dual SD Module" (Dell in this case).

From an operational perspective the flexibility, and, depending on the hardware vendor, the resiliency, seem to offer a lot of benefits. One thing to note, and this is something many don't realize, is that it is currently not supported to do a scripted install of ESXi on a removable device. Of course it is fairly easy to convert your kickstart script into a PowerCLI script or to combine it with Host Profiles (if your license permits), but it should still be noted.

Requirements (src kb):

  • VMware supports Removable Devices only under these conditions: 
    • The server on which you want to install ESXi 4.x is on the ESXi 4.x Hardware Compatibility Guide.

      AND

    • You have purchased a server with ESXi 4.x Embedded on the server from a certified vendor.

      OR

    • You have used a USB or SD flash device that is approved by the server vendor for the particular server model on which you want to install ESXi 4.x.

Constraints/Impact:

  • Scripted installations of ESXi on removable devices are not supported

I hope this makes it crystal clear that in the case of USB/SD devices it is really important to ensure that your environment is fully supported.

Boot From SAN

The last but definitely not least option is Boot From SAN, which was added when 4.1 was released. Not only is FC supported but also iSCSI. Boot From SAN is kind of the best of both worlds, where in this case both worlds would be local disks and USB. It enables you to run ESXi on a diskless server while still giving you the option of a scripted install and providing resiliency.

Requirement:

  • The storage devices and adapters must support Boot From SAN

Constraints/Impact:

  • Budget, SAN storage can be expensive compared to local disks / removable media

Conclusion

Every option has its benefits, and depending on the type of environment all should be considered. Based on requirements and constraints around budget, licensing, and array capabilities, a decision will need to be made on a case-by-case basis. Generally speaking, using "local disks" is the most compelling option, as it enables you to script your installation and is relatively cheap compared to Boot From SAN.

Cool PowerCLI script for backing up the ESXi System Image

Last week I received an email from Alan Renouf about backing up your ESXi System Image. Alan asked if I could review his article as he knew it is one of my focus areas. Today Alan published the article and I personally feel it is one of those golden nuggets again. In just 13 lines of code Alan manages to dump the configuration of your ESXi host to a .tgz file on the server you are running the script from. I guess Alan explains the purpose of this script best:

So knowing this we can effectively back up the configuration of our ESXi hosts simply by backing up the state tardisk; then, if the worst happens, we can restore this and have our ESXi host with its complete configuration restored in a matter of minutes.

I think it is obvious that these few lines of code can be very powerful and can be used for disaster recovery scenarios. I guess the only thing left to say is that you should head over to Alan's article and try out the script.

Ops changes part 2 – Scripted installation

One of the first operational "obstacles" that will need to be conquered for the migration to ESXi is a scripted installation. I wrote an article with a sample script two weeks ago, which is a good start. I asked around on Twitter about what kind of things people would like to see, and I also received some questions via email. I decided to document these as well, hoping it will make your migration to ESXi easier. Some of these are fairly obvious, others aren't.

I have validated all of these, and during validation I noticed that William Lam had already documented some of them; it is worth checking out William's tips! (Also check out this one on enabling the management interface for vmk0.)

How can I define a second DNS server? (This needs to be part of the first section of the script.)

network --bootproto=static --device=vmnic0 --ip=10.0.0.51 --netmask=255.255.255.0 --gateway=10.0.0.250 --hostname=esxi001.localdomain --vlanid=0 --nameserver="10.0.0.1,10.0.0.2" --addvmportgroup=0

Can I enable TSM by default?

vim-cmd hostsvc/enable_local_tsm
vim-cmd hostsvc/start_local_tsm

And what about SSH (Remote TSM)?

vim-cmd hostsvc/enable_remote_tsm
vim-cmd hostsvc/start_remote_tsm

Please note that we don't recommend enabling these by default, but as it was a question I received twice this week I wanted to make sure it was covered.

What about NTP? It is very simple and similar to how it was done with ESX classic:

echo restrict default kod nomodify notrap noquery nopeer > /etc/ntp.conf
echo restrict 127.0.0.1 >> /etc/ntp.conf
echo server 10.0.0.11 >> /etc/ntp.conf
echo server 10.0.0.12 >> /etc/ntp.conf
echo driftfile /var/lib/ntp/drift >> /etc/ntp.conf
/sbin/chkconfig --level 345 ntpd on
/etc/init.d/ntpd stop
/etc/init.d/ntpd start

Can I change the vSwitch load balancing mechanism?

vim-cmd hostsvc/net/vswitch_setpolicy --nicteaming-policy='loadbalance_ip' vSwitch0

The default NIC teaming policy is "loadbalance_srcid" (originating virtual port ID); the other two options are "loadbalance_ip" (IP hash) and "loadbalance_srcmac" (source MAC hash).

Can I rename my local datastore so that it includes my host name? Definitely something I would recommend:

vim-cmd hostsvc/datastore/rename datastore1 $(hostname -s)-datastore01

I have an iSCSI storage array; can I configure the server for it? (We will create two VMkernel interfaces and, on each port group, set only one vmnic as active and remove the other.)

esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic1 vSwitch1
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -A iSCSI-01 vSwitch1

esxcfg-vswitch -A iSCSI-02 vSwitch1
esxcfg-vswitch -v 20 -p iSCSI-01 vSwitch1
esxcfg-vswitch -v 20 -p iSCSI-02 vSwitch1
esxcfg-vmknic -a -i 192.168.20.11 -n 255.255.255.0 iSCSI-01
esxcfg-vmknic -a -i 192.168.20.21 -n 255.255.255.0 iSCSI-02

esxcfg-swiscsi -e
vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-active=vmnic1 vSwitch1 iSCSI-01
vim-cmd hostsvc/net/portgroup_set --nicorderpolicy-active=vmnic2 vSwitch1 iSCSI-02
esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic add -n vmk2 -d vmhba33
vim-cmd hostsvc/storage/iscsi_add_send_target vmhba33 192.168.1.51

esxcfg-rescan vmhba33

[Update] Gert pointed out that the "vmhba" number can vary, and he is right. To be absolutely certain you could use a variable for the vmhba by adding the following line and replacing vmhba33 with $varvmhba:

varvmhba="$(esxcfg-scsidevs -a | grep iscsi | awk '{print $1}')"
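For example (a sketch based on the commands above), the software iSCSI lines would then become:

varvmhba="$(esxcfg-scsidevs -a | grep iscsi | awk '{print $1}')"
# Bind both VMkernel NICs to whatever vmhba the software iSCSI initiator received
esxcli swiscsi nic add -n vmk1 -d $varvmhba
esxcli swiscsi nic add -n vmk2 -d $varvmhba
vim-cmd hostsvc/storage/iscsi_add_send_target $varvmhba 192.168.1.51
esxcfg-rescan $varvmhba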

Can I place the log files on a different datastore? (In this example the datastore is called "vmfs01" and the "hostname" is used as part of the directory structure.)

vim-cmd hostsvc/advopt/update Syslog.Local.DatastorePath string "[vmfs01] /$(hostname -s)/logfiles/messages"
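To double-check the setting afterwards you could read it back; this assumes the advopt/view sub-command is available on your build:

vim-cmd hostsvc/advopt/view Syslog.Local.DatastorePath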

These are just a couple of examples of how you can beef up your installation script and configure ESXi in such a way that it is ready to be added to vCenter and can be used straight away!

Ops changes part 1 – Introduction

One of the things that will probably be most time consuming when migrating to ESXi is the changes in operational procedures that will be required. Of course this fully depends on what type of user you are today. I guess we can qualify the users into three main categories:

  1. vCenter addict
  2. PowerCLI god
  3. Bash guru

For categories 1 and 2 the changes in operational procedures will be kept to a bare minimum, but for category 3, the bash guru, the changes will be substantial. This series of articles will take some examples of what you used to do with ESX classic and translate them to ESXi. The main focus here will be vMA and/or the Tech Support Mode console. We will also provide you with some additional examples for scripted installations to show that the possibilities with ESXi are endless, just like with ESX classic.

I want to start by pointing everyone to some very valuable resources for ESXi, vMA, vCLI, and PowerCLI goodness. I would recommend adding these to your RSS reader or checking them regularly, as they will come in handy at some point; I know I have visited these countless times.

ESXi

vMA / Perl / vCLI

PowerCLI

This was part 1 of 10; part 2 will discuss some more examples of things you could configure during a scripted installation, as this is more than likely the first thing that will need to change when migrating from ESX classic to ESXi. Most of these are similar to ESX, by the way, but I guess they are useful to know about anyway. If there are any specific things you would like to discuss, don't hesitate to leave a comment so we can try to address them in one of the articles.

VMware Data Recovery is now included in Standard

Hello all,

This question has been raised a few times lately, and I thought I would try to give the answer a little broader distribution.

Yes, as of vSphere 4.1 Update 1, we have included VMware Data Recovery (vDR) with the Standard product edition, which means all of the editions now include vDR.  You can see more detail here.

What is not in that nice chart is that people who bought Standard before Update 1 are also able to download vDR and use it without an additional purchase, provided their subscription is current, which is generally the case.

vDR is a great product and I am glad more of our customers get to use it now!

Michael

Recommended Alarms for SRM Admins to watch

Hello all,

I have been asked before which SRM alarms users should configure and watch for.  There are a lot of different alarms, and I suspect no one needs all of them, but I also suspect everyone will need a few of them.  I will help you get started with what I think are the important and mostly standard alarms.  I will also give you an idea of what will trigger them if there is any doubt.

Let's dive in!

Email working

The first thing we need to do is make sure that vCenter can send email.  This is done in the Administration \ vCenter Server Settings area.  Once there, select the Mail option.  See the screen below for what it should look like, and where to add your SMTP server and the sender account.

[Screenshot: vCenter Server mail configuration]

Note that if you are working in Linked Mode, like I am, you can change your vCenter server (from the drop-down list at the top of the screen) so you can easily make the change on both sides.  This is important since you will need to configure SRM alarms on both sides.

Once your vCenter can send email we are ready to check out the alarms.

SRM Alarms

You can find the alarms in SRM on the Alarms tab, which is next to the Summary tab.  See below for what it looks like.

[Screenshot: the SRM Alarms tab]

Remember that these alarms exist on both the protected and recovery sites.  You need to think about where to configure alarms.  For example, if you configure the Remote Site Down alarm to email you when it triggers, and you configured it on the protected side, will it alert you when the protected side is down?  No, it will only alert you when the recovery side is down!

What alarms are important?

Remote Site Down – this watches the remote site, which should be the protected site, and will fire when the SRM service stops running.  This is NOT really a safe way to know whether you need to trigger a DR event, but it is still good info.

Recovery Plan Destroyed – since you lose history when you delete a recovery plan, most users I know don't like to lose the recovery plans.

Recovery Plan Started / Recovery Plan Execute Test begin are good in the beginning but not really necessary later.

VM Added is an important one.  It means that a VM has just been copied to a LUN that is protected by SRM.  This means that if someone who should know better migrates a VM to a protected LUN, you will be alerted!

VM Not Protected is also an important one.  It will fire if you add a VMDK to a VM and the VMDK is held on a non-replicated LUN, or if you attach a CD/ISO to the VM; in other words, when something is done to a protected VM that makes it NOT protected.  Again, very good to know about.

Recovery Profile Prompt Display means that the recovery plan has stopped and is waiting for you to confirm something.  Good to know!

Recovery Plan Prompt Response – this is one that may not be as important but it will fire when a waiting prompt is acknowledged.

Protection VM Limit Exceeded – this will alert you when you are protecting more VMs than you have licensed.  Good to know!

Each alarm does have a description field if you need a little reminder of what they do.

The alarms called out above as important are the particularly useful ones.

Below is an example of what configured alarms look like.

[Screenshot: example of configured alarms]

How do you configure alarms?

You double-click the alarm you are interested in, then change to the Actions tab.  You will have a choice of sending an email, sending an SNMP trap, or running a script.  See below for an example.

[Screenshot: configuring an alarm action]

That's it.  You now have alarms configured to send email when they fire.

Important Note: make sure to test any of these alarms that you configure for email before you use them in production.

Any questions or comments just let me know!

Michael

VMware vCenter Site Recovery Manager 4.1.1 is released!

 

Hello all,

This maintenance release of SRM was released the same day that vSphere 4.1 Update 1 was released.  We will always try to do that whenever possible.  There are, however, a number of important fixes in it, and as such I recommend people look at upgrading to it as quickly as their needs dictate.  I should mention that the upgrade itself can only upgrade SRM 4.1 installs and NOT SRM 4.0 installs.  The release notes can be found here.  You can download the update here.

To minimize the upgrade outage, start with the production site.  Even if the upgrade doesn't work for some reason, or if you take too long and a DR event occurs, you will still be able to fail over, since that doesn't require the protected site to be running.  Both sides must be at the same version and build to work together, so do not forget that.  Once you start your upgrade, work until you are finished!  Remember to upgrade the SRM plug-ins too.

Questions or comments are always welcome!

Michael

Update: 2/16/11 – the release notes, and this blog, did not make clear that you do not need to upgrade vCenter Server to 4.1 Update 1 before applying this SRM 4.1.1 update.  Think of this SRM release more like a patch.

Update 2/17/11 – while I tested this, and the Compatibility guide confirmed you didn't need to upgrade vCenter first, it turns out you do.  The Compatibility guide will be updated shortly.  You must upgrade vCenter to 4.1 Update 1 before you upgrade SRM to 4.1.1.  Sorry for the confusion.