
Monthly Archives: July 2011

vSphere 5.0 Storage Features Part 6 – Storage DRS – Balance On I/O Metrics

Another feature of Storage DRS is its ability to balance VMs across datastores in the datastore cluster based on I/O metrics, specifically latency.

First, let us see how Storage DRS (SDRS) is able to capture this information.

SDRS uses Storage I/O Control (SIOC) to evaluate datastore capabilities and capture latency information for all the datastores in the datastore cluster. SIOC was first introduced in vSphere 4.1. Its purpose is to ensure that no single VM uses all the bandwidth of a particular datastore, and it modifies the queue depth to the datastores on each ESX host to achieve this.

In SDRS, SIOC is used differently. SIOC (on behalf of SDRS) checks the capabilities of the datastores in a datastore cluster by injecting various I/O loads. Once this information is normalized, SDRS has a good indication of the types of workloads that a datastore can handle. This information is used in initial placement and load-balancing decisions.

SDRS continuously uses SIOC to monitor how long it takes an I/O to do a round trip – this is the latency. This information about the datastore is passed back to Storage DRS. If the latency value for a particular datastore is above the threshold value (default 15ms) for a significant percentage of time over an observation period (default 16 hours), SDRS will try to rebalance the VMs across the datastores in the datastore cluster so that the latency value returns below the threshold. This may involve one or more Storage vMotion operations. In fact, even if SDRS is unable to bring the latency below the defined threshold value, it may still move VMs between datastores to balance the latency.

And since we now support Storage I/O Control on NFS in vSphere 5.0, we can also have NFS datastore clusters in SDRS.

If the datastore cluster is set to manual mode, SDRS will raise an alarm to bring to the administrator's attention that a recommendation has been made. By looking at the SDRS tab, the administrator can then see the recommendations made by SDRS in order to balance the I/O load. An example of a recommendation is shown here:

Sdrs-7
What is very cool about the recommendation is that it gives the administrator insight into what the latency measurements are on the source and destination datastores. The administrator can then refer to this information before deciding whether or not to migrate the VM.

Storage DRS provides customers with a way of automatically load-balancing their datastores, avoiding hot-spots on their storage.

Setting up the ESXi Syslog Collector

In my last post I went over the steps to set up the ESXi dump collector.  I figured it would be good to follow up with a quick post on setting up the syslog collector.  The syslog collector also addresses the issue of an Auto Deployed host not having a local disk.  With no local disk the log files are stored on a ramdisk, which means each time the server boots the logs are lost.  Not having persistent logs can complicate troubleshooting.  Use the syslog collector to capture the ESXi host's logs on a network server.

Just like the dump collector, the syslog collector is very easy to install and configure.  The syslog collector is bundled with the vCenter Server Appliance (VCSA) and requires no extra setup (by default the logs are stored in /var/log/remote/<hostname>).  To install the syslog collector on Windows simply load the vCenter installation media, launch autorun, and from the main install menu choose “Syslog Collector”.

B-root-install

You can specify where to install the collector and where to store the logs:

B-syslog-setup

Pay attention to the port settings and make sure you open the required firewall ports:

B-syslog-ports

You can install the syslog collector on a standalone Windows host or on your vCenter server:

B-vcenter-integration

Once the syslog collector has been installed, the next step is to simply configure the ESXi hosts to use the server as their loghost:

~# esxcli system syslog config set --loghost=x.x.x.x

~# esxcli system syslog reload

(you can also set the loghost from the vSphere Client by going to Configuration -> Advanced Settings -> Syslog -> global)
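Note that on ESXi 5.0 the host firewall may also need to allow outbound syslog traffic before the logs will reach the collector. A minimal sketch, assuming the default "syslog" ruleset name:

~# esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true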

After reloading syslog, you will see a directory on the syslog collector host containing the ESXi host's log file, as shown below.

Sample Syslog Collector using VCSA

B-vcsa-log

Sample Syslog Collector using Windows Server

B-win-log

Setting up the ESXi 5.0 Dump Collector

A core dump is the state of working memory in the event of a host failure. By default, a core dump is saved to the local disk.  In the case of Auto Deploy, where there may not be a local disk, the core dump will be saved to a ramdisk in memory, which is a problem because the core dump will be lost when the host reboots.

To solve this dilemma vSphere 5.0 includes a new feature called the ESXi Dump Collector.  The Dump Collector enables you to redirect ESXi host core dumps onto a network server.  It's very easy to install and configure.  The dump collector is included as part of the vCenter Server Appliance (VCSA) and requires no extra setup.  To install the dump collector on Windows simply load the vCenter installation media, launch autorun, and from the main install menu choose “ESXi Dump Collector”.

A-dump-collector

During the install you will be asked where on the host you want to store the core dumps. By default the dump repository is 2GB, which should be sufficient for most environments.

Aa

You will also be asked if you want to do a Standalone installation or integrate the dump collector with vCenter.  If you choose the VMware vCenter Server installation, it will register the Dump Collector plug-in with the vCenter server.

Ab

The final step is to use ESXCLI to configure each host to use the Dump Collector.  The screen shot below shows the commands to do this.

Ac
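For reference, the configuration commands look roughly like the following – a sketch assuming the management vmkernel interface is vmk0, the collector at x.x.x.x, and the default port of 6500; substitute your own values:

~# esxcli system coredump network set --interface-name vmk0 --server-ipv4 x.x.x.x --server-port 6500

~# esxcli system coredump network set --enable true

~# esxcli system coredump network get

The final command simply confirms that the network dump configuration is enabled.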

Now any time your ESXi 5.0 host generates a core dump, the file will be saved on the network server and available for analysis.  The screen shots below show the file being copied to the network server when my ESXi host PSODs.  I also provided a listing of the corefile for reference.

A-PSOD

A-vcsa-dump

 

 

SRM 5 and Scripts

In SRM 5 we have the same scripting capability we have had in the past – the ability to execute a script on the SRM console.  Check out the blog about it here.  It is still useful and correct information.  However, we now have a new capability as well.  We can now execute a script in a guest. 

Let's see what it looks like.  First, in SRM, access the Recovery Plans section.  Now select the Virtual Machines tab.  It should look something like the screen below.

Defaultdepends

Notice on the far right we see the Pre-power On Steps and Post-power On Steps columns (you may need to click on the image to see it all)?  The Post column has a command step labeled Date / Time Log – which is not there by default but something I added.

Now we highlight one of the virtual machines and use the blue button to edit it.  You will see something like the screen below.

 Vmrecoveryprops

Note in the above screenshot we have 1 step for Post Power On and none for Pre?  You will normally have more steps in Post than in Pre – remember, pre means the VM is not running yet, so it is hard to make changes inside of it.  You can also see here where the script will run – on the SRM server or in the guest.  Guest means the script will run inside the VM.

You can use the Add button to add another step above, or you can use Edit to look at the existing one.  If we highlight Date / Time Log as we see it above and use the Edit button, we will see the screen below.

Editscript 
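As an illustration, a command step like the Date / Time Log one above can be nothing more than a one-line batch command – a minimal sketch, assuming a hypothetical log path of C:\srm\recovery.log:

c:\windows\system32\cmd.exe /c echo Recovered at %DATE% %TIME% >> C:\srm\recovery.log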

Here is another example where the script is not a batch file but a PowerShell script.

Pw

We can see several important things above.  We can see that in SRM 5 we can now choose a Type that is either a Command on Recovered VM or a Command on SRM server.  You can still do a prompt as well.

Executing something on the console of the recovered VM sounds easy, and it is, but remember you will need to manage that script on that VM.  It may be easier to hold the script on the SRM console and execute it from there – it can still affect the inside of the VM via things like PowerCLI.  This becomes a best-practice question: do you create / edit / manage a script on each VM, or create / edit / manage a single script on the SRM console?  The answer depends, at least partly, on your skills and knowledge and the needs of the situation.

Once you have this configured and you execute the plan, you will see in the history report the reference to this script.

Scripthistory

Security and Scripts

The script that executes on the SRM console runs under the rights of the account that is running the SRM service.  You may have to change that to a different account depending on what your script does.

The script that executes on the VM runs under the context of the account that is running VMware Tools.  Again, that may need to be changed depending on what you are doing in the script.

Failback and Scripts

It is important to understand that any scripts you run while failing over will also run when you fail back, unless you modify or change them.  This means you need to set up your script environment – whatever it is – on both sides, as well as maintain the scripts in both places.  The same script does not have to run in both directions, but it will by default.

Conclusion

While in this example we have a batch file being executed, you can really use whatever you need for scripting.  I do mostly batch and PowerShell, but you can do something else.  So long as it can actually run on the SRM console, or inside the guest, you are good to go.

We have looked at how you can execute a script locally in a recovered VM or on the console of the SRM server.  It is good that the method to do either of those is the same.

Update: 8/7/11 – thanks to the comments of Iwan, I thought I would add a little more.  Since you have a script that can run on both the recovery side and the protected side, sometimes it is important for the script to know which side it is running on.  When a recovery plan is running, SRM injects information into the environment.  There is one variable, called VM_VC_Host, that can be used in your script to determine which vCenter host is part of the recovery.  This will easily allow you to determine which side your script is processing on.  Be sure to check the SRM 5.0 Admin Guide to see the full list of variables you can work with.
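As an illustration, a batch script could branch on that variable – a minimal sketch, assuming the variable name above and a hypothetical recovery-side vCenter hostname of vc-siteb.example.com (check the SRM 5.0 Admin Guide for the exact variable names before relying on this):

if "%VM_VC_Host%"=="vc-siteb.example.com" (echo Recovering on Site B >> C:\srm\recovery.log) else (echo Recovering on Site A >> C:\srm\recovery.log)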


Does the new 5.0 ESXCLI eliminate the need for vCLI?

I've been asked several times if the new ESXCLI command in 5.0 eliminates the need for vCLI.  It doesn't; as a matter of fact, the new ESXCLI command is part of the vCLI.  I think the confusion comes from the fact that the ESXCLI command is also included with the ESXi Shell.  Which brings up the question: do you need both?  And the answer is you do, so let me explain…

When we talk about the new ESXCLI command we like to emphasize that for the first time the same command can be used for both local and remote CLI administration.  Technically speaking this statement is a bit misleading because although you can run ESXCLI commands both locally and remotely, there are actually two versions of the command – one that runs in the ESXi Shell and one that runs remotely as part of vCLI. 

  • ESXi Shell – Every ESXi host comes with a version of the ESXCLI command in its ESXi Shell.   This command can only be used to perform local administration of the host.  When working with the local version of the ESXCLI command you don't need to specify a target host because the local host is always assumed.  If you were to try to specify a remote host you would get an error because the ESXCLI command in the local shell doesn't support working remotely.

A-vcli-local

  • vCLI – Along with the local version of the ESXCLI command there is a remote version provided with the vCLI that is used for remote administration.  While the command syntax is largely the same as the local version, when working remotely you need to provide additional connection options to specify the target ESXi host along with the user credentials.

A-vcli-remote

So yes, you can run ESXCLI commands both locally and remotely.  However, there are actually two different versions of the command – one for working locally from the ESXi Shell and another when working remotely with the vCLI.   The only difference between the two is when working locally the local host and logged in user credentials are assumed, and when working remotely you need to specify the target host and user credentials.
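For example, listing the standard vSwitches on a host looks roughly like this in each case – a sketch assuming a hypothetical host named esxi01.example.com:

Local (ESXi Shell):

~# esxcli network vswitch standard list

Remote (vCLI):

esxcli --server=esxi01.example.com --username=root network vswitch standard list

(The remote version will prompt for the password if you don't supply one on the command line.)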

Where does the vMA fit in all this?  The vMA is simply a Linux appliance that has the vCLI pre-installed.  Prior to 5.0 the vMA was based on RHEL, but starting with 5.0 it's now based on SLES.   The vMA makes using the vCLI easy: you simply deploy the OVF and you're off and running.  However, if you don't want to use the appliance you can just as easily deploy your own Linux VM and install the vCLI packages yourself.   There is even a version of vCLI that can be installed on Windows as well, something many people don't realize.

Speaking of Windows, where does PowerCLI fit in all this?  Think of PowerCLI as an alternative to vCLI.  Most Windows admins tend to prefer PowerCLI because it’s built on top of PowerShell.  vCLI on the other hand tends to be more popular with Linux admins because it has a Linux look and feel.

I hope this helps clear up some of the confusion.

vSphere 5.0 Storage Features Part 5 – Storage DRS – Balance On Space Usage

One of the really great features of Storage DRS (SDRS) is the fact that it will balance Virtual Machines based on both space usage and I/O latency. In this post, I will talk about balancing based on space utilization and a little about the additional SDRS configuration options that are available for space usage.

By default, when you enable SDRS on a datastore cluster (see previous post for definition of a datastore cluster), balancing based on space usage is automatically configured. The threshold is set to 80% but can be modified if you so wish. What this means is that if space on a particular datastore is utilized 80% or more, SDRS will try to move VMs to other datastores using Storage vMotion to bring this usage value back down below 80%. The usage statistics of the datastores are checked on an ongoing basis.

If the cluster is set to automatic mode of operation, SDRS will use Storage vMotion to automatically migrate VMs to other datastores in the datastore cluster if the threshold is exceeded. If the cluster is set to manual, the administrator will be given a set of recommendations to apply. SDRS will provide the best recommendations to balance the space usage of the datastores. As before, once you apply the recommendations, Storage vMotion will be used to move one or more VMs between datastores in the same datastore cluster.

It should also be noted that unless SDRS sees a utilization difference of at least 5% (default) between the different datastores, it is not going to move VMs around the datastore cluster. For example, if SDRS sees only a 3% space utilization difference between datastore1 & datastore2, it will not make any recommendation to move VMs between these datastores, even if one datastore is above the 80% threshold. This 5% setting can be modified via the Advanced Options in the SDRS Runtime Rules section of the datastore cluster.

Sdrs-6

Again, to gain a certain comfort level with this new feature, I highly recommend running in manual mode, and examining the recommendations before applying them. This way you will see that Storage DRS is indeed doing the 'right' thing when selecting a datastore based on utilization.

You can also run Storage DRS with only space utilization checks and no I/O metrics checks by unchecking the I/O Metric Inclusion box shown above.

In my next post, I will discuss the other balancing feature of Storage DRS – load balancing based on I/O metrics, specifically latency.

ESXi 5.0 Command Line Part 1 – Introduction to the new ESXCLI

ESXi 5.0 comes with a new ESXCLI command making it easy to administer your ESXi hosts from the command line.  Although the ESXCLI command in 5.0 has a similar look and feel to its 4.x predecessor, the new command comes with an improved syntax and support for many additional namespaces.  In addition, and even more significant, is that for the first time the new ESXCLI provides a single command set that can be used for both local and remote command line administration.

In 5.0 the ESXCLI command is a replacement for the deprecated esxcfg-* commands used in earlier releases.  It's important to note that even though the esxcfg-* commands are still included with ESXi 5.0, these commands are deprecated, meaning they are no longer officially supported.  They have been left behind to help as you transition over to ESXCLI.  The esxcfg-* commands will eventually be removed in future releases.

A1
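To give a feel for the transition, here is one example of an old command and its rough ESXCLI equivalent – a sketch only; consult the vSphere 5.0 CLI documentation for the full mapping:

~# esxcfg-nics -l        (deprecated)

~# esxcli network nic list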

In vSphere 5.0 the ESXCLI command set is not yet a comprehensive CLI tool.  It will take time to add all the required functionality into the ESXCLI command set.  Until that happens users will continue to augment the new ESXCLI with the well-known vicfg-*, vmware-cmd, vmkfstools, and other CLI commands.  Over time the ESXCLI command set will be continually expanded and the non-ESXCLI commands eventually phased out and deprecated.
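For instance, creating a virtual disk is still done with vmkfstools rather than ESXCLI – a quick sketch, assuming a hypothetical datastore path:

~# vmkfstools -c 4g -d thin /vmfs/volumes/datastore1/test/test.vmdk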

Why a new CLI?

A challenge long faced by vSphere administrators has been the need to use an array of different commands to administer their hosts.  In addition, a completely different set of commands was needed when working locally on a host versus remotely through the vCLI.  The new ESXCLI command addresses these challenges by establishing a single CLI command set that can be used for both local and remote host administration.  In addition to providing a single tool, the new ESXCLI command also establishes a standard that will continue to be extended and enhanced.  This standard is documented, making it easy for vendors to integrate and provide customized plug-ins, enabling users to also use the ESXCLI command to manage 3rd-party add-on components.

ESXCLI Command Overview

Learning to use the ESXCLI command is very easy.  The best way to start is by logging on to the ESXi Shell and typing “esxcli” at the command prompt:

Cli-fig1

Typing the command with no options comes back with the usage screen shown above.   This method of getting command help will be familiar to people who have used ESXCLI in the 4.x release.   From this output we see the section labeled “Available Namespaces:”.   Each namespace represents a category of commands; for example, there is a namespace for “network” and one for “storage”.

Let's look at the “network” namespace to see what additional namespaces and commands are available by running the command “esxcli network”:

Cli-fig2

Here we see additional namespaces for fence, firewall, IP, vSwitch, and NIC.   To see the options under vswitch, you simply type “esxcli network vswitch” (catch the pattern?).

Cli-fig3

Here we see two namespaces under vSwitch, one for working with a distributed vSwitch and another for a standard vSwitch.   Let's see what the command is to display the list of configured standard vSwitches by running “esxcli network vswitch standard”:

Cli-fig4

Here we again see additional namespaces, but in addition, for the first time, we see available commands.  From here we can see that in order to list the configured standard vSwitches on the host we run the command “esxcli network vswitch standard list”:

Cli-fig5

I think you will agree that it wouldn't require much additional effort to further explore the ESXCLI command to identify how to add a third vSwitch or modify any of the vSwitch settings.  And remember, part of what makes ESXCLI really cool is that it can be used remotely as well as locally.
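For example, a little further exploration suggests something like the following to add and configure a new standard vSwitch – a minimal sketch, assuming a hypothetical switch named vSwitch2 and an unused uplink vmnic2:

~# esxcli network vswitch standard add --vswitch-name=vSwitch2

~# esxcli network vswitch standard uplink add --vswitch-name=vSwitch2 --uplink-name=vmnic2

~# esxcli network vswitch standard set --vswitch-name=vSwitch2 --mtu=9000

The first command creates the vSwitch, the second attaches a physical NIC as an uplink, and the third adjusts the MTU.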

Conclusion

ESXi 5.0 comes with a new and improved ESXCLI command set.   Different from its 4.x predecessor, the new ESXCLI command supports many additional namespaces and for the first time provides a common CLI tool that can be used for both local and remote administration.

The new ESXCLI command doesn't yet provide a comprehensive set of capabilities.  The focus for the 5.0 release is to provide a replacement for the now deprecated esxcfg-* commands.  As such, in vSphere 5.0 the ESXCLI will continue to be augmented with the vicfg-*, vmware-cmd, and other available CLI tools.  Over time, the ESXCLI command set will be expanded and the other CLI tools deprecated.

Learning to use the new ESXCLI command is a breeze with an intuitive and easy to learn syntax.  Simply type the “esxcli” command and press enter to get started, then simply navigate through the nested namespaces until you reach the command you need. 

Next up we’ll go over some examples of using ESXCLI remotely…

Failback? Where is the button!?

Are you one of the ones that cannot live with the manual nature of the failback operation in SRM 4?  You need that Failback button that you have heard about in SRM 5?  And you didn't find it, did you?  I will help you with that in this blog and hopefully it all makes sense.  You will be doing a simplified and automated failover back to the original site soon, so no worries!

BTW, again I have issues with the size of my screen shots.  If you don't see the edge of the pictures below you can click on them for the full image.

When you are in the main Recovery Plans part of the SRM UI you see something like the following.

  Rpstatus copy

A close up on the buttons in the top right shows there is no failback button – or is there?

Buttons

It is important to understand that failback is a marketing term, and not a feature.

Failback requires you to first recover on the recovery side.  Then you reprotect, and this means you are now protected (in the reverse direction).  Replication is going in the opposite direction, and you are ready to do a recovery or test, but towards the original side.

This looks like this:

  Recoverycomplete

It is important to note above that you only see the Reprotect button as clickable when you have completed the recovery.  If for some reason your recovery has not completed, you will not be able to reprotect, so try running the recovery again.  It will not recover anything that it has already successfully recovered.

Recovery complete

Now that our virtual machines have failed over to the recovery side, and are in fact running there, we need to reprotect them.

We use the Reprotect button for that.

Reprotectbutton
After we start the Reprotect actions, we get to watch as the reprotect actually happens.  Notice how it reverses the storage replication, and then it protects the VMs again?

  Reprotect

Once this is done we are protected once again.  If our original protection was Site A with VMs running on it, we now have our VMs running on Site B.  But if anything happens to Site B we can recover safely on Site A.

Now we can (and should) test things so that when we do a recovery back to the original site it will work properly.

Once our test recoveries are all good, we do a planned migration back to where we came from.

Planned
Notice how your data will be synchronized back to the original site as part of this planned migration?  Very handy as in the past you would have had to work in several different interfaces to manage this!

So in summary, failback means: failover, reprotect, test, and failover.  This will get you to the recovery site and back again!

Thanks for reading, and make sure to leave comments if you have any questions!

Update 7/21/11 – Just added a short note above that mentions the Reprotect button is only available if the recovery has successfully completed.

Michael

Follow the ESXi Chronicles on twitter

Keep up with the VMware ESXi Chronicles by following us on twitter @VMwareESXi.  Get a head start on learning about the new features in vSphere 5.0, including the new ESXCLI command, Image Builder and Auto Deploy, along with recommendations and tips on preparing for a smooth upgrade to 5.0.

vSphere 5.0 Storage Features Part 4 – Storage DRS – Initial Placement

Storage DRS (SDRS) is another new feature of vSphere 5.0, and is possibly one of the most exciting.

It allows for intelligent initial placement and on-going space & load balancing of Virtual Machines across datastores in the same datastore cluster. A datastore cluster is simply a folder of datastores. Here is a view of one such datastore cluster, which contains 3 NFS datastores:

Sdrs-2

When Storage DRS is enabled on a datastore cluster, the datastore cluster becomes a load-balancing domain. You can place both VMFS-3 & VMFS-5 datastores in the datastore cluster if you wish, but the best practice recommendation is to place datastores which have the same capabilities into the same datastore cluster. You cannot mix NFS & VMFS datastores in the same datastore cluster.

When a datastore cluster is created, SDRS is automatically enabled, as shown here:

Sdrs-1
We will look at the load-balancing features in a future posting, but in this post I want to talk about the initial placement, which is really a great feature. When you create or migrate a VM, you can choose for the VM to be placed in a datastore cluster. SDRS will then pick the most appropriate datastore for this VM, based on available space and I/O latency:

Sdrs-3

In automatic mode, SDRS will choose a datastore for you. The other nice feature is that even if you have automatic mode selected for SDRS, when it comes to deploying or migrating a VM, you can still choose to do the initial placement manually if you wish.

To do this, at the Ready to Complete screen, select the 'Show All Storage Recommendations':

Sdrs-4
You can then choose which datastore to deploy your VM onto, as per the recommendations shown below:

Sdrs-5

One thing you may notice is that even when the cluster is in manual mode, you are not given any datastore to choose during deployment unless you select ‘Show all storage recommendations’ in the Ready to Complete window. This might give the impression that initial placement is indeed automated even when the cluster is manual. What is actually happening is that SDRS calls "Apply Recommendations" on your behalf and the top manual placement recommendation that SDRS provided is chosen for you.

My best practice recommendation around SDRS is pretty much along the same lines we gave customers when we initially released DRS back in the day – run everything in manual mode to begin with, and make sure that the recommendations that are being made to you by SDRS for initial placement of VMs make sense – check the space usage & latency of the datastores in the datastore cluster, and verify SDRS is selecting the best one. Once you see that the recommendations are indeed making sense, you can start to think about letting SDRS make these decisions automatically.

In future postings, I will discuss how Storage DRS does load balancing based on space usage and I/O metrics.