Monthly Archives: April 2011

Coming to a VMUG near you – everything you need to know about transitioning to ESXi!

Been putting off your move to ESXi?  Have questions?  Looking for guidance?  Register to attend one of the VMware User Group (VMUG) Regional User Conferences being held over the next several months throughout the U.S. and Canada.  Come on out and learn from industry experts, visit with vendors, mingle with your fellow vSphere administrators, and of course learn everything you need to know about migrating to ESXi.

Here’s a list of some of the upcoming VMUG Regional User Conferences where VMware will present on transitioning to ESXi:

May 5th, Charlotte
May 10th, Philadelphia
May 17th, Central Ohio
May 18th, Silicon Valley
May 19th, Connecticut
June 21st, Vancouver BC
July 18th, Indianapolis

For more information and details on registering visit the official VMUG Conference home page.

VMTN Podcast about Transitioning to ESXi

Yesterday's weekly VMTN podcast was about ESXi and featured Charu Chaubal, Alberto Farronato and me. I thought it was an interesting podcast with lots of good questions and comments from the participating audience. You can download the podcast or subscribe via iTunes to get it automatically.

Many links are usually dropped in the chat window, and this time was no different. Here are the links for this week's podcast:

Scratch partition best practices for USB/SD booted ESXi?

Last week I received some questions about the best practice for sizing the scratch partition on USB/SD-booted ESXi hosts. As the installable version uses 4GB as a default value, the assumption was made that this is the minimum amount of disk space required per host. As you can imagine, with 32 nodes in a cluster, or even with 10, using a default value of 4GB per host on a shared volume (VMFS/NFS) can become fairly expensive, as SAN storage has to be sacrificed for it.

Let's first explain where the 4GB comes from. The partition type used for the scratch partition in a normal install is vfat. The maximum size of that partition type is 4GB, and not entirely coincidentally the installer uses the same value. The main reason, of course, is to leave room for future versions. This also implies that in many scenarios the full 4GB is more than likely not required.

How much would you need to reserve for a shared volume? We recommend using a 20GB volume. Even for a 32-node cluster this will suffice, and in a cluster smaller than 15 nodes it leaves room for future expansion without the need to resize the volume. Ensure alarms are enabled on this datastore and monitor it closely so that it never fills up. Give each server its own directory and set the scratch partition advanced setting accordingly.
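To put numbers on that recommendation, here is a minimal Python sketch that works out the average share per host on a shared scratch volume and builds one directory path per host. The function names, the datastore name and the `.locker-<hostname>` directory naming are assumptions made for illustration; use whatever naming fits your environment.

```python
from typing import Dict, List


def scratch_share_mb(volume_gb: float, hosts: int) -> float:
    """Average space in MB available to each host on a shared scratch volume."""
    if hosts <= 0:
        raise ValueError("hosts must be a positive number")
    return volume_gb * 1024 / hosts


def scratch_directories(datastore: str, hostnames: List[str]) -> Dict[str, str]:
    """Map each host to its own scratch directory on the shared datastore.

    The '.locker-<hostname>' naming is an assumption for illustration only.
    """
    return {name: f"/vmfs/volumes/{datastore}/.locker-{name}" for name in hostnames}


if __name__ == "__main__":
    # 20GB shared between 32 hosts leaves 640 MB per host on average.
    print(scratch_share_mb(20, 32))
    # Hypothetical datastore and host names.
    for host, path in scratch_directories("scratch-ds", ["esx01", "esx02"]).items():
        print(host, path)
```

With 20GB and 32 hosts each host gets 640 MB on average, well below the 4GB default, which is why the datastore alarm matters.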

The following KB Article provides multiple mechanisms for setting the location of scratch partition.

Become a true ESXi expert with the new FREE VMware eLearning course and ebook offer!

Great news for all VMware customers: the VMware Education Services team has just made available a new, FREE eLearning course dedicated to ESXi, “Transition to ESXi Essentials”. The course is a self-paced three-hour online training that provides the knowledge necessary to make fundamental design decisions to successfully add VMware ESXi to a vSphere environment and to take advantage of all of the new features included in ESXi 4.1. The training is ideal for system administrators, consultants and engineers responsible for managing and supporting a vSphere environment.

But it doesn’t end here! By completing the Transition to ESXi Essentials course and the short survey that follows at the end, you will also receive a FREE ebook copy of “VMware ESXi: Planning, Implementation, and Security”, authored by Dave Mishchenko and edited by VMware technical experts. Dave’s book is truly a phenomenal resource for all VMware users to learn everything there is to know about ESXi. While the Transition to ESXi Essentials course will remain available for free in the VMware Education Services portal, we can offer Dave’s book for free only while supplies last. So hurry up, complete the training and turn in your survey as soon as possible!

For those of you who want more direct hands-on experience with ESXi, VMware Education Services continues to offer an in-depth, instructor-led two-day course (in this case for a fee). Here is the link to the class schedule and registration.

As you all know very well by now, 4.1 is the last release for ESX. Future releases of vSphere – starting with the next one – will only be based on the ESXi hypervisor architecture. Needless to say, now is the time to ramp up your expertise about ESXi and how to take advantage of its unique capabilities. The training options discussed above are just a couple of the several resources you can leverage to learn about ESXi. Others that are strongly recommended to help you become the ultimate ESXi expert are:

 

Need to install 100s of ESXi hosts?

Yes, you can of course use UDA or EDA, but in my environments that is not an option. I was scavenging the internet for an article that explained how to build your own UDA/EDA appliance and found something very close to home. This article was written by Tom Arentsen, a freelance consultant in the Benelux region. Coincidentally, I have worked with him on several projects and I knew he was one smart cookie.

Tom created a solution that uses the Microsoft Windows Deployment Service. Through a simple host list he pulls down the correct bits and pieces needed to install and configure the host. This is probably the missing link some of you were looking for. I have had this question multiple times in the past and figured I would have to document it at some point. I am glad Tom already did this as it saves me and many others a lot of work.

Source: ESXi 4.1 Scripted Installation

While there are quite a few articles on the internet about this topic, every customer always asks me about it. Therefore I decided to document a step-by-step procedure for installing ESXi hosts in a fast, repeatable and fully automated fashion. You can use a pre-configured appliance like http://www.ultimatedeployment.org/ , however I intend to go for Microsoft Windows Deployment Service (WDS), as most organizations already use it for deploying their Windows servers, and why introduce something new when there is already a product that you are familiar with?

Deployment overview

The deployment method described in this article is based on DHCP reservations, which give every ESXi host a unique address based on its MAC address during the deployment phase. (In case DHCP reservations are not allowed in your environment for whatever reason, I do include another option, but it requires manual intervention.) Instead of creating a separate kickstart file for every ESXi host you have to deploy, I use one common kickstart script file. In addition, two unique files have to be created for every ESXi host; they contain the actual IP addresses for the management interface and the vMotion interface (obviously you can easily extend this if you require an extra vmkernel address for IP storage).
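The idea of one common script plus two small files per host can be sketched in Python as follows. This is only an illustration of the approach, not Tom's implementation: the `EsxiHost` type, the `<hostname>.mgmt` and `<hostname>.vmotion` file names and the one-address-per-file content are all assumptions, since the source article defines the real format.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class EsxiHost:
    """One row of a simple host list (hypothetical structure)."""
    name: str
    mgmt_ip: str
    vmotion_ip: str


def per_host_files(hosts: Iterable[EsxiHost]) -> Dict[str, str]:
    """Return {file name: file content} for the two per-host address files."""
    files: Dict[str, str] = {}
    seen = set()
    for host in hosts:
        for suffix, address in (("mgmt", host.mgmt_ip), ("vmotion", host.vmotion_ip)):
            # ip_address() raises ValueError on a malformed address.
            ipaddress.ip_address(address)
            if address in seen:
                raise ValueError(f"duplicate IP address: {address}")
            seen.add(address)
            # Hypothetical naming: <hostname>.mgmt and <hostname>.vmotion
            files[f"{host.name}.{suffix}"] = address + "\n"
    return files


if __name__ == "__main__":
    inventory = [
        EsxiHost("esx01", "10.0.0.11", "10.0.1.11"),
        EsxiHost("esx02", "10.0.0.12", "10.0.1.12"),
    ]
    for name, content in per_host_files(inventory).items():
        print(name, content.strip())
```

Checking for malformed and duplicate addresses up front is cheap and catches the typical copy-and-paste mistakes before hundreds of hosts are installed.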

Common kickstart script:

I am not going to copy the full article; head over to the source and check how Tom solved this problem.

Increasing SRM log retention

Hello all,

Update – this blog applies to SRM 4 and SRM 5.  In both cases you should follow the suggestions in this article.

I have been very busy working on our next major release of SRM (and wow! is it ever going to be powerful and cool!). As a result I have not spent as much time with our current release as I would like, and that has meant less blog activity.  But I was working on something yesterday and I realized that we did not have the detail in the logs I wanted.  And I knew I was looking at a blog article!  So, in the spirit of getting better SRM support for you, like in this blog, here is a suggestion.

If your SRM logs fill up very fast, for example with an SRA that generates a lot of log entries, you may work your way through all of the logs, and by the time support sees them the cause of the problem has rotated out and is no longer visible.  This is because by default we limit the number of logs to 10 and the size of each to 5 MB.  We do not know how much space you will have on drive C, so we need to be very conservative, and this may not always be enough.  I would suggest changing these parameters using the information below.

These settings are not part of advanced settings, so they need to be changed in the vmware-dr.xml file.  Make a backup copy of it first.  You will need to make these changes at both sites, and SRM will need to be restarted on each side, so make sure it is not in use first.

  1. Locate the SRM folder, and in it find the config folder.
  2. You should now be able to find the vmware-dr.xml file. 
  3. Use something like notepad to edit this file.
  4. Look for the section in this file that is denoted by <log>.
  5. You will need to add the following lines between the <log> and </log>.

            <maxFileSize>x</maxFileSize>
            <maxFileNum>y</maxFileNum>

Here x is the maximum file size (default 5 MB) and y is the maximum number of log files (default 10).

You will know how much space you have, but it might be a start to use a max file size of 10 MB and a maximum number of files of 20.  Think about how much disk space you have carefully.
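If you would rather script the change than edit by hand, the steps above can be sketched in Python roughly as follows. Treat it as a sketch under assumptions: the function name and the sample XML are made up for illustration, and the text above does not say which unit the file expects for the size, so the example assumes bytes. Check an existing value in your own file before relying on that. Note also that ElementTree drops XML comments when it rewrites a document, which is one more reason to keep the backup copy.

```python
import xml.etree.ElementTree as ET


def set_log_retention(xml_text: str, max_file_size: int, max_file_num: int) -> str:
    """Add or update maxFileSize and maxFileNum inside the <log> section."""
    root = ET.fromstring(xml_text)
    log = root if root.tag == "log" else root.find(".//log")
    if log is None:
        raise ValueError("no <log> section found")
    for tag, value in (("maxFileSize", max_file_size), ("maxFileNum", max_file_num)):
        node = log.find(tag)
        if node is None:
            # The element is not there yet, so add it between <log> and </log>.
            node = ET.SubElement(log, tag)
        node.text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical, heavily trimmed stand-in for the real vmware-dr.xml.
    sample = "<Config><log><level>verbose</level></log></Config>"
    ten_mb = 10 * 1024 * 1024  # assumes the value is expressed in bytes
    print(set_log_retention(sample, ten_mb, 20))
```

With 20 files of 10 MB each the logs can grow to roughly 200 MB, compared with 50 MB for the defaults of 10 files of 5 MB.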

You will need to restart the SRM service for this change to be active.

The log location may vary BTW, so remember this:

Pre Windows 2008 (maybe R2 timeframe) the logs are in:

C:\Documents and Settings\All Users\Application Data\VMware\VMware Site Recovery Manager\Logs

Post Windows 2008 (maybe R2 timeframe) the logs are in:

c:\ProgramData\VMware\VMware vCenter Site Recovery Manager\logs

Here is what your vmware-dr.xml file should look like after you make the change.

[Screenshot: the <log> section of vmware-dr.xml after the change]
Note that the red arrows point at what we added, and the blue arrow points at what was already there.

Since this is an admin type blog, I wanted to push another admin type article I wrote.  Did you see the one about the Recommended Alarms for SRM Admins to watch?  Check it out here.  My experience is that many SRM customers are not monitoring SRM, and while there is not a lot to monitor, and not many issues, the fact is that since it is your DR solution, you need to make sure you are always aware of any issues!

Michael

Remember comments are always welcome!

vSphere 4.1 Tops the InfoWorld Virtualization Shootout

I am very excited this morning to share the results of the recent virtualization shootout conducted by InfoWorld. Once again, vSphere has shown its ability to deliver the best performance, scale, and advanced feature set in the industry today. The tests were done using vSphere 4.1 and compared vSphere to the other players in the server virtualization market.

Complete Article

VMware vSphere Section

Enjoy!

-Mike Adams

vCenter Update Manager (VUM) Temporarily Disables FT

I recently discovered that when using VUM to patch my vSphere 4.0 FT clusters that things don’t exactly work as I thought.  I knew that when patching a host with an FT protected VM that VUM will temporarily disable FT during the remediation.  However, what I didn’t realize is that FT not only gets disabled for the VMs on the host being remediated, but it actually gets disabled for all the FT protected VMs in the cluster.  It doesn’t matter which host in the cluster is being patched, FT gets disabled for all VMs in the cluster.  This caught me by surprise as it’s not very intuitive that when patching Host-A that VUM will disable FT for a VM running on Host-C and Host-D.

Looking into this I noticed that VUM does provide a warning, but even with the warning it’s not clear that VUM disables FT at the cluster level and not just for the VMs on the host being remediated. 

[Screenshot: FT-Failure-Options-3.5]

In VI3.5 and vSphere 4.0 the only way to avoid disabling FT for all the VMs in the cluster is to temporarily move the host out of the cluster while you remediate and then move it back after you're done.  Not a difficult workaround, but it does require extra steps, which are easy to forget.

In vSphere 4.1 things get a little better, as you can now choose whether or not to disable FT when you remediate a host. Be aware, though, that the text description next to the checkbox is misleading.  The message implies that FT will only be disabled for specific VMs running on the host(s), but the reality is that if you choose to disable FT it is still done for all VMs in the cluster.

[Screenshot: FT-4.x-disable-ft]

Bottom line, when using VUM to patch clusters containing FT protected VMs be aware that during the remediation FT gets disabled for the entire cluster and not just the host you are patching.   This caught me by surprise, and I don’t really like surprises.

Regards,

-Kyle

Ops changes part 8 – Logging in, Auditing and Log files

One of the things to take into consideration when planning, designing or implementing an ESXi environment is what to do with the log files, how to audit ESXi and whether you should allow people to log in.

Logging

Log files are the type of files that you will rarely need, but when you do need them they had better be available. ESXi is no different from any other solution out there; well, maybe it is slightly different. Depending on the type of device you used for the installation, the log files might not be retained after a reboot. Hopefully everyone has read my article about the scratch partition and created one, if the installer hadn't done that for you.

The log file structure for ESXi when compared to ESX is slightly different. Due to the fact that there is no Service Console there is also no need to have the same collection of files. With ESXi the following log files are used:

Path + Log File                                Description
/var/log/messages                              Includes the VMkernel, vmkwarning, and hostd logs
/var/log/vmware/hostd.log                      Host Management service (hostd = Host daemon) log
/var/log/sysboot.log                           System boot log
/var/log/vmware/aam/vmware_hostname-xxx.log    VMware HA log file

Of course ESXi offers syslogging capabilities, and this is what we generally recommend using. Using a syslog server will make troubleshooting a lot easier, as log files are stored in a central location.

Another thing that is possible, which many seem to forget, is to specify a location for the local logfiles. Now in this case the local logfiles don't need to be as local as it sounds. A best practice for environments without a syslog server would be to specify a remote VMFS datastore  to ensure that if anything happens to the local storage the log files will still be available and there is at least a possibility you can do a root cause analysis. This is fairly straightforward and well explained in this KB article but I have copied it for your convenience:

To configure syslog for an ESXi host via GUI:
  1. In the vSphere Client inventory, left-click the host.
  2. Click the Configuration tab.
  3. Click Advanced Settings under Software.
  4. Select Syslog in the tree control.
  5. In the Syslog.Local.DatastorePath text box, enter the datastore path to the file where syslog will log messages. If no path is specified, the default path is /var/log/messages.  In addition if pointing at a datastore, ensure that the directory has been previously created.

    The datastore path format is /vmfs/volumes/<datastore>/<folder>/filename

    Note: You may have to reboot the host for the changes to take effect. We also recommend including the server name in the "folder" name.
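When you have more than a handful of hosts, it helps to generate the value for Syslog.Local.DatastorePath rather than type it. The Python sketch below simply follows the path format given above and puts the server name in the folder name. The function name, the `logs-<hostname>` folder convention and the example names are assumptions for illustration.

```python
def syslog_datastore_path(datastore: str, hostname: str, filename: str = "messages") -> str:
    """Build a value for Syslog.Local.DatastorePath in the format
    /vmfs/volumes/<datastore>/<folder>/filename, with the host name in the folder."""
    for part in (datastore, hostname, filename):
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    folder = f"logs-{hostname}"  # hypothetical folder naming convention
    return f"/vmfs/volumes/{datastore}/{folder}/{filename}"


if __name__ == "__main__":
    # Hypothetical datastore and host names.
    print(syslog_datastore_path("shared-logs", "esx01"))
```

Remember that the directory has to exist on the datastore before the setting is applied.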

Of course you can use the vMA as a syslog repository, as described by Simon Long in this article. Other alternatives include Splunk, which enables you to do event correlation between multiple "devices", whereas vMA is aimed more at ESXi only. Again, setting this up is fairly straightforward:

  1. In the vSphere Client inventory, left-click the host.
  2. Click the Configuration tab.
  3. Click Advanced Settings under Software.
  4. Select Syslog in the tree control.
  5. In the Syslog.Remote.Hostname text box, enter the name of the remote host where syslog data will be forwarded. If no value is specified, no data is forwarded.
  6. In the Syslog.Remote.Port text box, enter the port on the remote host where syslog data will be forwarded. By default Syslog.Remote.Port is set to 514, the default UDP port used by syslog. Changes to Syslog.Remote.Port only take effect if Syslog.Remote.Hostname is configured.
  7. Click OK.
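Before pointing hosts at a remote syslog server, it can be useful to check that the receiver actually accepts messages on the configured port. The Python sketch below sends one simplified BSD-style syslog message over UDP from any workstation. The function names, the tag and the host name in the example are assumptions, and because UDP is connectionless a successful send does not prove delivery, so look for the test line on the syslog server.

```python
import socket


def build_syslog_message(tag: str, text: str, facility: int = 1, severity: int = 6) -> str:
    """Build a simplified BSD-style syslog line; the priority is facility * 8 + severity."""
    priority = facility * 8 + severity
    return f"<{priority}>{tag}: {text}"


def send_test_message(server: str, port: int = 514, text: str = "syslog test") -> None:
    """Send one test message to the syslog receiver over UDP (no delivery guarantee)."""
    payload = build_syslog_message("esxi-test", text).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))


if __name__ == "__main__":
    # Hypothetical syslog server name; 514 is the default UDP syslog port.
    send_test_message("syslog.example.com", 514, "test message before configuring hosts")
```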

I cannot decide for you which tool to use, or even whether to use one, but I do recommend ensuring that log files are stored outside the hypervisor itself.

Login and auditing?

ESXi is not designed for shell access. Yes, there are ways of getting into the shell, but we generally recommend against logging in for day-to-day operations. ESXi has been designed as an in-memory hypervisor that should be treated as an appliance.

The two main arguments for refraining from logging in to the ESXi shell are:

  1. Memory Filesystem
  2. Auditing

The first one is the obvious one. As ESXi lives in memory, it requires memory to operate. Running scripts or custom binaries could have an impact on the in-memory filesystem. Who hasn't experienced this with ESX Classic, where a rogue script would fill up the filesystem? Believe me when I say that the result was never pretty.

The second one is less obvious, but think about it for a second. ESXi is currently not set up for non-root local users, although AD authentication does work (check Harley Stagner's article on how to set it up). But what does that mean with regard to having an audit trail? Unless you are using AD authentication for your ESXi environment, you are limited. A better solution would be to use the vMA, as it is designed to function as a management appliance. Of course, alternatives are the vCLI and PowerCLI. All three provide auditing mechanisms, and the vMA can also be used to gather log files, as described in this excellent article by Simon Long.

Is your environment secure?

My friends over at the VMware Security Blog posted an article yesterday announcing that the Security Hardening Guide for vSphere 4.1 has been released. Coincidentally, Richard Garsthagen posted an article about all the ESX/ESXi hosts he found directly attached to the internet. I guess you could say that goes against every best practice out there. But that is not entirely the reason for this article. I wanted to point out an excellent script by William Lam that assesses your environment based on the recommendations made in the Security Hardening Guide and produces a nice report with a scoring card.

Source: vmwarevSphereSecurityHardeningReportCheck.pl 1.5

The script is currently based on the following revisions of the vSphere 4.x Security Hardening Guide: 

  • Introduction

  • Virtual Machines

  • Host

  • vNetwork

  • vCenter

  • Console OS (for ESX)

While going through the COS/HOST and VM documentation, I noticed there were quite a few checks that might benefit from having a script to validate the guidelines, and that was the motivation for this script. Not all sections can be validated using the vSphere APIs, so some require manual validation. I've separated the results into three types: fail, pass or manual (which requires user intervention).

The script allows you to run a subset of the checks and to validate against different types of validation (ENTERPRISE, DMZ or SSLF). Upon completion, a report is generated that includes a grade for your environment.

So if you are serious about your virtual infrastructure, read the paper, run the script and make the changes where appropriate and desired to improve the security! Before I finish this post, major thanks to people like William Lam who spend an insane amount of their free time developing scripts like these for the community. Thanks William, VMware truly appreciates all the work you and others are doing!