Category Archives: Datacenter

Using vSphere ESXi Image Builder to create an installable ISO that is not vulnerable to Heartbleed

Here is a follow-up post from Andrew Lytle, a member of the VMware Mission Critical Support Team. Andrew is a Senior Support Engineer who specializes in vCenter and ESXi support.

VMware recently released updates to all products affected by the vulnerability dubbed “Heartbleed” (CVE-2014-0160): http://www.vmware.com/security/advisories/VMSA-2014-0004.html

As described in KB article Resolving OpenSSL Heartbleed for ESXi 5.5 – CVE-2014-0160 (2076665), the fix for ESXi is delivered through an updated vSphere Installation Bundle (VIB). VIBs are the building blocks of an ESXi image. A VIB is similar to a tarball or ZIP archive in that it is a collection of files packaged into a single archive.

Typically, a new ESXi ISO file is made available only with major releases of the product (Update 1, Update 2, etc.). If you need an ESXi 5.5 ISO that already includes the Heartbleed fix, you can easily build your own with vSphere PowerCLI.

The PowerCLI Image Builder cmdlets are designed for building custom ESXi ISOs with asynchronous driver releases pre-installed, but they can also be used, as in this case, to build an ISO that matches a patch release rather than a full ESXi update release.

In this post we will cover both the ESXi 5.5 GA branch and the ESXi 5.5 Update 1 branch. Follow the set of steps that produces the ISO branch you need for your environment.

Creating an ISO based on ESXi 5.5 GA (Pre-Update 1)

These steps cover downloading the requirements for an ISO based on the ESXi 5.5 “GA” release, originally released on 2013-09-22.

Step 1: Download the Required Files

When creating a custom ESXi image through Image Builder, we need to start by downloading the required files:

Install PowerCLI through the Windows MSI package, and copy the zip files to a handy location. For the purposes of this example, I will copy these files to C:\Patches\

Step 2: Import the Software Depot

  • Add-EsxSoftwareDepot C:\Patches\ESXi550-201404020.zip

Step 3: Confirm the patched version (optional)

If you want to confirm that the esx-base VIB (which contains the Heartbleed fix) was added correctly, check that the VIB has Version 5.5.0-0.15.1746974 and a Creation Date of 4/15/2014.

  • Get-EsxSoftwarePackage -Name esx-base

Step 4: Export the Image Profile to an ISO

  • Export-EsxImageProfile -ImageProfile ESXi-5.5.0-20140401020s-standard -ExportToIso -FilePath C:\Patches\ESXi5.5-heartbleed.iso

Creating an ISO based on ESXi 5.5 Update 1

These steps are for creating an ISO based on the ESXi 5.5 “Update 1” release, originally released on 2014-03-11.

Step 1: Download the Required Files

When creating a custom ESXi image through Image Builder, we need to start by downloading the required files:

Copy the zip files to a handy location. For the purposes of this example, I will copy them to C:\Patches\

Step 2: Import the Software Depot

  • Add-EsxSoftwareDepot C:\Patches\ESXi550-201404001.zip

Step 3: Confirm the patched version (optional)

If you want to confirm that the esx-base VIB (which contains the Heartbleed fix) was added correctly, check that the VIB has Version 5.5.0-1.16.1746018 and a Creation Date of 4/15/2014.

  • Get-EsxSoftwarePackage -Name esx-base
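If you would rather check the version string in a script than by eye, the following is a minimal Python sketch that compares an esx-base version against the two fixed builds quoted in this post. It assumes, without confirmation from the KB, that the first field after the hyphen identifies the branch (0 for GA, 1 for Update 1). The function names are hypothetical and are not part of PowerCLI.

```python
# Minimal sketch: check an esx-base VIB version string against the
# Heartbleed-fixed builds quoted in this post.
# ASSUMPTION: the first field after the hyphen ("0" in 5.5.0-0.15.1746974,
# "1" in 5.5.0-1.16.1746018) identifies the branch: 0 = GA, 1 = Update 1.

FIXED_BUILDS = {
    "0": 1746974,  # ESXi 5.5 GA branch, esx-base 5.5.0-0.15.1746974
    "1": 1746018,  # ESXi 5.5 Update 1 branch, esx-base 5.5.0-1.16.1746018
}


def parse_esx_base_version(version: str) -> tuple[str, int]:
    """Split '5.5.0-1.16.1746018' into the branch field and the build number."""
    _product, release = version.split("-", 1)
    fields = release.split(".")
    return fields[0], int(fields[-1])


def is_heartbleed_patched(version: str) -> bool:
    """True if the build is at or above the fixed build for its branch."""
    branch, build = parse_esx_base_version(version)
    if branch not in FIXED_BUILDS:
        raise ValueError(f"branch {branch!r} is not covered by this sketch")
    return build >= FIXED_BUILDS[branch]


if __name__ == "__main__":
    for version in ("5.5.0-0.15.1746974", "5.5.0-1.16.1746018"):
        print(version, "patched" if is_heartbleed_patched(version) else "NOT patched")
```

Comparing build numbers alone would be misleading here: the fixed Update 1 build (1746018) is numerically lower than the fixed GA build (1746974), which is why the sketch looks up the threshold per branch.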

Step 4: Export the Image Profile to an ISO

  • Export-EsxImageProfile -ImageProfile ESXi-5.5.0-20140404001-standard -ExportToIso -FilePath C:\Patches\ESXi5.5-update1-heartbleed.iso

Installing the ESXi ISO

The ISO file created in these steps can be used exactly like the standard VMware ESXi 5.5 ISO: it can be mounted in a remote management console or burned to a CD/DVD for installation.

Why storage paths go into a Dead state

Ever wonder why your storage path goes into a “Dead” state?

Staff Engineer Nathan Small has authored a new Knowledgebase article which describes most of the scenarios that lead to this condition.

There are basically three reasons your storage path can go into a “dead” state:

  1. The ESX storage stack determines that the path is dead because a TEST_UNIT_READY command fails during probing.
  2. The ESX storage stack receives a host status of 0x1 from the HBA driver because:
    a. The remote array port has timed out.
    b. The remote array port has dropped from the fabric (RSCN).
    c. The remote array port has closed the IP connection.
  3. The ESX storage stack marks the path as dead after the storage array returns a PDL check condition.
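The three conditions above can be read as a simple decision. The Python sketch below is a hypothetical model of that decision for illustration only: it is not the VMkernel's code, the `PathObservation` fields are invented names, and the order in which the checks are made is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

HOST_STATUS_0X1 = 0x1  # host status value named in reason 2 above


@dataclass
class PathObservation:
    """Hypothetical summary of what was observed on one storage path."""
    tur_probe_failed: bool = False      # reason 1: TEST_UNIT_READY failed while probing
    host_status: int = 0x0              # reason 2: host status reported by the HBA driver
    pdl_check_condition: bool = False   # reason 3: array returned a PDL check condition


def dead_path_reason(obs: PathObservation) -> Optional[str]:
    """Return which of the three documented reasons applies, or None.

    The order of the checks is an assumption made for this sketch.
    """
    if obs.pdl_check_condition:
        return "PDL check condition returned by the array"
    if obs.host_status == HOST_STATUS_0X1:
        return ("host status 0x1 from the HBA driver "
                "(array port timed out, left the fabric, or closed its IP connection)")
    if obs.tur_probe_failed:
        return "TEST_UNIT_READY failed during probing"
    return None


if __name__ == "__main__":
    print(dead_path_reason(PathObservation(host_status=0x1)))
    print(dead_path_reason(PathObservation()))
```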

Check out Nathan’s excellent article on the topic here: Understanding how paths to a storage/LUN device are marked as Dead (2062592)

Some new ESXi patches today

Some new patches for ESXi are out today that you might want to be aware of:

New Network port diagram for vSphere 5.x

Over the past few weeks we have been working on a brand new network diagram depicting the ports in use for vSphere 5.x.

These diagrams have been very popular in the past, and we hope you like this one too! We created Knowledgebase article Network port diagram for vSphere 5.x (2054806) as a container for the PDF diagram. The PDF also lists all of the ports used in tabular format.

If you’d like to see more of these, tell us in the comments section below!

Network port diagram for vSphere 5

Alternate download location.

Note: The information provided is on a best-effort basis. VMware will endeavor to update the diagram as new releases come out.

10 videos on vSphere Snapshots

Today we have compiled two lists of videos that will give you a clear understanding of how to use vSphere snapshots effectively. The first set comes from Joe Desmond, VMware Certified Instructor. The second list comes from our VMware KBTV YouTube channel. These videos complement our Knowledgebase articles on the same topic.

  • vSphere Snapshots in Non Production Environments describes how to use vSphere snapshots in a non-production environment to compare two alternative changes to an environment.
  • vSphere Snapshot Consolidation describes how to discover and implement changes in the environment using the vSphere Snapshot Consolidation function — a function that recognizes unresolved snapshots from 3rd parties, consolidates them and cleans up the environment.
  • vSphere Snapshot Overview provides an overview of the vSphere Snapshot tool to support guest O/S administrators. Snapshots allow safe change to virtual machines without the worry of failed implementation.
  • vSphere Anatomy of Snapshots describes the snapshot process, a powerful tool that allows safe change of virtual machine states without the worry of failed implementation.
  • vSphere Powerful Tools Come with Big Warning Labels describes the proper use of vSphere Snapshots to avoid data loss or downtime, allowing easy movement from the pre-change to the post-change state.
  • vSphere Snapshots in Action walks through two demos installing a software program using vSphere Snapshots to capture the before state and the after state.

KBTV videos discussing snapshots

SSL Certificate Automation Tool version 1.0.1

Last month we announced a new SSL Certificate Automation tool to help everyone with the implementation of custom certificates. Yesterday, we released the second version of it (version 1.0.1). This is a minor update which aims to simplify the replacement of certificates further by adding Certificate Signing Request (CSR) functionality to the tool. This functionality allows a user to quickly generate certificate requests (and consequently the private keys) for submission to the Certificate Authority.  The CSR functionality was the largest portion of manual steps, and as a result the update reduces the number of steps by over 15.

In addition, this release fixes several minor bugs that affected tool functionality.

For further details and to download the latest version of the SSL tool see: Deploying and Using the SSL Certificate Automation Tool (2041600)

We hope these additions prove useful for everyone!

ALERT: Login issue after updating to vCenter 5.1 Update 1

VMware has become aware of an issue that may occur after upgrading to vCenter Server 5.1 Update 1.

Specifically:

  • You cannot log in to vCenter Server with domain username/password credentials using either the vSphere Web Client or the vSphere Client.

This issue can occur if the specified vCenter Server login domain user account is associated with a large number of domain groups and multiple domains are configured as SSO identity sources. The precise number of groups at which this issue can occur varies due to the nature of Active Directory internals. However, it is more likely to occur once domain-group membership for an account exceeds 19.

Customers with SSO configured with multiple domain-based identity sources along with vCenter Server domain user accounts that are associated with a large number of groups should not upgrade to vCenter Server 5.1 Update 1.
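If you manage many accounts, a quick pre-upgrade check can flag those matching both conditions in the alert. This is a hypothetical Python sketch: it assumes you have already exported each account's group list and know how many domain-based identity sources SSO has, and it uses the figure of 19 only as the rough guide the alert gives, not as a hard limit.

```python
# Hypothetical pre-upgrade risk check for the vCenter Server 5.1 Update 1
# login issue. The threshold comes from the alert text; the input data is
# assumed to have been exported from Active Directory beforehand.

GROUP_THRESHOLD = 19  # issue is "more likely" once membership exceeds 19


def at_risk_accounts(
    group_membership: dict[str, list[str]], identity_source_count: int
) -> list[str]:
    """Return accounts matching both conditions described in the alert."""
    if identity_source_count < 2:
        # The alert describes multiple domain-based identity sources.
        return []
    return sorted(
        account
        for account, groups in group_membership.items()
        if len(set(groups)) > GROUP_THRESHOLD  # count distinct groups only
    )


if __name__ == "__main__":
    membership = {  # hypothetical accounts and groups
        r"Domain\svc-vcenter": [f"Group{i}" for i in range(25)],
        r"Domain\operator": ["VMAdmins", "Helpdesk"],
    }
    print(at_risk_accounts(membership, identity_source_count=2))
```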

We urge you to read the official KB article for more details and/or updates:
Cannot log in to vCenter Server using the domain username/password credentials via the vSphere Web Client/vSphere Client after upgrading to vCenter Server 5.1 Update 1 (2050941).

How to deploy SSO in a multisite configuration

For those of you administering multiple vSphere environments, getting an SSO multisite deployment up and running in the correct configuration is very important. In a multisite deployment, a local replica of the primary vCenter Single Sign-On instance is maintained at each remote site. Setting this up is not complicated, but it is possible to take a wrong turn and waste a lot of time correcting it. That is why we have created a best-practice Knowledgebase article titled Multisite Single Sign-On deployment best practices (2042849). We highly recommend you look at the examples in that article.

We’ve written extensively in this blog about SSO in the past. You can see all the other posts on the topic here: http://blogs.vmware.com/kb/tag/sso

If you are still asking yourself “What is SSO, and why do I care?”, we recommend you start with this great introduction from Justin King: vCenter Single Sign-On Part 1: what is vCenter Single Sign-On?

Determining which users are available to log into vCenter Server

Question: When installing SSO in a multisite configuration, is there a way to find out which users have rights to log into vCenter? We do not have the install log files any longer and we need to know which users have been removed from the authorized users.

Answer: You can query the following table in the vCenter Server database to determine which users are available to log in to vCenter Server. You can then use this list to recreate removed users on the affected vCenter Server.

Steps:

  1. Log in to the SQL Database using the SQL Management Studio.
  2. Select your vCenter Server Database.
  3. Select New Query and enter the following query:
    select * from dbo.vpx_access
  4. Click Execute
ID    PRINCIPAL                ROLE_ID   ENTITY_ID   FLAG
1     Domain\Administrators    -1        1           3
101   Domain\user1             5         48          1
701   Domain\user2             -5        85          1
501   SYSTEM-DOMAIN\admin      -1        1           1
809   Domain\uesr3             -2        1           1
602   Domain\user4             -5        85          1
603   Domain\user5             -5        85          1
604   Domain\user6             -5        85          1
605   Domain\user7             -1        85          1
606   SYSTEM-DOMAIN\admin      -2        85          1
804   Domain\testaduser        -1        131         1
808   Domain\testaduser        -2        85          1

In the PRINCIPAL column, look for any users that do not currently have rights to log in to vCenter Server. You can disregard the other columns.
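Once you have the query result, deduplicating the PRINCIPAL column and comparing it with the accounts that can still log in gives you the list to recreate. The Python sketch below works on rows transcribed from the sample output above; the `still_working` list is hypothetical, and the case-insensitive comparison is an assumption based on Windows account naming.

```python
# Rows transcribed from the sample dbo.vpx_access output above:
# (ID, PRINCIPAL, ROLE_ID, ENTITY_ID, FLAG). Raw strings keep the backslashes.
VPX_ACCESS_ROWS = [
    (1, r"Domain\Administrators", -1, 1, 3),
    (101, r"Domain\user1", 5, 48, 1),
    (701, r"Domain\user2", -5, 85, 1),
    (501, r"SYSTEM-DOMAIN\admin", -1, 1, 1),
    (809, r"Domain\uesr3", -2, 1, 1),
    (602, r"Domain\user4", -5, 85, 1),
    (603, r"Domain\user5", -5, 85, 1),
    (604, r"Domain\user6", -5, 85, 1),
    (605, r"Domain\user7", -1, 85, 1),
    (606, r"SYSTEM-DOMAIN\admin", -2, 85, 1),
    (804, r"Domain\testaduser", -1, 131, 1),
    (808, r"Domain\testaduser", -2, 85, 1),
]


def distinct_principals(rows) -> list[str]:
    """PRINCIPAL values in first-seen order, without duplicates."""
    return list(dict.fromkeys(row[1] for row in rows))


def principals_to_recreate(rows, currently_authorized) -> list[str]:
    """Principals in vpx_access that are missing from the current list.

    Comparison is case-insensitive (assumption: Windows-style account names).
    """
    current = {name.lower() for name in currently_authorized}
    return [p for p in distinct_principals(rows) if p.lower() not in current]


if __name__ == "__main__":
    still_working = [r"Domain\Administrators", r"SYSTEM-DOMAIN\admin"]  # hypothetical
    for principal in principals_to_recreate(VPX_ACCESS_ROWS, still_working):
        print(principal)
```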

Troubleshooting Network Teaming Problems in ESX/ESXi

Perhaps one of the most common types of problems we encounter here at VMware Technical Support relates to loss of network connectivity to one or more virtual machines on a host. As vague and simple as that description is, it is not always clear where to even start looking for a solution.

Networking in vSphere is a very far reaching topic with many layers. You’ve got your virtual machines, their respective network stacks, Standard vSwitches, Distributed Switches, different load balancing types, physical network adapters, their respective drivers, the vmkernel itself and we haven’t even begun to mention what lies outside of the host yet.

It may be tempting to quickly draw conclusions and dive right into packet captures and even guest operating system troubleshooting, but knowing what questions to ask can make all the difference when trying to narrow down the problem. More often than not, it winds up being something quite simple.

When it comes to troubleshooting virtual machine network connectivity, the best place to start is to simply gather information relating to the problem – what works and what doesn’t. Just because you can’t ping something does not mean the virtual machine is completely isolated. It’s always best not to draw conclusions and to be as logical and methodical as possible. Get a better view of the whole picture, and then narrow down in areas where it makes sense.

Some initial discovery troubleshooting I’d recommend:

  • Is this impacting all virtual machines, or a subset of virtual machines?
  • If you move this VM to another host using vMotion or Cold Migration, does the issue persist?
  • Is there anything in common between the virtual machines having a problem? For example, are they all in a specific VLAN?
  • Are the virtual machines able to communicate with each other on the same vSwitch and Port Group? Or is this a complete loss of network connectivity to the troubled VMs?
  • Have you been able to regain connectivity by doing anything? Or is it persistent?

Let me walk you through an example of a common problem in a simple vSphere 5.0 environment utilizing vNetwork Standard Switches. In this example, we have four Ubuntu Server virtual machines on a single ESXi 5.0 host. Three of them are not exhibiting any problems, but for some unknown reason, one of them, ubuntu1, appears to have no network connectivity.


First, we’ll want to have a quick look at the host’s network configuration from the vSphere Client. As you can see below, we have four virtual machines spread across two VLANs – half of them in VLAN 5 and the other half in VLAN 6. Two gigabit network adapters are being used as uplinks in this vSwitch.

Based on this simple vSwitch depiction, we can draw several conclusions that may come in handy during our troubleshooting. First, we can see that VLAN IDs are specified for the two port groups. This indicates that we are using VST (Virtual Switch Tagging). For this to work correctly, the physical switch ports connected to vmnic1 and vmnic2 must be configured as VLAN trunks using the 802.1q protocol. We’ll keep this information in the back of our heads for now and move on.

Next, since we have a pair of network adapters in the team, we need to determine the load balancing type employed (this could change the troubleshooting approach quite a bit, so you’ll want to know this right up front). As seen below, we were able to determine that the default load balancing method – Route based on the originating virtual port ID – is being used based on the vSwitch properties.

With this, we know several things about the way the network should be configured.
From a virtual switch perspective, we know that virtual machines will be spread across both network adapters, as both uplinks are in an Active state. Each virtual machine network adapter will be bound to exactly one physical adapter. Assuming each virtual machine has only a single virtual network card, each virtual machine will therefore be bound to a single physical adapter on the ESXi host. In theory, we should see a rough 50/50 split of virtual machines across both physical adapters in the team. It may not be a perfect 50/50 split across all VMs, but any single VM on this vSwitch has an equal chance of being on vmnic1 or vmnic2.
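To make the 50/50 reasoning concrete, here is a simplified Python model of port-ID based teaming. It assumes the uplink is chosen as the virtual port ID modulo the number of active uplinks, which is an illustrative approximation rather than ESXi's actual implementation, and the port IDs are hypothetical.

```python
from collections import Counter

# Simplified model of "Route based on originating virtual port ID".
# ASSUMPTION: uplink = active_uplinks[port_id % len(active_uplinks)].
# Illustrative only; this is not ESXi's actual implementation.


def uplink_for_port(port_id: int, active_uplinks: list[str]) -> str:
    """Pick the uplink a virtual port is pinned to under this model."""
    if not active_uplinks:
        raise ValueError("at least one active uplink is required")
    return active_uplinks[port_id % len(active_uplinks)]


def place_vms(vm_ports: dict[str, int], active_uplinks: list[str]) -> dict[str, str]:
    """Map each VM (one vNIC each) to exactly one physical uplink."""
    return {vm: uplink_for_port(port, active_uplinks) for vm, port in vm_ports.items()}


if __name__ == "__main__":
    ports = {"ubuntu1": 10, "ubuntu2": 11, "ubuntu3": 12, "ubuntu4": 13}  # hypothetical IDs
    placement = place_vms(ports, ["vmnic1", "vmnic2"])
    print(placement)
    print(Counter(placement.values()))
    # With only vmnic2 in the active list, every VM lands on vmnic2.
    print(place_vms(ports, ["vmnic2"]))
```

Restricting the active list to a single adapter, as an explicit failover order does, pins every port to that adapter regardless of its ID.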

From the physical switch perspective, it is important that each of these vmnics connects to an independent, identically configured 802.1q VLAN trunk port. With the default ‘Route based on originating virtual port ID’ load balancing type, the physical switch should not be configured for link aggregation (802.3ad, also known as EtherChannel in the Cisco world, or trunking in the HP world). Link aggregation or bonding should be used only with the IP hash load balancing type.

With this small amount of information, we are able to determine quite a bit about the way the environment should be configured in both ESXi and on the physical switch, and how the load balancing should be behaving.

To recap:

  1. Virtual machine vNICs will each be bound to exactly one physical network adapter on the host.
  2. We are doing virtual switch tagging, so the virtual machines should not have any VLANs configured within the guest operating system.
  3. On the physical switch, vmnic1 and vmnic2 should connect to two independent switch ports configured identically as 802.1q VLAN trunks.
  4. Each physical switch port must be configured to allow both VLAN 5 and VLAN 6, as we are tagging both of these VLANs within the vSwitch.
  5. An EtherChannel or port-channel should not be configured for these two switch ports.

From here, we should now try to determine more about the virtual machine’s problem. We’ll start by doing some basic ICMP ping testing between various devices we know to be online:

From ubuntu1:

  • Pinging the default gateway: Fails.
  • Pinging another virtual machine in VLAN5 on another ESXi host: Fails.
  • Pinging Ubuntu2 on the same host: Fails.
  • Pinging Ubuntu4 on the same host: Succeeds.

This very simple test provides us with a wealth of additional information that will help us narrow things down. From the four pings we just ran, we can determine the following:

Pinging the default gateway fails

Because other VMs, including ubuntu4, can ping the default gateway, we know that it responds to ICMP requests. A VM’s default gateway should always be in the same subnet/VLAN; in this case, the core switch interface being pinged is also in VLAN 5, so no routing is needed to reach it. If the VM can’t communicate with its gateway, nothing outside of VLAN 5 will be able to communicate with it, because the VM has lost its path to the router.

Pinging another virtual machine in VLAN5 on another ESXi host

This confirms that the VM is unable to communicate with anything on the physical network and that it is not just a VM to gateway communication problem. Since this target VM being pinged is also in VLAN 5, no routing is required and this appears to be a lower-level problem.

Pinging Ubuntu2 on the same host

Even though these two VMs are on the same host, they are in different VLANs and different IP subnets. vSwitches are layer-2 only and do not perform any routing. Communication between VLANs on the same vSwitch requires routing, so all traffic has to go out to a router on the physical network. Since ubuntu1 can’t reach its gateway on the physical network, this will obviously not work. Be careful when pinging between VMs on the same host: if they are not in the same VLAN, routing is required and all traffic between the VMs will go out and back in via the physical adapters.
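Whether a ping between two VMs needs routing comes down to whether they share an IP subnet. A small Python sketch using the standard `ipaddress` module; the addresses and the one-subnet-per-VLAN mapping are hypothetical.

```python
import ipaddress

# Hypothetical addressing: one /24 subnet per VLAN (VLAN 5 -> 192.168.5.0/24,
# VLAN 6 -> 192.168.6.0/24). Substitute your own address/prefix values.
UBUNTU1 = "192.168.5.11/24"   # VLAN 5
UBUNTU2 = "192.168.6.12/24"   # VLAN 6
UBUNTU4 = "192.168.5.14/24"   # VLAN 5
GATEWAY5 = "192.168.5.1/24"   # core switch interface in VLAN 5


def needs_routing(src: str, dst: str) -> bool:
    """True when the two interfaces are in different IP networks, so traffic
    must leave the (layer-2 only) vSwitch and pass through a router."""
    return ipaddress.ip_interface(src).network != ipaddress.ip_interface(dst).network


if __name__ == "__main__":
    for name, target in (("gateway", GATEWAY5), ("ubuntu2", UBUNTU2), ("ubuntu4", UBUNTU4)):
        print(f"ubuntu1 -> {name}: routing needed = {needs_routing(UBUNTU1, target)}")
```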

Pinging Ubuntu4 on the same host

Pinging ubuntu4, which is in the same VLAN, the same IP subnet and on the same vSwitch, works correctly. This piece of information is very useful because it confirms that the VM’s network stack in the guest operating system is indeed working to some extent. Because ubuntu1 and ubuntu4 share the same port group, all communication between them remains within the vSwitch and does not need to traverse the physical network.

But why is ubuntu1 not working correctly when ubuntu4 is? This is a rhetorical question, by the way. Let’s think back to what we saw earlier: we have two physical NICs and four virtual machines employing ‘Route based on the originating virtual port ID’ load balancing. The ubuntu1 VM cannot communicate on the physical network, but can communicate on the vSwitch. Clearly something is preventing this virtual machine from communicating out to the physical network.

Let’s recap what we’ve learned to this point:

  • We know the VM’s guest operating system networking stack is working correctly
  • We know that the vSwitch is configured correctly, as ubuntu4 works fine and is in the same VLAN and Port Group.
  • The problem lies in ubuntu1’s inability to communicate out to the physical network.
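The reasoning in this recap can be written down as a small decision function. This is a hypothetical Python sketch of the walkthrough's logic, not a VMware tool, and the field names are illustrative. The ping to a VM in a different VLAN is deliberately left out, because it requires routing and on its own cannot separate a vSwitch problem from an uplink problem.

```python
from dataclasses import dataclass

# Hypothetical encoding of the troubleshooting logic in this walkthrough.


@dataclass
class PingResults:
    same_portgroup_same_host: bool  # e.g. ubuntu1 -> ubuntu4, stays inside the vSwitch
    default_gateway: bool           # same VLAN, reached over the physical network
    same_vlan_other_host: bool      # same VLAN, reached over the physical network


def likely_fault_area(r: PingResults) -> str:
    """Suggest where to look next based on which pings succeeded."""
    if not r.same_portgroup_same_host:
        return "guest OS network stack or VM/port group configuration"
    if r.default_gateway and r.same_vlan_other_host:
        return "no layer-2 fault indicated by these tests"
    if not r.default_gateway and not r.same_vlan_other_host:
        return "path out of the host: physical uplink or upstream switch port"
    return "inconclusive: gather more data"


if __name__ == "__main__":
    ubuntu1 = PingResults(same_portgroup_same_host=True, default_gateway=False,
                          same_vlan_other_host=False)
    print(likely_fault_area(ubuntu1))
```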

When thinking about this problem from an end-to-end perspective, the next logical place to look would be the physical network adapters. We know that we have two adapters in a NIC team, but which NIC is ubuntu1 actually using? Which NIC is ubuntu4 using? To determine this, we’ll need to connect to the host using SSH or via the Local Tech Support Mode console and use a tool called esxtop. From within esxtop, we simply hit ‘n’ for the networking view and immediately obtain some very interesting information:

As you can see above, this view tells us exactly which physical network adapter each virtual machine is currently bound to. There is a very clear pattern that we can see almost immediately – ubuntu1 is the only virtual machine currently utilizing vmnic1. It is unlikely that this is a mere coincidence, but most vSphere administrators may not be able to approach their network teams without more information.

Assuming there really is a problem with vmnic1 or its associated physical switch port, we would expect the VM to regain connectivity if we forced it to use vmnic2 instead. Let’s confirm this theory.

The easiest way to accomplish this would be to create a temporary port group identical to VM VLAN 5, but configured to use only vmnic2. That way when we put ubuntu1’s virtual NIC into that port group, it will have no choice but to use vmnic2. In our example, I created a new port group called ‘vmnic2 forced’:

After creating this new temporary port group, you simply need to check the ‘Override switch failover order’ checkbox in the NIC Teaming tab and ensure that only the desired adapter is listed as Active. All other adapters should be moved down to Unused. In our case, vmnic2 is now the only active adapter associated with this port group called ‘vmnic2 forced’. We can now edit the settings of ubuntu1 and configure its virtual network card to connect to this new port group:

Immediately after making this change – even if the VM is up and running – we should see esxtop reflect the new configuration:

As you can see above, ubuntu1 is now forced onto vmnic2, the same adapter used by the other VMs that are not having a problem.

We can then repeat our ping tests:

From ubuntu1:

  • Pinging the default gateway: Success.
  • Pinging another virtual machine in VLAN5 on another ESXi host: Success.
  • Pinging Ubuntu2 on the same host: Success.
  • Pinging Ubuntu4 on the same host: Success.

And there you have it. We’ve essentially proven that there is some kind of a problem communicating out of vmnic1. Moving further along the communication path, it would now be a good idea to examine the physical switch configuration for these two ports associated with vmnic1 and vmnic2. For those of you who work in a large corporation, I’d say you would have enough proof to go to your network administration team to present a case for investigation on the physical side at this point.

In this example, our ESXi 5.0 host is connected to a single Cisco 2960G gigabit switch and we’ll assume that we’ve been able to physically confirm that vmnic1 plugs into port g1/0/10 and that vmnic2 plugs into port g1/0/11. Finding and confirming the correct upstream physical switch ports is outside of the scope of this example today, but we’ll just assume we’re 100% certain.

Examining the configuration of these switch ports, we see the following:
# sh run int g1/0/10
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 6,7,8,9
 spanning-tree portfast trunk
 no shutdown
# sh run int g1/0/11
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 5,6,7,8,9
 spanning-tree portfast trunk
 no shutdown

Even if you are not familiar with Cisco IOS commands and switch port configuration, it isn’t difficult to see that there is indeed a difference between g1/0/10 and g1/0/11. It appears that only g1/0/11 is configured to allow VLAN 5. So this problem may have gone unnoticed for some time as it would only impact virtual machines in VLAN 5, and there is about a 50/50 chance they would fall on vmnic1. To make matters worse, certain operations may actually cause a troubled VM to switch from vmnic1 to vmnic2, which can sometimes add to the confusion.
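With many teamed ports, spotting this kind of mismatch by eye gets tedious. A minimal Python sketch, assuming you have pasted each port's `sh run int` output into a string, expands the trunk allowed VLAN list and reports which required VLANs each port is missing. It handles only the plain `switchport trunk allowed vlan <list>` form shown above (numbers, ranges, `all`, `none`); the `add`/`remove` variants and lists wrapped across several lines are not handled.

```python
import re

ALL_VLANS = set(range(1, 4095))  # IOS trunk default (1-4094) when no allowed list is set


def parse_vlan_list(spec: str) -> set[int]:
    """Expand an IOS-style VLAN list such as '5,6,7-9' into {5, 6, 7, 8, 9}."""
    if spec == "all":
        return set(ALL_VLANS)
    if spec == "none":
        return set()
    vlans: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = (int(x) for x in part.split("-"))
            vlans.update(range(low, high + 1))
        else:
            vlans.add(int(part))
    return vlans


def allowed_vlans(running_config: str) -> set[int]:
    """VLANs a trunk port carries, based on its allowed vlan line."""
    match = re.search(r"switchport trunk allowed vlan (\S+)", running_config)
    return parse_vlan_list(match.group(1)) if match else set(ALL_VLANS)


def missing_vlans(ports: dict[str, str], required: set[int]) -> dict[str, list[int]]:
    """Per switch port, the required VLANs its trunk does not carry."""
    return {name: sorted(required - allowed_vlans(cfg)) for name, cfg in ports.items()}


G1_0_10 = """switchport trunk encapsulation dot1q
switchport mode trunk
switchport trunk allowed vlan 6,7,8,9
spanning-tree portfast trunk
no shutdown"""

G1_0_11 = """switchport trunk encapsulation dot1q
switchport mode trunk
switchport trunk allowed vlan 5,6,7,8,9
spanning-tree portfast trunk
no shutdown"""

if __name__ == "__main__":
    # VLANs 5 and 6 are the ones tagged on the vSwitch in this example.
    print(missing_vlans({"g1/0/10": G1_0_10, "g1/0/11": G1_0_11}, required={5, 6}))
```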

It is not uncommon that vSphere administrators try several things to get the VM to come online again. Some of the actions that may cause the VM to become bound to a different vmnic include:

  • vMotion to another host, and then back again.
  • Powering off, and then powering on the VM again.
  • Removing a virtual network card and then adding it back, whether of the same type or a different one.

Any of the above tasks gives you about a 50% chance that connectivity will be restored when dealing with two physical network cards. You may try one of the above actions and the problem persists; you may try it again and things start working. This often leads administrators to the false assumption that it was just something quirky with the VM’s virtual network adapter, or a problem with ESXi itself. This is why a methodical approach is ideal: it will eventually lead you to the true cause. Remember, each physical switch port associated with a network team should always be configured identically.

Well, that’s it for today. I’m hoping to provide some other example scenarios that lead to other root problems, or ones that deal with different load balancing types. Thanks for reading.