Last week I took a look at some common network teaming problems and how to apply a logical troubleshooting methodology to zero in on the problem. I’m hoping to write some follow-ups to dig a bit more into other load balancing types, but for now, I’ll be shifting the focus to host routing.
When I talk about host routing, I’m referring to just that – the host. Virtual machines route based on their guest configuration and based on what they can access on the physical network by means of their vSwitch and the upstream network configuration. Running virtual machines is certainly the primary focus of the vSphere suite of products, but there are numerous services used by the host that will require network connectivity. These include but are not limited to:
- Management Networking and vCenter connectivity
- Fault Tolerance
- Host storage connectivity (iSCSI, NFS)
ESXi differs in many ways from ESX – namely in the absence of the bolt-on Service Console that was used for management purposes. I won’t be going into too much detail to describe the differences between ESX and ESXi, as this has been covered at great length in many blog posts and technical documents out there. Rather, I will focus on host routing, how it differs between ESX and ESXi and how to avoid common pitfalls.
I’m going to begin by walking through a problem I’ve seen on more than one occasion here in Tech Support at VMware. Here is a fictitious problem statement that we’ll be operating off of today:
I’ve been running ESX 4.1 without issue, and I just recently deployed a new ESXi 4.1 host. The vSwitch and physical network configuration is identical, but I can’t mount my NFS datastore on the new ESXi host. I get a “Permission Denied” failure. I’ve triple-checked my NFS filer permissions and everything is configured exactly as it should be. Help!
You may be wondering just what NFS permission problems have to do with host routing. More often than not, it really is just a simple permissions problem on the NFS filer. For the purposes of our example today though, we’ll be assuming that we’ve already verified that the permissions are indeed correct.
We’ll be working with two hosts – one ESX and one ESXi – that are both version 4.1 Update 1. Let’s start by taking a look at the ESX host that has been working just fine. First, we’ll check the NFS mount in question to determine how it’s been mounted and the relevant connection information. As you can see, this is a read-only NFS mount called ISOs. The NFS server’s IP address is 192.168.2.55. We’re able to browse the NFS datastore just fine, and if I remove the mount and mount it again, it connects back without issue on this ESX host.
Now let’s take a quick look at the vSwitch configuration on this same host:
As you can see above, the setup is very simple – vSwitch1 is being used for NFS purposes. We have a single VMkernel port for NFS with an IP address of 192.168.2.121. The Service Console port is utilizing vSwitch0, with a separate physical uplink adapter.
Everything may appear in order here, but the most important point we’ll want to make note of is that the Service Console port appears to be in the same IP subnet as the NFS VMkernel port. This is not according to best practices. Storage networking should always be in a dedicated subnet associated with a non-routable VLAN or a dedicated physical switch. Best practices aside, there should be no reason why this NFS share can’t be mounted so long as the host networking and upstream physical network are configured correctly.
Now let’s take a look at the ESXi host that is having difficulty. The vSwitch is configured as follows:
At first glance, the configuration certainly does seem consistent. The same physical uplink adapter configuration was maintained and the same IP subnet is being used for management and NFS connectivity. To begin our troubleshooting, let’s try to mount the NFS share and check the VMkernel logging to see the exact error:
Pro-Tip: In ESXi 4.x, the VMkernel logging is written to the /var/log/messages file, along with logging from a slew of other chatty services, including hostd and vpxa. To filter only VMkernel messages in ESXi 4.x into the ‘less’ viewer, you can use the following command:
# grep vmkernel /var/log/messages | less
As you can see above, this is indeed specifically being reported as a ‘permission denied’ error. The ESXi host is receiving an error 13 back from the NFS filer. There really doesn’t appear to be an obvious networking problem as we receive back the error 13 after our mount attempt and can ping 192.168.2.55 all day long without a single dropped packet. We can also ping the host without issue from the NFS filer as well.
So what exactly is going on here?
Looking back to our ESX host, we can recall that the Service Console and NFS VMkernel port were both in the 192.168.2.0/24 network. Our ESXi host doesn’t have a Service Console port. Instead, it uses a VMkernel port for management networking. Again, these two VMkernel ports are in the same 192.168.2.0/24 network. But how does the ESXi host know which VMkernel port to use for NFS client purposes? Short answer – it doesn’t.
The host selects only one VMkernel port to use when there is more than one in any directly connected network. Generally speaking, this will be the lowest numbered or first created VMkernel port. So if we have vmk0 and vmk1 in the same subnet, the host will usually elect vmk0 for all communications within that network.
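The selection behavior described above can be modeled in a few lines of Python. This is only an illustrative sketch, not how the VMkernel is actually implemented. The `VmkPort` type, the `connected_routes` and `egress_port` helpers, and the two host addresses (192.168.2.120 and 192.168.2.122) are assumptions made up for the example; only the NFS server address comes from this post. The sketch also assumes the lowest-numbered port always wins, which is the usual outcome but not a guarantee.

```python
import ipaddress
from typing import NamedTuple, Optional


class VmkPort(NamedTuple):
    name: str     # e.g. "vmk0"
    address: str  # interface address in CIDR form, e.g. "192.168.2.120/24"


def port_number(port: VmkPort) -> int:
    # "vmk10" -> 10, so ports sort numerically rather than alphabetically
    return int(port.name[3:])


def connected_routes(ports):
    """Map each directly connected network to a single VMkernel port.

    Assumption: when several ports share a network, the lowest-numbered
    one is kept and the others never make it into the table.
    """
    routes = {}
    for port in sorted(ports, key=port_number):
        network = ipaddress.ip_interface(port.address).network
        routes.setdefault(network, port.name)
    return routes


def egress_port(routes, destination: str) -> Optional[str]:
    """Return the port used to reach a directly connected destination."""
    dest = ipaddress.ip_address(destination)
    for network, name in routes.items():
        if dest in network:
            return name
    return None


ports = [
    VmkPort("vmk1", "192.168.2.122/24"),  # NFS (hypothetical address)
    VmkPort("vmk0", "192.168.2.120/24"),  # Management (hypothetical address)
]
routes = connected_routes(ports)
print(egress_port(routes, "192.168.2.55"))  # NFS server from the example; prints vmk0
```

Note that vmk1 never appears in `routes` at all, even though it is listed first in the input.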
Let’s test our theory by checking the VMkernel routing table. We do this by running the following two commands:
# esxcfg-route -l
# esxcfg-route -n
The -l option provides a simple routing table, and the -n (for neighbor list) gives you what is essentially an ARP cache of MAC addresses, along with the VMkernel interface currently being used to reach each one. As you can see above, the host’s routing table doesn’t even include the NFS VMkernel port, vmk1. Only the management kernel port is listed for the directly connected 192.168.2.0/24 network. So to put it simply, the host is passing all traffic to 192.168.2.55 – the NFS server – through vmk0. We’re getting an access denied message because the NFS server is configured to allow vmk1’s IP address – not vmk0’s management IP address.
We can even confirm this by doing a packet capture on vmk1 using the tcpdump-uw utility in ESXi. We used the following command to check for traffic destined to 192.168.2.55 (the NFS server) on vmk1:
# tcpdump-uw -i vmk1 dst 192.168.2.55
Based on our testing, we received no tcpdump-uw output at all for vmk1 when trying to mount the NFS share. If we repeat this with vmk0, we get a flurry of outgoing traffic destined to 192.168.2.55.
The next logical question you may ask is why this occurred in ESXi but not in ESX. The answer is quite simple: the host’s NFS client must use a VMkernel port for communication, and on the ESX host we have only one VMkernel port in the 192.168.2.0/24 network. In short, it has no other choice. The Service Console has an independent routing table, and even if its IP address happens to be in the same IP subnet, the Service Console and VMkernel will make independent forwarding decisions based on their own tables.
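To make the contrast concrete, here is a small Python sketch of two independent tables on the ESX host. It is a model for illustration only. The `lookup` helper is hypothetical, and the interface names are assumptions: `vswif0` for the Service Console port and `vmk0` for the single NFS VMkernel port.

```python
import ipaddress

SUBNET = ipaddress.ip_network("192.168.2.0/24")

# Classic ESX keeps one table per networking stack. Both stacks have a
# port in the same subnet, but neither table knows about the other.
service_console_table = {SUBNET: "vswif0"}  # assumed interface name
vmkernel_table = {SUBNET: "vmk0"}           # assumed name of the NFS VMkernel port


def lookup(table, destination):
    """Return the interface a given stack would use for a connected destination."""
    dest = ipaddress.ip_address(destination)
    for network, interface in table.items():
        if dest in network:
            return interface
    return None


# The NFS client lives in the VMkernel, so only the VMkernel table is consulted.
print(lookup(vmkernel_table, "192.168.2.55"))         # prints vmk0
# A plain `ping` from the Service Console consults the other table.
print(lookup(service_console_table, "192.168.2.55"))  # prints vswif0
```

Because each table holds exactly one candidate for the subnet, there is nothing for either stack to choose between.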
As you can see below, we can view the independent VMkernel and Service Console routing tables on the ESX host using the esxcfg-route -l and route -n commands respectively.
You may have also noticed that there are two ping commands in ESX – ping and vmkping. As you have probably guessed, ping is used when you want to send ICMP echo requests out of a Service Console interface, and vmkping for a VMkernel interface. In ESXi, both commands still exist, but they both are used for VMkernel interfaces only and can be used interchangeably.
What about vMotion, iSCSI?
You may be wondering about vMotion, Fault Tolerance and iSCSI traffic via VMkernel ports. Can’t you just check off the appropriate checkbox in the VMkernel port properties to ensure the correct one is used for each service?
And you would be correct in your assertion – you can indeed select which VMkernel port to use for these services. Even if there are other VMkernel ports in the same subnet, the host will respect the choices you make via these checkboxes:
Although this certainly helps, having multiple VMkernel ports in the same subnet can still cause confusion – even with these options checked off.
Take for example vMotion. If a host has a VMkernel port for vMotion and another for Management in the same subnet, the host will forward vMotion traffic based on the VMkernel port checked off for vMotion. That’s great and all, but now let’s suppose that a network administrator accidentally reconfigured the wrong physical port on the switch during some maintenance and the vmnic used for vMotion is now communicating in the wrong upstream VLAN. Suddenly, live migrations are failing to and from this host.
The first logical troubleshooting step you may wish to take is to log in to this host via SSH and try to ping the vMotion interfaces on other hosts. Surprisingly, you may find that you are able to ping all of the other vMotion interfaces without issue. This is because the ‘ping’ utility is not respecting your choice of vMotion interfaces – it is simply forwarding traffic based on the host’s routing table.
There are similar challenges with iSCSI. You can bind VMkernel interfaces for use with the software iSCSI initiator, and the host will indeed respect this binding. Again, ICMP does not. Some iSCSI storage arrays use ICMP ping echo requests/replies as heartbeats to ensure the host interfaces are up. If these echo replies are not being received from the expected interfaces it can lead to all sorts of problems. I won’t get into any more detail on this situation, as this alone could be a full-length blog post. In a nutshell, it can cause unexpected behavior should an interface associated with active iSCSI paths go down on the host.
What’s New in ESXi 5.1?
Most of what we looked at related to ESX/ESXi 4.1, but it applies to ESXi 5.0 as well. Things do change slightly in ESXi 5.1, as changes were introduced to both the behavior of ICMP and the ping and vmkping utilities available in the busybox shell.
First, I’d recommend taking a look at KB 2042189 that discusses the ICMP ping response behavior change in ESXi 5.1. In summary, ICMP echo replies are now only sent back via the interface they were received on. The host will no longer refer to its routing table to make these forwarding decisions for ICMP traffic. If you think back to our iSCSI example earlier, you can see how this would be ideal to address the heartbeat challenges relating to ICMP echo requests being sent from the SAN.
Second, we can now select which VMkernel interface to ping out of using the ‘ping’ or ‘vmkping’ command in ESXi 5.1 using the -I option. Previously we could do this only for IPv6. This can help greatly if you find yourself in a situation with multiple VMkernel ports per subnet – especially when dealing with iSCSI where you want to test connectivity across multiple independent paths via two or more VMkernel ports.
What should I do?
How can you ensure that your traffic is utilizing the intended VMkernel interface? Unfortunately, there isn’t a special command or advanced setting that can be used to intervene in the host’s forwarding decisions in all scenarios. The best advice that can be given is to simply adhere to best practices. The relevant best practices are as follows:
- Have only one VMkernel port per IP subnet (the only exception here is for iSCSI multi-pathing, or multi-NIC vMotion in vSphere 5.x).
- A dedicated non-routable VLAN or dedicated physical switch for vMotion purposes.
- A dedicated non-routable VLAN or dedicated physical switch for IP Storage purposes.
- A dedicated non-routable VLAN or dedicated physical switch for Fault Tolerance purposes.
Not only will this ensure better segmentation of the various traffic types, it will also prevent the oddities I described earlier. Unfortunately, implementing these best practices in an existing environment may be much easier said than done. Based on our original NFS example, the configuration changes on the host would actually be quite simple – just a few IP address changes, but the physical infrastructure may be a different story altogether. You’d need to ensure that a VLAN was created, physical switches configured, along with readdressing of the NFS filer itself. Your network and storage administrators may not be too happy with you after you make a request like this, which is why it’s so important to configure your environment correctly from the beginning. Remember – VLANs are your friend.
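The first rule, one VMkernel port per IP subnet, is easy to check mechanically once you have a list of a host’s VMkernel ports and their addresses. The Python sketch below shows one way to do it. The `shared_subnets` helper and every host address in it are hypothetical, and how you collect the inventory from a real host is left out.

```python
import ipaddress
from collections import defaultdict


def shared_subnets(ports):
    """Group VMkernel ports by network and return only the networks
    that contain more than one port."""
    by_network = defaultdict(list)
    for name, cidr in ports.items():
        network = ipaddress.ip_interface(cidr).network
        by_network[str(network)].append(name)
    return {net: sorted(names) for net, names in by_network.items() if len(names) > 1}


# Hypothetical inventory resembling the problem host from the NFS example
problem_host = {
    "vmk0": "192.168.2.120/24",  # Management
    "vmk1": "192.168.2.122/24",  # NFS
}

# Hypothetical inventory with one port per subnet
tidy_host = {
    "vmk0": "192.168.2.120/24",  # Management
    "vmk1": "192.168.3.120/24",  # NFS
    "vmk2": "192.168.4.120/24",  # vMotion
}

print(shared_subnets(problem_host))  # {'192.168.2.0/24': ['vmk0', 'vmk1']}
print(shared_subnets(tidy_host))     # {}
```

A flagged subnet is not automatically a misconfiguration. iSCSI multi-pathing and multi-NIC vMotion are the exceptions noted above, so treat the output as a list of things to review rather than a list of errors.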
Now let’s have a look at a different ESXi host – this one is configured to have services spread across various subnets/VLANs:
- Management: vmk0 in the 192.168.2.0/24 subnet.
- NFS: vmk1 in the 192.168.3.0/24 subnet.
- vMotion: vmk2 in the 192.168.4.0/24 subnet.
On the physical switch, NFS and vMotion are configured for dedicated non-routable VLANs. On this host, the vSwitch configuration looks as follows:
As you can see above, we now have one VMkernel port per IP subnet and a neat and tidy host routing table:
Now each kernel port is associated with a separate ‘Local Subnet’. Everything else routes via the default gateway associated with the management VMkernel port as it should.
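As a rough model of how a table like this resolves a destination, the Python sketch below picks the most specific matching route and falls back to the default gateway for everything else. It is an illustration under stated assumptions, not the VMkernel’s actual lookup code. The `route_for` helper, the gateway address 192.168.2.1, and the destination addresses are all hypothetical; the three subnets and port names are the ones listed above.

```python
import ipaddress

# (network, interface, next hop); a next hop of None means "Local Subnet"
ROUTES = [
    (ipaddress.ip_network("192.168.2.0/24"), "vmk0", None),          # Management
    (ipaddress.ip_network("192.168.3.0/24"), "vmk1", None),          # NFS
    (ipaddress.ip_network("192.168.4.0/24"), "vmk2", None),          # vMotion
    (ipaddress.ip_network("0.0.0.0/0"), "vmk0", "192.168.2.1"),      # default (hypothetical gateway)
]


def route_for(destination):
    """Return (interface, next hop) using longest-prefix matching."""
    dest = ipaddress.ip_address(destination)
    matches = [route for route in ROUTES if dest in route[0]]
    # The default route matches everything, so `matches` is never empty.
    network, interface, next_hop = max(matches, key=lambda route: route[0].prefixlen)
    return interface, next_hop or "Local Subnet"


for target in ("192.168.3.55", "192.168.4.20", "10.1.1.1"):
    print(target, route_for(target))
```

With one port per subnet, every connected network has exactly one candidate interface, so the answer no longer depends on which port was created first.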
And there you have it: give the kernel only one choice, and it’ll always do what is expected.
Some topical reading for you from the VMware Knowledge Base: