In this post I am going to describe how VTEPs learn about the virtual machines connected to the logical Layer 2 networks. The learning process is quite similar to that of a transparent bridge. Just as a transparent bridge learns from the packets received on its ports, a VTEP learns from the inner and outer headers of the packets it receives.
Let’s take an example to illustrate the VTEP learning process.
As shown in the diagram above, there are two hosts (Host 1 and Host 2) on which VTEPs are configured, and each host has one virtual machine connected to a logical Layer 2 network, identified as VXLAN 5001. Both virtual machines are powered on and both VTEPs have joined the multicast group 239.1.1.100. Each VTEP has its own forwarding table, which is initially empty as shown in the diagram below.
How do the forwarding tables get populated?
We will take the example of the virtual machine on Host 1 trying to communicate with the virtual machine on Host 2. First, the virtual machine with MAC1 sends an ARP request to find the MAC address of the virtual machine on Host 2. The ARP request is a broadcast packet.
The diagram above shows the packet flow:
- The virtual machine on Host 1 sends an ARP packet with the destination MAC address “FFFFFFFFFFFF”.
- The VTEP on Host 1 encapsulates the Ethernet broadcast packet into a UDP header with the multicast address “239.1.1.100” as the destination IP address and the VTEP address “10.20.10.10” as the source IP address.
- The physical network delivers the multicast packet to the hosts that joined the multicast group address “239.1.1.100”.
- The VTEP on Host 2 receives the encapsulated packet. Based on the outer and inner headers, it makes an entry in the forwarding table that maps the virtual machine MAC address to the VTEP. In this example, the virtual machine MAC1 running on Host 1 is associated with VTEP IP “10.20.10.10”. The VTEP also checks the segment ID or VXLAN logical network ID (5001) in the outer header to decide whether the packet has to be delivered on the host.
- The packet is de-encapsulated and delivered to the virtual machine connected on that logical network VXLAN 5001.
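The receive-side steps above can be sketched in a few lines of Python. This is only an illustration, not how any VTEP is actually implemented: the `VxlanPacket` and `Vtep` names, the field names, and the `local_vnis` set are all hypothetical. The sketch learns before it checks the segment ID simply because that is the order of the steps above.

```python
from dataclasses import dataclass, field

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class VxlanPacket:
    """Only the header fields the learning step looks at (hypothetical model)."""
    outer_src_ip: str   # IP address of the sending VTEP
    outer_dst_ip: str   # multicast group or remote VTEP IP
    vni: int            # VXLAN logical network ID (segment ID)
    inner_src_mac: str  # MAC of the virtual machine that sent the frame
    inner_dst_mac: str


@dataclass
class Vtep:
    ip: str
    local_vnis: set                            # segments with a local VM attached
    table: dict = field(default_factory=dict)  # (vni, vm_mac) -> remote VTEP IP

    def receive(self, pkt):
        """Learn from the outer and inner headers, then decide on delivery."""
        # Inner source MAC lives behind the VTEP named in the outer source IP.
        self.table[(pkt.vni, pkt.inner_src_mac)] = pkt.outer_src_ip
        # Deliver only if a local virtual machine sits on this segment.
        return pkt.vni in self.local_vnis


host2 = Vtep(ip="10.20.10.11", local_vnis={5001})
arp_request = VxlanPacket("10.20.10.10", "239.1.1.100", 5001, "MAC1", BROADCAST_MAC)
deliver = host2.receive(arp_request)
print(deliver, host2.table)
```

After the call, the Host 2 table holds a single entry that maps MAC1 on VXLAN 5001 to “10.20.10.10”, which is the state described in the last two steps.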
The entry in the forwarding table of the Host 2 VTEP is used during the lookup process. The packet flow shown in the diagram below explains the forwarding table lookup for a unicast packet sent from a virtual machine on Host 2.
- Virtual machine MAC2 on Host 2 responds to the ARP request by sending a unicast packet with MAC1 as the destination Ethernet MAC address.
- After receiving the unicast packet, the VTEP on Host 2 performs a lookup in the forwarding table and gets a match for the destination MAC address “MAC1”. The VTEP now knows that to deliver the packet to virtual machine MAC1 it has to send it to the VTEP with IP address “10.20.10.10”.
- The VTEP creates a unicast packet with “10.20.10.10” as the destination IP address and sends it out.
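The lookup in the steps above boils down to choosing the outer destination IP address. The sketch below is a hedged illustration: the function names and the two example MAC addresses are made up, and the fallback to the multicast group for a destination that is not in the table follows the "unknown unicast" behaviour described in the comments under this post.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"
GROUP = "239.1.1.100"            # multicast group for VXLAN 5001 in this example
MAC1 = "00:50:56:00:00:01"       # hypothetical address standing in for "MAC1"
UNKNOWN_MAC = "00:50:56:00:00:09"  # hypothetical address nobody has learned yet


def is_group_address(mac):
    """True for broadcast and multicast MACs (lowest bit of the first octet set)."""
    return int(mac.split(":")[0], 16) & 1 == 1


def outer_destination(table, vni, dst_mac, multicast_group):
    """Pick the outer destination IP for a frame leaving the local VTEP."""
    if is_group_address(dst_mac):
        return multicast_group  # broadcast or multicast frame
    # Known unicast goes straight to the learned VTEP; unknown unicast is flooded.
    return table.get((vni, dst_mac), multicast_group)


host2_table = {(5001, MAC1): "10.20.10.10"}  # learned from the ARP request
print(outer_destination(host2_table, 5001, MAC1, GROUP))
print(outer_destination(host2_table, 5001, BROADCAST_MAC, GROUP))
print(outer_destination(host2_table, 5001, UNKNOWN_MAC, GROUP))
```

Only the first call finds a match and returns the remote VTEP address; the other two fall back to the multicast group.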
The Host 1 VTEP receives the unicast packet and, in turn, learns the location of the virtual machine MAC2, as shown in the diagram below.
- The packet is delivered to Host 1.
- The VTEP on Host 1 receives the encapsulated packet. Based on the outer and inner headers, it makes an entry in the forwarding table that maps the virtual machine MAC address to the VTEP. In this example, the virtual machine MAC2 running on Host 2 is associated with VTEP IP “10.20.10.11”. The VTEP also checks the segment ID or VXLAN logical network ID (5001) in the outer header to decide whether the packet has to be delivered on the host.
- The packet is de-encapsulated and delivered to the virtual machine connected on that logical network VXLAN 5001.
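Putting the whole exchange together, the following self-contained sketch replays the ARP request and the ARP reply between two toy VTEPs and shows both forwarding tables afterwards. Everything here is a simplification under stated assumptions: the `Vtep` class, the dictionary used as a packet, and the two MAC addresses are hypothetical, and the physical network is reduced to a direct method call.

```python
GROUP = "239.1.1.100"
VNI = 5001
BROADCAST = "ff:ff:ff:ff:ff:ff"
MAC1 = "00:50:56:00:00:01"  # hypothetical MAC of the VM on Host 1
MAC2 = "00:50:56:00:00:02"  # hypothetical MAC of the VM on Host 2


class Vtep:
    def __init__(self, ip):
        self.ip = ip
        self.table = {}  # (vni, vm_mac) -> remote VTEP IP

    def encapsulate(self, vni, src_mac, dst_mac):
        """Wrap a frame; flood to the group unless the destination is known."""
        # A broadcast MAC is never learned as a source, so it always misses.
        outer_dst = self.table.get((vni, dst_mac), GROUP)
        return {"outer_src": self.ip, "outer_dst": outer_dst,
                "vni": vni, "src_mac": src_mac, "dst_mac": dst_mac}

    def receive(self, pkt):
        """Learn the sender's location from the outer and inner headers."""
        self.table[(pkt["vni"], pkt["src_mac"])] = pkt["outer_src"]


host1, host2 = Vtep("10.20.10.10"), Vtep("10.20.10.11")

# ARP request: broadcast from MAC1, carried to the multicast group.
arp_request = host1.encapsulate(VNI, MAC1, BROADCAST)
host2.receive(arp_request)

# ARP reply: unicast from MAC2 to MAC1, carried straight to the Host 1 VTEP.
arp_reply = host2.encapsulate(VNI, MAC2, MAC1)
host1.receive(arp_reply)

print(arp_request["outer_dst"], arp_reply["outer_dst"])
print(host1.table)
print(host2.table)
```

The request leaves Host 1 addressed to the multicast group, the reply leaves Host 2 addressed to “10.20.10.10”, and each VTEP ends up with exactly one entry pointing at the other.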
As you can see, the forwarding table entries are populated based on the inner and outer header fields of the encapsulated packet. As with a transparent bridge, the forwarding table entries are removed after the aging timer expires. One of the common questions I get is what happens after a virtual machine is vMotioned.
In the next few posts I will cover how the forwarding table entries get modified after vMotion of a virtual machine from one host to another.
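Coming back to the aging timer mentioned above, here is a minimal sketch of how entries could expire. The 300-second value, the `AgingTable` name, and the choice to refresh the timestamp every time a MAC address is learned again are assumptions borrowed from typical transparent-bridge behaviour, not values taken from any VXLAN implementation. The clock is injected so the example runs instantly.

```python
import time


class AgingTable:
    """Forwarding table whose entries expire after max_age_s seconds (sketch)."""

    def __init__(self, max_age_s=300.0, clock=time.monotonic):
        self.max_age_s = max_age_s   # hypothetical aging time
        self.clock = clock
        self._entries = {}           # (vni, vm_mac) -> (vtep_ip, last_seen)

    def learn(self, vni, mac, vtep_ip):
        # Learning again refreshes the timestamp of an existing entry.
        self._entries[(vni, mac)] = (vtep_ip, self.clock())

    def lookup(self, vni, mac):
        entry = self._entries.get((vni, mac))
        if entry is None:
            return None
        vtep_ip, last_seen = entry
        if self.clock() - last_seen > self.max_age_s:
            del self._entries[(vni, mac)]  # aged out
            return None
        return vtep_ip


now = [0.0]  # fake clock so the demo does not have to wait
table = AgingTable(max_age_s=300.0, clock=lambda: now[0])
table.learn(5001, "MAC1", "10.20.10.10")

now[0] = 100.0
fresh = table.lookup(5001, "MAC1")   # still within the aging time

now[0] = 500.0
stale = table.lookup(5001, "MAC1")   # older than the aging time, so removed

print(fresh, stale)
```

Once the entry has aged out, a lookup returns nothing and the next packet to MAC1 is treated like any other frame with an unknown destination.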
Here are the links to Part 1, Part 2, Part 3, Part 4.
Get notified of these blog postings and more VMware Networking information by following me on Twitter: @VMWNetworking
acasanova
Great post Vyenkatesh.
I have a doubt. What happens if a unicast packet is sent from VM1 to VM2 and the VTEP doesn’t have any entry for VM2? This would happen when the ARP table timeout in the VM1 guest OS is greater than the aging time of the VTEP entries. Would the packet be sent as a multicast too?
Vyenkatesh Deshpande
Hi acasanova,
Glad you liked the post.
If the entry has timed out, as you said, the packet will be sent out as multicast.
These are the scenarios in which multicast packets will be used:
1) Broadcast
2) Multicast
3) Unknown Unicast
The condition you described is the Unknown Unicast one, so a multicast packet will be used.
Edmund Wee
Hi Vyenkatesh,
With regards to the unknown unicast scenario, is the multicast frame that was sent out an encapsulation of the original unicast packet (that was sent by the VM Guest OS) or an ARP request?
Tayfun DEGER
Hi,
Great post!
Thanks for sharing..
Srinivasa Acharya
I Like this article series on VXLAN.
In the step 2 in the last diagram, there is a typo. “The VTEP on Host2” should read as “The VTEP on Host1”
Vyenkatesh Deshpande
Thanks Srinivas for pointing out the error.
I have updated the post and also fixed some of the typos in the diagram.
Kaushal Patel
Thanks for G8 Presentation.
I have two doubts.
1. Is it possible to have one VM connected to two logical layer 2 network and if yes how will be VTEP table after learning?
2. In third pic from top. Let say if Host-2 receive multicast packet from Host-1 and Host-2 doesn’t have any VM on VXLAN 5001. Still VTEP on Host-2 will update forwarding table with VXLAN 5001 info?
samky2005
Great packet walk-through, Venky.
What if VMs are in the different VXLAN_ID? What L2/L3 infra switch can support that ? Or any recommendation?
Markus
Hi
just have to agree: great explanation and packet walk-through, but based on the recent NSX announcements and the focus at VMworld on the fact that multicast will not be required anymore (is this understanding right?), it would be great to get such a packet walk-through for the new VXLAN/STT NSX architecture as well
Regards
Markus
Vik
Really nice packet walk Venky. Thanks for making it so clear. I just had a question regarding the local vtep learning. If there are multiple virtual machine’s on a host and they all are associated to one VTEP would that VTEP need to keep track of the VMs and if so wouldn’t there be an entry in the local VTEP to ensure that on return the packets are not flooded to all the VMs. Tx
Murali reddy
Great explanation of VXLAN with packet flows.
Paraneetharan C
This is a great post Vyenkatesh. Thanks for writing.
I have a doubt. In the forwarding tables shown here, there is a column “Vtep mac”. And, it shown that the source Vtep’s mac shall be learnt on that table. I think this is with the assumption that both the Vteps are in same l2 network. But, if it were an l3 network(Vteps are connected to l3 routers), then the source mac of a packet received by a Vtep would not be the Mac of the originating Vtep, but it would be that of the next hop router. Am I missing something? Can you please clarify?
Yoni Shperling
Hello Vyenkatesh,
Indeed great post. I do have a few questions
According to http://blogs.vmware.com/vsphere/2013/05/vxlan-series-multicast-basics-part-2.html
When it comes to VXLAN, the multicast support requirements in the physical network are dictated by the number of transport VLAN used in the design. As mentioned in the last post, the transport VLAN carries VXLAN encapsulated traffic. If you are using a single transport VLAN then there is no need for multicast routing protocol (PIM). However, you need the following functions enabled on the switches and routers
– IGMP snooping on Layer 2 Switch
– IGMP Querier on the Router
If only one vlan will be used as transport then “there will be no need for multicast routing protocol (PIM)”
1. My question is, in which scenario I will need more than one vlan to transport the vxlan? Is it in two or more different physical datacenter that is separated by a physical boundary? Or if required more than 16M vxlan?
is it possible to deploy and use VXLAN without changing many configurations of the physical switch/routes like ip igmp snooping but also keep working with multicast transport instead of broadcast, as I know working without ip igmp snooping enabled mean that broadcast use be used instead of multicast.
2. I have a question regarding the multicast and broadcast in part 3, every broadcast will be encapsulated into a multicast? it means, the vxlan is more efficient than vlan not only regarding to the fact of the boundary of 4K and to the benefit of provision in a efficient way the vlan also in the transport right ?
I mean for example if four members VTEP are part of the same multicast group and there is broadcast then the broadcast will be encapsulated at the VTEP source level into a multicast frame and only the VTEP of the multicast group member will get the frame right by multicast transport ?
3. In a case that VTEP already is holding the arp VTEP mac, arp broadcast was already encapsulated into a multicast and the VTEP build the arp table.
unicast frame will be transmitted as unicast right?
Virtual machine (MAC1) on Host1 generates a broadcast frame.
VTEP on Host 1 encapsulates this broadcast frame into a UDP header with destination IP as multicast group address 239.1.1.100.
Physical network delivers the packet to the Host 4 VTEP, because it had joined the multicast group 239.1.1.100. The Host 2 and 3 VTEPs will not receive the broadcast packet.
VTEP on Host 4 first looks at the encapsulation header and if the 24-bit value of VXLAN identifier matches with the logical Layer 2 network ID, it removes the encapsulation header and delivers the packet to the virtual machine.
4. what are the downside of using multiple multicast group ?
Thank you again for blog, I hope to read more article like that regarding NSX and VXLAN.
Alan Harrylal
Vyenkatesh,
Another great write up.
Two questions:
1. When do you think you will have this article?: “how the forwarding table entries get modified after vMotion of a virtual machine from one host to another”
2. Since multicast is no longer needed, according to VMware, how does all of this change?
Thanks,
Alan Harrylal