[video] VMware SD-WAN Per-Packet vs. Per-Flow Processing, Part 1

Improving application performance is one of the most fundamental business outcomes of SD-WAN. In two videos, we explain the important differences between per-packet and per-flow processing for VMware SD-WAN™. Per-packet processing provides bandwidth aggregation and better application performance. Per-flow processing is useful because it can use all available transport circuits, but ultimately it does not improve application performance.

Watch the first video to learn more about per-packet vs. per-flow. The second video will outline the business outcomes enabled by better application performance.

To improve application performance, SD-WAN applies multiple techniques in the underlay. One of the most important is the ability to use all the transport circuits between the source and the destination. There are two ways to achieve that: bandwidth aggregation and link load sharing/load balancing. These terms are often used interchangeably, but they are not the same, and they do not provide the same application performance benefits.

Per-flow example

In the diagram above, two SD-WAN devices are connected by two circuits. Circuit 1 (WAN1) is three times bigger than circuit 2 (WAN2). A per-flow SD-WAN device receives a flow, then selects one of the circuits for the entire flow. It places the first three flows on circuit 1 (because circuit 1 is three times bigger than circuit 2) before it starts using circuit 2.
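The 3:1 assignment described above can be sketched as a weighted round-robin that pins each flow to one circuit. This is a minimal illustration, not the behavior of any particular product: the class name `PerFlowSelector`, the flow identifiers, and the integer weights are all assumptions made for the example.

```python
from itertools import cycle


def build_schedule(weights):
    """Expand {circuit: weight} into a repeating assignment order.

    With {"WAN1": 3, "WAN2": 1} the order is WAN1, WAN1, WAN1, WAN2, ...
    """
    order = []
    for circuit, weight in weights.items():
        order.extend([circuit] * weight)
    return cycle(order)


class PerFlowSelector:
    """Hypothetical per-flow selector: one decision per flow, then pinned."""

    def __init__(self, weights):
        self._schedule = build_schedule(weights)
        self._pinned = {}  # flow id -> circuit chosen when the flow started

    def select(self, flow_id):
        # The circuit is chosen only when the flow is first seen.
        if flow_id not in self._pinned:
            self._pinned[flow_id] = next(self._schedule)
        return self._pinned[flow_id]


selector = PerFlowSelector({"WAN1": 3, "WAN2": 1})
assignments = [selector.select(f"flow-{i}") for i in range(1, 5)]
print(assignments)  # ['WAN1', 'WAN1', 'WAN1', 'WAN2']

# Later packets of an existing flow stay on the circuit chosen at the start.
print(selector.select("flow-1"))  # WAN1
```

Because the decision is stored per flow, nothing in this sketch revisits it when circuit conditions change.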

This technique can use all the transport circuits between the source and the destination. However, link selection happens only once, at the beginning of each flow. If the circuit’s characteristics change, the flow is stuck on the same circuit until it ends. This applies to a reduced-connectivity (dim-out) scenario, not a no-connectivity scenario. With no connectivity, the other circuit would take over immediately. With reduced connectivity, the circuit is still up, but it no longer has the characteristics that made it the best path before the dim-out condition.

A single flow also cannot use all the transport circuits, because it is pinned to one of them. Load sharing and load balancing are possible across flows, but application performance cannot improve, because application performance is measured between a user and an application, and that is always one flow. Nor is the load sharing necessarily equal. The diagram shows unequal load sharing and load balancing: assume the flow on circuit 2 is very heavy and takes 90 percent of that circuit, while the three flows on circuit 1 together take around 10 percent of theirs. Perfectly equal load sharing and load balancing cannot be guaranteed.
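To make the unequal split concrete, the sketch below computes per-circuit utilization for four pinned flows. The capacities (300 and 100 Mbps, chosen only to keep the 3:1 ratio) and the per-flow rates are hypothetical, and the 10 percent figure is read here as the combined share of the three flows on circuit 1.

```python
# Hypothetical capacities that preserve the 3:1 ratio from the example.
CAPACITY_MBPS = {"WAN1": 300, "WAN2": 100}

# (flow id, pinned circuit, offered rate in Mbps); all rates are made up.
FLOWS = [
    ("flow-1", "WAN1", 10),
    ("flow-2", "WAN1", 10),
    ("flow-3", "WAN1", 10),
    ("flow-4", "WAN2", 90),  # the "very heavy" flow
]


def utilization(flows, capacity):
    """Return the fraction of each circuit's capacity used by its flows."""
    load = {circuit: 0 for circuit in capacity}
    for _flow_id, circuit, rate in flows:
        load[circuit] += rate
    return {circuit: load[circuit] / capacity[circuit] for circuit in capacity}


usage = utilization(FLOWS, CAPACITY_MBPS)
for circuit, share in usage.items():
    print(f"{circuit}: {share:.0%}")
```

The flow count follows the 3:1 weighting, yet the bigger circuit sits mostly idle while the smaller one is nearly full, because flows are balanced by count rather than by the traffic they carry.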

Most of the devices that use per-flow rely on synthetic data that they generate to discover the dim-out condition or the health of the link. This causes them to react slowly, because they have to wait to measure the circuits based on the synthetic data and how frequently they send that data.
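A rough way to see why probe frequency matters is to bound the detection time by the probe interval. The model below is an assumption for illustration only: it supposes the device acts after a fixed number of consecutive degraded probes, and both the interval and the probe count are hypothetical values, not figures from any product.

```python
def worst_case_detection_ms(probe_interval_ms, bad_probes_required):
    """Upper bound on the time to notice a dim-out using periodic probes.

    Worst case: the circuit degrades just after a probe was sent, so the
    first degraded probe is a full interval away, and the device waits for
    `bad_probes_required` degraded probes before it reacts. Probe transit
    time is ignored.
    """
    if probe_interval_ms <= 0 or bad_probes_required < 1:
        raise ValueError("interval must be positive and at least one probe required")
    return probe_interval_ms * bad_probes_required


for interval_ms in (1000, 100):
    delay = worst_case_detection_ms(interval_ms, bad_probes_required=3)
    print(f"probe every {interval_ms} ms -> up to {delay} ms to detect")
```

Under this model, probing faster shortens detection but adds synthetic traffic to the circuit, which is the trade-off that measuring actual data avoids.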

Per-packet example

With per-packet processing, when the SD-WAN devices receive a flow, they divide it across all of the circuits packet by packet and keep doing that for the entire flow. Subsequent flows follow the same pattern. This technique also uses all the links between the source and destination. However, link selection happens for each packet, which allows fast reaction to changes in link conditions, because these devices generally rely on actual data to measure link conditions. This lets the device discover dim-out conditions faster and move the next packet to the other circuit immediately. Upload and download are handled independently on each link, so the sending device can make a different decision for upload traffic than the receiving device makes for download traffic on the same circuit. This is an advantage because a circuit might look good in one direction and poor in the other.
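The per-packet decision can be sketched as a scheduler that picks a circuit for every packet and skips any circuit currently flagged as degraded. This is a toy model under stated assumptions, not the VMware SD-WAN algorithm: the class `PerPacketScheduler`, the relative capacities, and the binary "degraded" flag are all hypothetical simplifications.

```python
class PerPacketScheduler:
    """Toy per-packet scheduler: every packet gets its own link decision."""

    def __init__(self, relative_capacity):
        self.capacity = dict(relative_capacity)
        self.sent = {circuit: 0 for circuit in relative_capacity}
        self.degraded = set()  # circuits currently in a dim-out condition

    def mark_degraded(self, circuit):
        self.degraded.add(circuit)

    def pick(self):
        """Choose the healthy circuit that is least loaded for its capacity."""
        healthy = [c for c in self.capacity if c not in self.degraded]
        if not healthy:
            raise RuntimeError("no usable circuit")
        choice = min(healthy, key=lambda c: (self.sent[c] + 1) / self.capacity[c])
        self.sent[choice] += 1
        return choice


scheduler = PerPacketScheduler({"WAN1": 3, "WAN2": 1})

# Packets of ONE flow are spread over both circuits in a 3:1 ratio.
before = [scheduler.pick() for _ in range(4)]
print(before)

# WAN1 dims out mid-flow: the very next packet moves to WAN2.
scheduler.mark_degraded("WAN1")
after = [scheduler.pick() for _ in range(2)]
print(after)
```

Each direction would run its own scheduler instance with its own measurements, which is how the two ends can reach different decisions about the same circuit.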

A single flow can use all available transports, which provides true bandwidth aggregation and improves application performance. This happens only with the per-packet mechanism, not with per-flow. One flow can use all the transport circuits, and the application sees a single link whose bandwidth equals the sum of the bandwidths of all the circuits. This is the only way to increase application throughput and enhance performance.
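The difference in the throughput ceiling for a single flow comes down to simple arithmetic. The sketch below uses hypothetical capacities of 300 and 100 Mbps and ignores protocol overhead and packet reordering, so the numbers are idealized upper bounds rather than expected results.

```python
# Hypothetical circuit capacities in Mbps.
CIRCUITS_MBPS = {"WAN1": 300, "WAN2": 100}


def per_flow_ceiling(circuits, pinned_circuit):
    """A pinned flow can never exceed the capacity of its one circuit."""
    return circuits[pinned_circuit]


def per_packet_ceiling(circuits):
    """A flow spread per packet can, at best, use the sum of all circuits."""
    return sum(circuits.values())


print("per-flow, pinned to WAN1:", per_flow_ceiling(CIRCUITS_MBPS, "WAN1"), "Mbps")
print("per-flow, pinned to WAN2:", per_flow_ceiling(CIRCUITS_MBPS, "WAN2"), "Mbps")
print("per-packet, one flow:    ", per_packet_ceiling(CIRCUITS_MBPS), "Mbps")
```

With these assumed values, the single flow tops out at 300 or 100 Mbps under per-flow processing, depending on where it landed, and at 400 Mbps under per-packet processing.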
