By Shudong Zhou, Sr. Staff Engineer, ESX Networking
From time to time, I get queries about VDS architecture, particularly about how vswitches are implemented in the kernel. Do different vswitches share the same code? What’s the object layout? A few email exchanges later, it becomes clear that the questioners really want an assessment of the security risk of VDS, and they think they can get it from a raw description of the VDS implementation architecture!
You can get a better idea of VDS security from experts whose job is to assess security risk. One reference point is that VI3 earned Common Criteria EAL4+ certification. The vswitch data plane was thoroughly tested and the code was reviewed as part of that process. Granted, VDS didn’t exist in VI3, but the VDS data plane follows the same architecture as in ESX 3. More recently, my colleagues and I worked with CESG, a UK government organization, to evaluate the security of VDS in the upcoming release of vSphere. I received a number of comments, and the most serious one was to add parentheses around a macro in the source code.
Another way to look at the security risk of VDS is via statistics. Assume there are 20 million VDS ports (you can get more accurate numbers from Gartner reports) running for a period of a year. Since we have never issued a security patch in the VDS area, as far as I know, your chance of running into a security issue with VDS is less than 1 in 20 million port-years. Suppose you run 1000 VMs, each with one port, for a year. That is 1000 port-years, so the security risk due to VDS would be less than 1 in 20,000. I know that ports are not independent, etc., but this is just a rough estimate. In contrast, human errors are more probable. I consider myself a decent developer. Over the last 4+ years, I made 775 code checkins, out of which 11 were backouts. The error rate is about 1 in 70. This is a lower bound, since not all errors resulted in a backout. I’m not a security expert, but I know that a system is only as secure as its weakest link. I hope I have convinced you that VDS is not the weakest link.
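For readers who want to check the arithmetic, here is a minimal sketch of the estimate in Python. The 20 million port count is the rough assumption from the text, one port per VM is an extra assumption made for illustration, and the function and variable names are hypothetical. The two rates are measured in different units (per year of operation versus per checkin), so the final ratio is only an order-of-magnitude comparison.

```python
from fractions import Fraction

# Figures quoted in the text. The fleet size is a rough assumption, not a measurement.
FLEET_PORT_YEARS = 20_000_000  # about 20 million VDS ports running for one year
VMS = 1000                     # VMs you run for one year
PORTS_PER_VM = 1               # assumption for illustration: one VDS port per VM

CHECKINS = 775                 # code checkins over 4+ years
BACKOUTS = 11                  # checkins that had to be backed out


def risk_upper_bound(port_years: int, fleet_port_years: int) -> Fraction:
    """Upper bound on the chance of hitting an issue.

    Assumes fewer than one incident across the whole fleet and treats
    ports as independent, which the text admits is a simplification.
    """
    return Fraction(port_years, fleet_port_years)


# 1000 port-years out of 20 million port-years with no known incident.
vds_risk = risk_upper_bound(VMS * PORTS_PER_VM, FLEET_PORT_YEARS)

# Observed human error rate per checkin (a lower bound, since not
# every mistake ends in a backout).
human_error_rate = Fraction(BACKOUTS, CHECKINS)

# How many times larger the human error rate is than the VDS bound.
ratio = human_error_rate / vds_risk

print(f"VDS risk bound: 1 in {vds_risk.denominator}")
print(f"Human error rate: about 1 in {round(1 / human_error_rate)}")
print(f"Human error rate is roughly {float(ratio):.0f} times the VDS bound")
```

Using `Fraction` keeps the ratios exact until the final print, which makes it easy to plug in a different port count from a better source.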
Some time back, a customer wanted to connect separate physical networks to a cluster of ESX hosts. They had a choice of using the Cisco Nexus 1000V (N1K) or VDS. With VDS, you create one VDS instance per physical network, so different networks are managed separately. With N1K, you have to connect all networks to a single N1K instance. There was a debate on whether the VDS approach is more secure. Someone from Cisco posted a blog on why there is no difference, after a long-winded lecture on software architecture. The author totally missed the point. The weakest link is human error, and the VDS approach leaves less room for human error.