In this blog post I’m going to dive into what makes up a “vSphere Pod”. If you read my previous blog post on the vSphere Pod Service, you’ll recall that I touched on the different components that make up the service. One of those components was the vSphere Pod itself. Let’s dive in!
What is a vSphere Pod?
It’s always best to describe up front what you are talking about. In the previous blog I said the following:
The vSphere Pod Service provides a purpose-built lightweight Linux kernel that is responsible for running containers inside the guest.
Wait a second. ESXi? Linux? What’s really under the covers? Is ESXi Linux? (NO! IT IS NOT!) Maybe a graphic would be a good place to start.
The graphic shows an ESXi server.
Running on that is a VMX customized with code changes to support running containers. Running on the VMX is a Linux kernel provided by ESXi. Running on the kernel is a Container Engine. These three components together are referred to as a “CRX”.
Finally, one or more containers run in the Pod. OK, what does this all mean to you? Is it a VM? Is it not a VM? Why is Linux there? Let’s go into the details.
CRX stands for “Container Runtime for ESXi”. In a nutshell, it’s a virtual machine that is highly optimized to run a Linux kernel that is itself highly optimized to run containers. As mentioned above, the three components (the customized VMX, the Linux kernel and the Container Engine) are together considered a “CRX”. To understand what a CRX does, we need to look at VMX processes. If you look at the running processes in ESXi, you’ll see several of them marked as “VMX” processes. Here’s an example:
You’ll see that this VMX process is running the VM “mgt-dc-01.cpbu.lab”. Associated with that process are the two vCPUs, the SVGA device and the MKS device used for things like VMRC or the Web Console.
So just what parts of the VMX are “modified”? Some of those modifications are simply default VMX settings, such as svga.present = "FALSE". Others were code changes to support a CRX, including:
- Direct Boot support. Typically, a VM boots an OS off a virtual disk (VMDK), which requires a BIOS. But we wanted the Pod not only to boot faster but also to behave more like a container. So the CRX required code changes that let it bypass the BIOS and execute the pre-loaded Linux kernel directly.
- A VM expects several things to be in place. A “home” directory on a datastore provides a place to write logs, VMX swap files and VM swap files. A VM also needs an operating system to boot from, whether that’s a VMDK, an ISO file or PXE boot. To make the CRX do what we needed it to do, we had to change some of these assumptions, which required code changes to the VMX.
- A CRX never receives unwanted drivers. For example, a CRX doesn’t need keyboard, mouse or video support, so those are now forbidden in the code.
Security Built In
Now, I mentioned a pre-loaded Linux kernel. That usually raises the question: “How do I update this kernel?” Well, you don’t. Not directly. The kernel used by the CRX to support containers is pre-packaged in a VIB and distributed as part of the ESXi distribution. Patches and updates to this kernel come as part of an ESXi update or upgrade. In fact, the CRX is hard-coded to only accept a Linux kernel that comes via a VIB file, which means you can’t modify the kernel. And for those of you who followed me through my seven years of supporting vSphere Security, you’ll be happy to know that this means the kernel is not only tamper-proof, but also that if you enable Secure Boot and TPM 2.0 you can prove that your vSphere Pods are booting “clean”.
Beyond these features, there’s even more control placed on a CRX. For example, when the VM is in “CRX Mode”, we limit the changes that can be made to its configuration, and many VMware Tools operations are disabled. CRX Mode is a “hidden” GuestOS type that isn’t available via the UI or API, so you can’t create a CRX “by hand” through either of them. When the VM (CRX) is set to the proper (hidden) GuestOS type, the appropriate settings and restrictions are enforced.
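To make the idea of a GuestOS type that gates configuration a little more concrete, here is a toy model. Apart from the svga.present setting mentioned earlier, everything in it is hypothetical: the real CRX GuestOS identifier is hidden and ESXi’s actual enforcement logic isn’t public, so CRX_GUEST_OS, the two rule tables and validate_config are illustrative stand-ins, not VMware code.

```python
# Toy model only: ESXi's real CRX enforcement is internal and not shown here.
CRX_GUEST_OS = "hypothetical-crx"  # placeholder; the real GuestOS type is hidden

# Settings a CRX must keep at a fixed value. svga.present comes from the post;
# treating it as an enforced rule is an assumption of this sketch.
CRX_FORCED_SETTINGS = {"svga.present": "FALSE"}

# Device classes this model refuses in CRX mode (keyboard, mouse, video).
CRX_FORBIDDEN_DEVICES = {"keyboard", "mouse", "video"}


def validate_config(guest_os: str, settings: dict[str, str], devices: set[str]) -> list[str]:
    """Return the list of violations; an empty list means the config is accepted."""
    if guest_os != CRX_GUEST_OS:
        return []  # ordinary VMs are not subject to CRX restrictions in this model
    violations = []
    for key, required in CRX_FORCED_SETTINGS.items():
        # A missing key is treated as "left at the enforced default".
        if settings.get(key, required) != required:
            violations.append(f"{key} must be {required}")
    for device in sorted(devices & CRX_FORBIDDEN_DEVICES):
        violations.append(f"device '{device}' is not allowed in CRX mode")
    return violations


print(validate_config(CRX_GUEST_OS, {"svga.present": "TRUE"}, {"mouse", "vmxnet3"}))
```

The key point the model captures is that the restrictions hang off the GuestOS type itself: the same settings that are rejected for a CRX pass untouched for an ordinary VM.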
As I’ve mentioned before, this kernel is highly optimized. It includes a minimal set of drivers. You might say “Just Enough Kernel”. It uses para-virtualized devices like vSCSI and VMXNET3.
Once the kernel is “booted” it starts a minimal “init” process. This populates the /dev hierarchy and initializes the loopback network device. After that the application is started.
Operational Efficiencies and Security
Is it better to run five containers in one vSphere Pod or in five separate vSphere Pods? Well, the answer, as you can imagine, is “it depends”. There are many design decisions to consider. One of them is security vs. lifecycle management. Five containers in one Pod are, by design, only weakly isolated from each other, and they collaborate to provide a single service that’s lifecycle-managed as one entity.
Security teams usually sit at the opposite end of the spectrum. They typically insist on many levels of isolation, because if one of those containers has a bug, it could potentially be used to compromise the other containers.
Business requirements sit somewhere in the middle. If that one container is compromised, great, it’s isolated. But multiple containers are usually seen as one entity, a service. So while the compromised container is isolated, the service is still not “up”.
One of the advantages of running five containers in one Pod is that if I give each container 1GB of memory (5GB total), that memory can be “shared”. If one container needs more memory and the other containers haven’t consumed their allotment, the unused memory is available to that container.
From a resource perspective, if we run five separate Pods, each with one container and 1GB of memory, a container that needs more memory won’t have access to a shared memory pool. This could cause a bottleneck. However, the upside is that each container runs in its own Pod, and a vSphere Pod IS a VM, so you gain the already proven isolation of virtual machines and NSX networking. Not to mention that there are already tools out there for monitoring VM performance bottlenecks.
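The memory trade-off above comes down to simple arithmetic, sketched here in Python. This is a deliberately simplified model that assumes, as the paragraphs above describe, that a single Pod’s memory behaves as one shared pool; it ignores per-container limits, Pod overhead and ESXi memory management. The Container class, both helper functions and the workload figures are hypothetical.

```python
from dataclasses import dataclass

MIB_PER_GB = 1024  # work in MiB so all arithmetic stays in integers


@dataclass
class Container:
    name: str
    demand_mib: int  # memory this container wants at the moment (hypothetical figure)


def fits_in_one_pod(containers: list[Container], allotment_mib: int) -> bool:
    """All containers in one Pod: every allotment goes into a single shared pool."""
    pool_mib = allotment_mib * len(containers)
    return sum(c.demand_mib for c in containers) <= pool_mib


def fits_in_separate_pods(containers: list[Container], allotment_mib: int) -> bool:
    """One container per Pod: each container is capped by its own Pod's memory."""
    return all(c.demand_mib <= allotment_mib for c in containers)


# Hypothetical workload: "api" bursts to 2GB while the other four use 512MB each.
workload = [
    Container("api", 2 * MIB_PER_GB),
    Container("cache", 512),
    Container("worker", 512),
    Container("auth", 512),
    Container("logger", 512),
]

print("one Pod, shared 5GB pool:", fits_in_one_pod(workload, MIB_PER_GB))
print("five Pods, 1GB each:", fits_in_separate_pods(workload, MIB_PER_GB))
```

With these numbers the combined demand (4GB) fits inside the shared 5GB pool, while the 2GB burst cannot fit inside a single 1GB Pod. That is the bottleneck the separate-Pod design trades for stronger isolation.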
These are some of the design tradeoffs that you and your development team will have to make. You may wish to try both scenarios to see which one meets your security and operational needs. vSphere with Kubernetes gives you those options.
Even More Security
Speaking of security and isolation, vSphere Pods really stand out here. Let’s review how containers work on bare metal today. See the following image:
In a bare-metal environment, all containers run on a single kernel with a shared file system, shared storage, shared memory and shared networking. Your isolation depends on Linux kernel primitives implemented in software.
Now look at the image below, which uses vSphere Pods. If the business requirements mandate it, you could run one container per Pod to provide the BEST isolation of CPU, memory and networking. You’re leveraging the already robust virtual machine isolation. You’re booting a Linux kernel that’s unique to the Pod (and not the same Linux kernel instance used by hundreds of containers on a bare-metal install!), and you have the capability to use enterprise-class network isolation with VMware NSX. All with no performance penalty.
I mentioned performance above. In August 2019 we published a blog post on what was then called “Native Pod” performance. (Native Pod was a code name of sorts; the actual name IS now “vSphere Pod”.) In that post, one of our engineers, Karthik Ganesan, and the PM for vSphere with Kubernetes, Jared Rosoff, showed that vSphere Pods deliver up to 8% better performance than bare-metal Linux Kubernetes nodes. If you read through the post, you’ll see that much of this performance gain is due to the ESXi scheduler and the fact that vSphere Pods are independent entities. The scheduler does its best to ensure that each Pod runs on a CPU that’s closest to that Pod’s memory. Read the blog post to get all the details! It’s fascinating work. You may ask whether there’s been any update since August. What I’ll say is that we are always working on optimizations, and when we have more good news to announce, we’ll do it here.
So, to wrap this up, the question you may be asking is “When do I use vSphere Pods?”. The answer to that, as I’m sure you can imagine, is, again, “It depends”. Let’s break it down to make it clearer.
- Do you need a fully upstream-conformant Kubernetes environment? (More details on the vSphere Pod Service and conformance will be coming soon in another blog article; that will help you make your decision.)
- Do you require 3rd party integrations like Helm?
Then you want to use TKG clusters running on vSphere with Kubernetes. This gives you the most flexibility. In this scenario the containers run on standard VMs. Today that VM is based on VMware Photon OS.
- Do you need absolute network, CPU, memory and filesystem isolation?
- Have you tested your application in a vSphere Pod, and does it work?
- Do you have performance requirements that are met by vSphere Pods?
Then I think you’ve answered your own question. The bottom line is that you have options. You have flexibility to run your applications where they run best.
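The checklist above can be condensed into a small decision helper. This is a hypothetical sketch: the Requirements fields and suggest_runtime are illustrative names, and the precedence it encodes (conformance or Helm needs point to TKG clusters first; three “yes” answers to the vSphere Pod questions point to a vSphere Pod; anything else falls back to “it depends”) is one reading of the list, not official guidance.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical flags mirroring the questions in the checklist.
    needs_full_upstream_conformance: bool = False
    needs_third_party_integrations: bool = False  # e.g. Helm
    needs_strict_isolation: bool = False  # network, CPU, memory, filesystem
    tested_in_vsphere_pod: bool = False  # the app was tried in a vSphere Pod and works
    perf_met_by_vsphere_pods: bool = False


def suggest_runtime(req: Requirements) -> str:
    # First group of questions: any "yes" favors TKG clusters on standard VMs.
    if req.needs_full_upstream_conformance or req.needs_third_party_integrations:
        return "TKG cluster"
    # Second group: all three answered "yes" points toward vSphere Pods.
    if req.needs_strict_isolation and req.tested_in_vsphere_pod and req.perf_met_by_vsphere_pods:
        return "vSphere Pod"
    # Otherwise, in the post's own words, "it depends".
    return "it depends"


print(suggest_runtime(Requirements(needs_third_party_integrations=True)))
print(suggest_runtime(Requirements(needs_strict_isolation=True,
                                   tested_in_vsphere_pod=True,
                                   perf_met_by_vsphere_pods=True)))
```

Requiring all three vSphere Pod answers before suggesting one is a conservative choice: an application that needs strong isolation but hasn’t been tested in a vSphere Pod yet lands on “it depends”, which in practice means “go test it”.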
For more information and guidance on when to use a vSphere Pod vs a Tanzu Kubernetes Cluster, please check out the documentation page on this subject. As with everything in this space, things move fast so these guidelines may change over time.
I hope this has been helpful to you. I only came into the Kubernetes world a short time ago and I’m really excited to see the changes happening in how systems will be managed in the very near future. I hope you’ll join me in this journey together. It’s going to be a fun ride!
If you have any ideas for vSphere with Kubernetes topics that you’d like to learn more about from a vSphere Administrator standpoint, please reach out to me on Twitter. I’m @mikefoley and my DMs are open.
We are excited about vSphere 7 and what it means for our customers and the future. Watch the vSphere 7 Launch Event replay, an event designed for vSphere Admins, hosted by theCUBE. We will continue posting new technical and product information about vSphere 7 and vSphere with Kubernetes Monday through Thursdays into May 2020. Join us by following the blog directly using the RSS feed, on Facebook, and on Twitter. Thank you, and stay safe!