Gain design insights and considerations for VMware vSphere Kubernetes Service on VMware Cloud Foundation (VCF) from this recap of our recent webinar.
Modern applications require more than just a place to run; they demand a complete ecosystem to thrive. Today’s deployments go far beyond simple virtual machines or containers, requiring a broad suite of production-ready services—such as load balancers, persistent storage, registries, and integrated monitoring—to function effectively. Building a true cloud platform means moving past raw compute to create a unified environment where infrastructure and cloud-native services are seamlessly integrated. VCF delivers this comprehensive platform, enabling applications to consume the services they need consistently.
Recently, I co-hosted a webinar with Caleb Washburn, CTO at MomentumAI, where we detailed how to use VCF with VMware vSphere Supervisor to establish this integrated environment. By converging VMs, containers, and Kubernetes clusters into a single management plane controlled via a declarative API, you can automate lifecycle management and focus on workload outcomes rather than getting bogged down in manual, imperative tasks.
The Foundation: Transforming VMware ESX Clusters into Kubernetes Clusters
At its core, enabling the vSphere Supervisor transforms a standard ESX cluster into a fully functional Kubernetes cluster. This architectural shift redefines the roles of your existing infrastructure:
- Worker Nodes: Each ESX host becomes a worker node responsible for running workloads.
- The Control Plane: This is deployed as a set of VMs that handle all control plane operations and expose a Kubernetes-based declarative API.
- Spherelet: A custom ESX-native implementation of the Kubelet, installed on every ESX host to manage communication between the host and the control plane.
- Unified Management: This architecture allows consumers to manage the platform using familiar tools like kubectl.
Once enabled, the vSphere Supervisor delivers a suite of core services with independent lifecycles. These include the vSphere Kubernetes Service (VKS) for managing conformant clusters, a VM Service for managing virtual machines via a desired-state model, and essential add-ons like the Harbor Image Registry and secret stores.
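Because the platform is driven by a declarative API, both clusters and VMs are described as Kubernetes objects and applied with kubectl. As an illustrative sketch only (the exact API group, version, and field names vary by VCF release, so verify them against the CRDs in your own Supervisor), a VM Service request might look like this:

```yaml
# Hypothetical VM Service manifest -- names and versions are assumptions;
# check `kubectl api-resources` in your Supervisor for the real schema.
apiVersion: vmoperator.vmware.com/v1alpha2
kind: VirtualMachine
metadata:
  name: demo-vm
  namespace: dev-team            # a vSphere Namespace
spec:
  className: best-effort-small   # VM class defining the CPU/memory shape
  imageName: ubuntu-22.04        # image published from a content library
  storageClass: gold-policy      # assumed storage policy name
  powerState: PoweredOn
```

Applying the manifest (`kubectl apply -f vm.yaml`) declares the desired state; the Supervisor reconciles it for you rather than requiring imperative steps.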
Architectural Strategy: Availability Zone Design
A critical step in deploying vSphere Supervisor is determining the right availability zone (AZ) model. An availability zone represents a logical construct of an independent physical failure domain—such as a specific server room or rack—and typically maps directly to a single vSphere cluster.
The webinar highlighted four distinct deployment models, ranging from simple development setups to high-resilience enterprise configurations:
1. Single Management Zone with Combined Workload Zone
This is the most straightforward model, where the vSphere Supervisor is enabled directly on one vSphere zone.
- Best Use: Development, testing, non-production environments, or production workloads that do not require site-level fault tolerance.
- Advantages: It is fast to deploy, requires minimal hardware, and uses fewer licenses.
- Be Aware: A cluster-level failure (power, network, or storage) impacts both the vSphere Supervisor and the workloads.
2. Single Management Zone with Isolated Workload Zones
In this model, the control plane stays in a dedicated management zone, while workloads are distributed across one or more separate zones.
- Best Use: Medium-scale production environments requiring a clear separation between management and workload planes.
- Advantages: Workloads can be distributed for higher availability while keeping the control plane simple.
- Be Aware: There are increased hardware requirements and higher configuration effort for networking compared to the single management zone with combined workload zone.
3. Three Management Zones with Combined Workload Zones
This model distributes the vSphere Supervisor control plane VMs across three vSphere zones to ensure high availability and quorum.
- Best Use: Production environments requiring strong resilience and the ability to tolerate a single cluster failure.
- Advantages: A highly available control plane and improved availability for workloads distributed across three zones.
- Be Aware: There is less isolation between the control plane and workloads and higher costs.
4. Three Management Zones with Isolated Workload Zones
The most robust model features a highly available control plane across three dedicated management zones and workloads in additional isolated zones.
- Best Use: Mission-critical production deployments demanding the highest levels of security, isolation, and availability.
- Advantages: Maximum resilience and full support for advanced zone-aware Kubernetes features.
- Be Aware: This deployment model has significant hardware and licensing requirements.
Networking: A Critical Upfront Design Choice
Networking decisions are foundational. The stack you choose during vSphere Supervisor enablement is effectively permanent, because changing it later typically requires tearing down and rebuilding the vSphere Supervisor.
VDS Networking
The vSphere Distributed Switch (VDS) represents the traditional networking approach. It segments management and workload traffic using port groups and provides a flat, routable network space. It is the ideal choice for existing environments where teams are not yet ready for a full software-defined networking stack.
VMware NSX Virtual Private Cloud
The “latest and greatest” approach, NSX Virtual Private Cloud (VPC) networking, mirrors the experience of public cloud environments. It offers high-level isolation between tenants and is deeply integrated with VMware Cloud Foundation Automation for self-service provisioning. This is the recommended starting path for any new Kubernetes on VCF deployment.
NSX Segment Networking
This provides similar isolation to the VPC model but is managed through traditional NSX segment interactions rather than the VCF Automation self-service portal. It is typically favored by organizations already deeply familiar with managing NSX segments in their existing environments.
Load Balancing Options
When we look at load balancing options within VCF, we categorize them based on your chosen network stack to provide the right fit for your specific traffic requirements:
- Foundation Load Balancer: A “batteries-included” Layer 4 load balancer for VDS networking that replaces older HAProxy-based solutions.
- VMware Avi Load Balancer: The premium option providing Layer 4-7 capabilities, including web application firewall (WAF), certificate management, and advanced DNS.
- NSX Classic Load Balancer: A virtualized Layer 4 stack provided out-of-the-box for NSX environments.
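Regardless of which balancer backs the platform, workloads consume it the same way: through a standard Kubernetes Service of type LoadBalancer, fulfilled by whichever provider your network stack supplies. A minimal sketch (all names are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-frontend   # illustrative name
spec:
  type: LoadBalancer   # satisfied by Foundation LB, Avi, or NSX, per your design
  selector:
    app: web-frontend
  ports:
    - port: 80         # external port exposed on the provisioned virtual IP
      targetPort: 8080 # container port inside the pod
```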
CNI Options
In Kubernetes cluster provisioning, selecting a Container Network Interface (CNI) is a per-cluster decision that offers flexibility compared to the more rigid underlying networking stack. The recommended default is Antrea, which is favored for its deep integration with the NSX stack via the Antrea-NSX Adapter, enabling advanced capabilities and compatibility with the vDefend security solution.
Alternatively, Project Calico is available as a supported option for users who prioritize a consistent CNI experience across diverse public or private cloud environments. While the choice of CNI can be deferred or made on a cluster-by-cluster basis, it is distinct from the foundational vSphere Supervisor networking decisions (such as VDS or NSX VPC), which are fixed at enablement and require a complete teardown and rebuild to change.
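To make the per-cluster nature of this choice concrete, here is a sketch of how earlier VKS/TKGS API versions expressed the CNI in the cluster spec; treat the API group, version, and field paths as assumptions and confirm them against your release, since newer ClusterClass-based clusters surface this setting differently:

```yaml
# Illustrative only -- required topology fields are omitted for brevity.
apiVersion: run.tanzu.vmware.com/v1alpha2
kind: TanzuKubernetesCluster
metadata:
  name: team-cluster
  namespace: dev-team
spec:
  settings:
    network:
      cni:
        name: calico   # antrea is the default when this block is omitted
```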
Storage and Multi-Tenancy
Finally, we reviewed the logical boundaries used to manage resources: vSphere Namespaces. A namespace acts as an extension of a vSphere resource pool and maps directly to a Kubernetes namespace. It is through this construct that administrators achieve multi-tenancy, assigning quotas for storage, load balancers, and image registries to specific teams or products.
When it comes to storage, the vSphere Supervisor utilizes storage policies to enforce quotas and determine which data stores are accessible to workloads.
- ReadWriteOnce (RWO): Supported by most datastores and used for standard stateful applications.
- ReadWriteMany (RWX): Currently requires vSAN File Services and is necessary when multiple pods must write to the same volume simultaneously.
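In practice, a workload requests storage through a PersistentVolumeClaim whose storage class maps to a vSphere storage policy. A minimal sketch, assuming a storage class named vsan-default-policy exists in your environment:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce                       # use ReadWriteMany only with vSAN File Services
  storageClassName: vsan-default-policy   # assumed name; maps to a vSphere storage policy
  resources:
    requests:
      storage: 20Gi
```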
By carefully considering these design elements—availability zones, networking pillars, and storage policies—you can build a private cloud that is not only resilient but also ready to scale with the demands of modern application development. And if you need assistance designing and building your solution, VCF Professional Services and MomentumAI can help. Just reach out to your Broadcom Account Manager to learn more.
If you missed it, watch our webinar replay now.