
VMware Private AI Foundation with NVIDIA: Unlock AI with VMware Cloud Foundation 9.0

AI is transforming businesses

Artificial Intelligence (AI) has become a cornerstone of digital transformation across industries. The evolution of AI has taken a significant leap forward with the emergence of Generative AI (Gen AI).

Gartner® sees “Gen AI becoming a general-purpose technology with an impact as big as the invention of the steam engine, electricity, and the internet.” Gartner also predicts that Gen AI usage will greatly increase over the next three years*:

  • By 2028, 95% of organizations will have integrated Gen AI into daily operations, up from 15% in 2025.
  • Multimodality will be a table-stakes capability by 2028.
  • The future of generative applications is domain-specialized, agentic, and multimodal.

VMware Private AI Foundation with NVIDIA: Addressing the complexities of AI

Enterprises face significant obstacles when implementing AI, with privacy being a paramount concern. Concerns revolve around protecting intellectual property, data, and model access. Training AI models on public or cloud platforms might inadvertently incorporate sensitive proprietary data, conflicting with privacy regulations and intellectual property laws.

Beyond privacy, selecting the right Large Language Model (LLM) is critical for meeting specific use cases, industry requirements, and organizational goals. The AI landscape is complex and rapidly evolving, with constant introductions of new vendors, software, and components. This complexity drives up costs and presents performance challenges.

AI and Gen AI require substantial infrastructure, and tasks like fine-tuning, customization, deployment, and querying can strain resources. Scaling up these operations becomes problematic without adequate infrastructure. Additionally, diverse compliance and legal requirements must be met across industries and countries: Gen AI solutions must ensure access control, proper workload placement, and audit readiness to comply with these standards.

To address these challenges, Broadcom and NVIDIA offer a joint AI platform, VMware Private AI Foundation with NVIDIA. By combining innovations from both companies, Broadcom and NVIDIA aim to unlock the power of AI and unleash productivity with a lower total cost of ownership (TCO).

This platform enables enterprises to fine-tune large language models (LLMs), deploy retrieval-augmented generation (RAG) workflows, and run inference workloads within their data centers, addressing concerns related to privacy, choice, cost, performance, and compliance.

VMware Private AI Foundation with NVIDIA simplifies AI deployments for enterprises by offering capabilities such as secure model delivery through Model Store and model deployment and scalability through Model Runtime. It also enables AI workloads to be deployed in air-gapped environments, provides easy-to-use microservices through NVIDIA NIM™, and offers reference AI workflows pre-trained for specific use cases through NVIDIA Blueprints.

Architecture of the Platform

Let’s get into the details of the platform architecture.

Built and run on the industry-leading private cloud platform, VMware Cloud Foundation (VCF), VMware Private AI Foundation with NVIDIA includes the Private AI Package, NVIDIA AI Enterprise, NVIDIA NIM (included with NVIDIA AI Enterprise), and NVIDIA LLMs, and provides access to other community models (such as models from Hugging Face and other third parties). VCF, VMware’s full-stack private cloud infrastructure solution, offers a secure, comprehensive, and scalable platform for building and operating AI workloads, providing organizations with the agility, flexibility, and scalability to meet their evolving business needs.

The Private AI Package provides powerful capabilities like Vector Databases, Deep Learning VMs, the Data Indexing and Retrieval Service, the Agent Builder Service, and more to enable privacy and security, simplify infrastructure management, and streamline model deployment.

Unlocking the Power of Innovation: The journey so far and why enterprises are choosing VMware Private AI Foundation with NVIDIA

VMware Private AI Foundation with NVIDIA has been highly successful, empowering enterprises to achieve their data privacy and security goals while realizing the full benefits of artificial intelligence.

Let’s examine the specific factors that have contributed to its effectiveness.

  • Resource Sharing: AI workloads extend beyond just GPUs. In partnership with NVIDIA, the platform enables virtualized and shared GPUs (vGPUs). Infrastructure teams can then map allocated vGPUs to the appropriate capacity and manage necessary networks, data I/O, and CPUs. 
  • Speed and Agility: Deploying AI can be a complex and time-consuming process. VMware’s software automates the reservation of GPU, memory, network, and storage capacity for AI applications, and streamlines the deployment of network and security policies. This reduces deployment time and provides agility, enabling quick transitions to new models and services while maintaining cost efficiency through the pooled and shared use of AI infrastructure resources.
  • Consistent Operations: The platform utilizes the same tools and processes for both AI and non-AI workloads, eliminating the need for a separate management and operations stack for AI. This contributes to a lower TCO.
  • Lower TCO: The combination of resource sharing, unified architecture, speed and agility, and consistent operations results in a significantly lower TCO. Fewer tools and processes, along with intelligent sharing of AI infrastructure, reduce costs. Broadcom’s per-core pricing model reduces and stabilizes expenses by steering clear of the token-based billing models common among cloud providers. The platform achieves up to 90% lower TCO compared to public clouds and up to 29% lower TCO compared to bare metal solutions, without compromising performance. A recently released benchmark study using MLPerf Inference v5.0 standards shows performance on the platform similar to bare metal. Putting AI workloads on virtualized solutions therefore preserves performance while adding the benefits of virtualization, such as ease of management and enterprise-grade security.

Unleashing AI with VCF 9.0: Exciting New Capabilities

Today, we are happy to announce that the new release of VMware Private AI Foundation with NVIDIA is generally available, launching simultaneously with VCF 9.0.

Watch this theCUBE video where Mark Chuang and Himanshu Singh discuss this release.

Want to know even more details? Register for and attend the VCF Webinar series episode on July 9th at 10 AM PDT, where Justin Murray and Shobhit Bhutani will share all the details, including demos.

Let’s discuss the new capabilities in this release.

1. Enable Privacy & Security of AI Models

VMware Private AI Foundation with NVIDIA’s architectural approach to AI services enables privacy and control of corporate data, along with integrated security and management. Broadcom and NVIDIA’s partnership helps enterprises build and deploy private, secure AI models with integrated security capabilities in VCF and NVIDIA AI Enterprise. Here are the new capabilities in this category:

a. Air-Gapped Support: With this release, VMware Private AI Foundation with NVIDIA can now be deployed in air-gapped environments, supporting customers’ business needs while ensuring data confidentiality and isolation for their critical workloads. Air-gapping the most sensitive assets slashes cyber-risk exposure, maintains compliance, and safeguards revenue and reputation.

This capability is enabled through VCF Automation. AI infrastructure elements, such as Deep Learning VMs, pull software code and containers from a set of on-premises repositories and never connect directly to the internet. These repositories hold containers and software libraries downloaded from the NGC catalog, and only the IT admin can refresh them, on demand. Keeping this set of repositories, along with the enterprise data, isolated from the internet is what enables air-gapped support.

b. Multitenancy: With this new release, cloud service providers and enterprises can now deploy private and independent AI environments for tenants. 

This is enabled by creating single or multiple organizations tailored to specific business needs. Each organization operates within a dedicated, isolated environment, ensuring security and autonomy while optimizing costs. Admins can configure network setups, establish default transit gateways and VPCs for each tenant, manage resources, set quotas, control permissions, and offer managed services such as encryption and backup. Additionally, they can monitor resource usage and spending across tenants to ensure optimal performance and cost management.

2. Simplify Infrastructure Management

AI environments are complex and costly to architect, as they evolve rapidly with new vendors, SaaS components, and bleeding-edge AI software continuously being launched and deployed. In this complex environment, VMware Private AI Foundation with NVIDIA provides specially architected capabilities that help simplify infrastructure management of AI environments and optimize costs. Leveraging Broadcom and NVIDIA’s extensive joint expertise and strong partnerships with industry leaders in this domain, enterprises can be assured of a simplified deployment experience. Let’s examine the functionalities within this category.

In earlier releases, we introduced several GPU monitoring capabilities and dashboards at the host, cluster, and virtual machine (VM) levels. Today, we’re expanding them with additional powerful capabilities to monitor and improve utilization at the individual GPU level. These updates help effectively manage over-provisioning or underutilization of GPUs, optimize TCO, speed up issue resolution, and enable higher performance. Let’s break it down further.

GPU Management Improvements

  • GPU Slowdown Temperature: This functionality mitigates potential thermal damage to a GPU by dynamically reducing its clock speed when a predefined temperature threshold is exceeded, thereby regulating performance.
  • GPU Shutdown Temperature: This functionality sets the maximum preset temperature at which the GPU shuts down (a higher threshold than the slowdown temperature) to prevent heat-related damage. Both thresholds can also be read programmatically, as shown in the sketch below.
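
Here is a minimal sketch that reads the current temperature and both thresholds using NVIDIA’s NVML bindings (pynvml, from the nvidia-ml-py package); it assumes an NVIDIA driver is present on the host and queries the first GPU:

    # Read a GPU's current temperature and its slowdown/shutdown
    # thresholds via NVML (requires the nvidia-ml-py package).
    import pynvml

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
        current = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        slowdown = pynvml.nvmlDeviceGetTemperatureThreshold(
            handle, pynvml.NVML_TEMPERATURE_THRESHOLD_SLOWDOWN)
        shutdown = pynvml.nvmlDeviceGetTemperatureThreshold(
            handle, pynvml.NVML_TEMPERATURE_THRESHOLD_SHUTDOWN)
        print(f"current {current}°C, slowdown at {slowdown}°C, shutdown at {shutdown}°C")
    finally:
        pynvml.nvmlShutdown()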

vGPU Monitoring Improvements

  • vGPU Memory Reservation: Reserve GPU memory allocation for a vGPU profile.
  • vGPU Memory Usage: Monitor GPU memory used by vGPU profiles.
  • vGPU Compute Usage: Monitor compute usage at the vGPU level.
  • vGPU Encode Utilization: Observe GPU utilization during the encoding process for vGPU-backed video and vision ML workloads.
  • vGPU Decode Utilization: Track GPU utilization during the decoding process for vGPU-backed video and vision ML workloads.

This capability enables administrators to view all vGPU and DirectPath profiles across their GPU footprint through an intuitive UI screen in vCenter, eliminating the need for manual tracking of vGPUs, reducing administrative time, and improving efficiency. For command-line sampling of the same counters, see the sketch below.
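
For teams that also work at the command line, similar per-vGPU counters can be sampled on a vGPU-enabled host. This hedged sketch assumes the nvidia-smi vgpu subcommand from NVIDIA’s vGPU host software is available; output fields vary by driver version:

    # Query detailed per-vGPU status (memory, utilization) by shelling
    # out to nvidia-smi's vgpu subcommand on a vGPU-enabled host.
    import subprocess

    result = subprocess.run(
        ["nvidia-smi", "vgpu", "-q"],  # detailed per-vGPU query
        capture_output=True, text=True, check=True)
    print(result.stdout)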

3. Streamline Model Deployment

With this release, Broadcom has introduced several groundbreaking capabilities designed to simplify and accelerate model deployment for data scientists and MLOps professionals. The following capabilities are included in this category:

a. Model Runtime 

The Model Runtime service enables data scientists to create and manage model endpoints for their applications. Key benefits include the following; a brief client sketch follows the list.

  • Simplified Model Usage: Model endpoints abstract away the complexity of the model instance. Users or systems that need predictions don’t need to know how the model works internally; they simply send the correct input to the endpoint and receive the output.
  • Scalability: Model endpoints enable scalable deployment. Instead of running the model locally for each request (which can be resource-intensive), the model can be deployed on a server that handles multiple requests concurrently.
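
To make the abstraction concrete, here is a minimal client sketch; the endpoint URL, payload shape, and token are hypothetical placeholders, since the exact request format depends on the model being served:

    # Call a deployed model endpoint: the client only needs the URL and
    # input schema, not the model internals. All names below are
    # hypothetical placeholders.
    import requests

    ENDPOINT = "https://models.example.internal/v1/predict"  # placeholder URL

    resp = requests.post(
        ENDPOINT,
        json={"inputs": "Summarize our Q3 support tickets."},
        headers={"Authorization": "Bearer <API-TOKEN>"},  # placeholder token
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())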

b. API Gateway

Organizations face several challenges when integrating large language models (LLMs) into their applications. Without secure access controls, endpoints may be exposed to unauthorized use, increasing security risks. As LLMs evolve, frequent API changes can disrupt integrations. Scalability becomes a concern as usage grows, with infrastructure struggling to handle concurrent users or traffic spikes without intelligent load balancing. Additionally, varying APIs across LLM providers complicate integration efforts, slowing down development and hindering standardization across platforms.

API Gateway offers a secure, stable, and scalable interface for accessing large language model (LLM) endpoints, enabling seamless integration and consistent performance. It enforces robust authentication and authorization, ensuring only trusted users and applications can access models. With built-in compatibility with the OpenAI API, it simplifies standardization across applications. The API gateway abstracts underlying model changes, providing a consistent API and operational flexibility. It also supports load balancing and resource scaling, allowing multiple model instances or inference servers to run transparently, ensuring high availability and performance without impacting clients.
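
Because the gateway speaks the OpenAI API, existing OpenAI client code can typically be redirected to it simply by overriding the base URL. A sketch, assuming a hypothetical gateway address and model name:

    # Point the standard OpenAI Python client at the gateway; the base
    # URL, API key, and model name are placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://ai-gateway.example.internal/v1",  # hypothetical gateway
        api_key="<GATEWAY-API-KEY>",                        # placeholder credential
    )

    resp = client.chat.completions.create(
        model="example-llm",  # whichever model the gateway exposes
        messages=[{"role": "user", "content": "What is our travel policy?"}],
    )
    print(resp.choices[0].message.content)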

c. Agent Builder Service

AI agents are autonomous or semi-autonomous software entities that use AI techniques to perceive, make decisions, take actions, and achieve goals in their digital or physical environments. AI agents are increasingly being integrated into Gen AI applications, enhancing their capabilities and enabling a wide range of creative and functional tasks. The Agent Builder Service enables Gen AI application developers to create AI agents using the Model Store, Model Runtime, and the Data Indexing and Retrieval Service.
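
Conceptually, an agent wraps a model call in a loop that consults tools (such as retrieval) before answering. The sketch below illustrates the pattern only; it is not the Agent Builder Service API, and the retrieval function, gateway address, and model name are hypothetical:

    # Conceptual agent pattern: retrieve private context, then ask the
    # model for a grounded answer. Illustrative only, not the Agent
    # Builder Service API.
    from openai import OpenAI

    client = OpenAI(base_url="https://ai-gateway.example.internal/v1",  # hypothetical
                    api_key="<GATEWAY-API-KEY>")          # placeholder credential

    def retrieve_context(query: str) -> str:
        """Placeholder for a Data Indexing and Retrieval Service lookup."""
        return "relevant passages from the knowledge base"

    def run_agent(goal: str) -> str:
        context = retrieve_context(goal)        # perceive: gather context
        resp = client.chat.completions.create(  # decide and act: grounded answer
            model="example-llm",
            messages=[
                {"role": "system", "content": f"Answer using this context:\n{context}"},
                {"role": "user", "content": goal},
            ],
        )
        return resp.choices[0].message.content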

d. Data Indexing and Retrieval Service 

This service allows enterprises to chunk and index private data sources (e.g., PDFs, CSVs, PPTs, Microsoft Office docs, and internal web or wiki pages) and vectorize the data. The vectorized data is made available through knowledge bases. As data changes, these knowledge bases can be updated on a schedule or on demand, ensuring that Gen AI applications can access the latest data. This capability reduces deployment time, simplifies data preparation, and improves Gen AI output quality for data scientists and MLOps teams.
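
Chunk-and-vectorize pipelines of this kind generally follow the pattern sketched below. This is a generic illustration of the technique, not the service’s implementation; it assumes the open-source sentence-transformers package for embeddings and a hypothetical local text file as input:

    # Generic chunk-and-embed pipeline: split a document into overlapping
    # chunks and vectorize them for similarity search in a vector DB.
    from sentence_transformers import SentenceTransformer

    def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
        """Split text into fixed-size character chunks with overlap."""
        step = size - overlap
        return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

    model = SentenceTransformer("all-MiniLM-L6-v2")     # small open embedding model
    chunks = chunk(open("policy_handbook.txt").read())  # hypothetical source file
    vectors = model.encode(chunks)                      # one embedding per chunk
    print(f"indexed {len(chunks)} chunks of dimension {vectors.shape[1]}")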


Want to learn even more?


Ready to get started on your AI and ML journey? Check out these helpful resources:

*These forecasts were articulated by Danielle Casey, Director Analyst at the Gartner Tech Growth & Innovation Conference 2025. Gartner, Where Generative AI Works, Will Soon Work and Might Never Work, Danielle Casey, May 2025. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.