AI with VCF 9.1 on AMD GPUs: Build with open frameworks and simplify management, at a lower TCO

Artificial Intelligence (AI) is rapidly emerging as one of the most transformational technologies of our time, with massive potential to redefine how organizations operate, compete, and innovate. With the ability to learn from vast amounts of data, identify complex patterns, and make intelligent decisions, AI is moving enterprises towards smarter, faster, and highly adaptive operations at scale. However, enterprises face tremendous challenges when deploying AI workloads: privacy, security, compliance, cost, and governance.

At Explore Vegas 2025, Broadcom and AMD announced the expansion of our collaboration to advance AI for enterprises. With VCF on AMD Instinct GPUs, our partnership is addressing these enterprise challenges.

Announcement 

With the release of VMware Cloud Foundation 9.1, Broadcom and AMD are announcing the next step in our mission of helping enterprises run and manage their AI workloads: Broadcom and AMD will add support for VCF on AMD Instinct MI350 Series GPUs. This release unlocks powerful virtualization capabilities for AI workloads while reducing TCO.

Details of the Release

Let’s get into the details.

1. Exceptional Performance at Lower TCO (Total Cost of Ownership):

We are adding several capabilities that improve performance while lowering TCO for enterprises:

  • High-performance infrastructure powered by the AMD Instinct MI350 GPU series: The MI350 series delivers a 4x generational increase in AI compute over previous AMD GPUs. With up to 10 PetaFLOPS of FP4 and FP6 operations, these GPUs are optimized for highly efficient training and inference.
  • Larger models on a single GPU: Equipped with 288 GB of HBM3E memory, the MI350 series GPUs can run models with up to 520 billion parameters on a single GPU. Fewer GPUs are required for deployment, which significantly lowers infrastructure costs.
  • Hot-add and remove virtual hardware for maximizing model performance: Enterprises can scale AI workloads dynamically by adding or removing vCPUs, memory, storage, and network adapters to running VMs without downtime. Removing virtual hardware helps right-size the infrastructure, optimizing TCO seamlessly.
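The single-GPU sizing claim above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it assumes the 520-billion-parameter figure refers to 4-bit (FP4) weights, which the announcement does not state, and it counts weight storage alone, ignoring KV cache, activations, and runtime overhead. The function names and the bytes-per-parameter table are hypothetical helpers, not part of any VCF or AMD tooling.

```python
import math

HBM_GB = 288  # per-GPU HBM3E capacity quoted above (decimal GB assumed)

# Bytes per parameter for common weight formats (assumed, for illustration).
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "fp4": 0.5}


def weight_gb(params_billion: float, fmt: str) -> float:
    """Approximate weight footprint in GB: parameters x bytes per parameter."""
    return params_billion * BYTES_PER_PARAM[fmt]


def min_gpus(params_billion: float, fmt: str, hbm_gb: float = HBM_GB) -> int:
    """Lower bound on GPUs needed to hold the weights alone."""
    return math.ceil(weight_gb(params_billion, fmt) / hbm_gb)


# Compare how many 288 GB GPUs a 520B-parameter model needs per format.
for fmt in ("fp16", "fp8", "fp4"):
    print(fmt, weight_gb(520, fmt), "GB ->", min_gpus(520, fmt), "GPU(s)")
```

Under these assumptions, 520 billion parameters at 4 bits each occupy about 260 GB, which fits within 288 GB, while the same model at 8 or 16 bits would need to be split across several GPUs. Real deployments need additional headroom beyond the weights, so treat the result as a lower bound.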

2. Simplified Infrastructure Management:

These capabilities streamline operations and ensure high availability:

  • vSphere High Availability for automated AI workload recovery: By pooling VMs and hosts into a cluster, AI apps running on VMs can be restarted on a different host if an issue occurs, reducing downtime.
  • ESX Live Patch for non-disruptive security updates for AI hosts: Administrators can apply critical ESX security patches to running hosts in partial maintenance mode. This keeps AI workloads running by preserving uninterrupted access to compute resources.
  • Storage vMotion and Snapshots for ensuring continuous operations for AI workloads: Storage vMotion allows for the seamless migration of running VMs’ data to a different datastore, while storage snapshots preserve the exact state of virtual disks at specific points in time.

3. Choice of Open Standards and Frameworks:

AMD and Broadcom have collaborated to build an infrastructure on which AI workloads can be created using open standards and frameworks. This enhances flexibility while continuing to deliver the consistent operations experience that VCF provides. Here are the details:

  • Industry Standard Frameworks, PyTorch and vLLM: We provide native support for PyTorch and vLLM, allowing enterprises to move workloads between accelerators with minimal code changes.
  • Standardized Architecture through OPEA (Open Platform for Enterprise AI): By leveraging OPEA, enterprises can build on vetted, industry-standard blueprints for RAG and generative AI.
  • Large Catalog of Open-Source Models: Through the partnership with Hugging Face, over 1.8 million open-source models are available out-of-the-box, ready to be deployed on AMD GPUs.
