Today at VMware Explore’s general session, you saw Chris Wolf demonstrate Intelligent Assist for VMware Cloud Foundation, providing AI-powered assistance for our users. In this blog, we’ll take a look behind the curtain at how these capabilities run in VCF, built on AI features that our customers can also use to create their own AI experiences with their own private data.
VMware Private AI services enable administrators to:
- safely and securely import and share approved AI models (Model Gallery and Model Governance);
- scale and run Models as a Service for their organization (Model Runtime and ML API Gateway);
- create Knowledge Bases and regularly refresh data in a fully supported vector database for building RAG applications (Data Indexing and Retrieval Service, in partnership with Data Services Manager); and
- give developers a UI where they can compose models, Knowledge Bases, and tools into Agents (Agent Builder).
The Intelligent Assist service uses these capabilities to run the Intelligent Assist agent, and VCF engineering teams use them as a common AI platform to deliver joint services and AI workflows.
Customers can also use these same capabilities for their own teams.

Model Gallery and Model Governance
These features give private cloud administrators what they need to safely download, validate, and share models with teams across their cloud. Learn how to safely onboard popular models from upstream, ensure a model’s behavior meets your enterprise’s expectations and requirements, and verify that behavior doesn’t drift over time in this blog post.
Model Runtime and ML API Gateway
Now that you have models securely imported and shared with the right folks in your organization, you will want to run them in an efficient and scalable way. Gone are the days of every division running its own separate copies of the same popular models; instead, your team can provide Models as a Service using the Model Runtime. Deploy models on a fully maintained runtime stack directly within VCF, then horizontally scale them as they come under load with no end-user impact, since user requests are brokered through the ML API Gateway. This also gives you the flexibility to do rolling upgrades of models with zero end-user impact. Deploying models this way allows separate lines of business, or tenants within a Cloud Service Provider, to keep their data separate from each other while ensuring high GPU utilization.
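To make this concrete, here is a minimal sketch of what consuming Models as a Service can look like from a developer’s seat, assuming the gateway exposes an OpenAI-compatible endpoint (common for model-serving gateways); the base URL, token handling, and model name below are placeholders, not the product’s actual values:

```python
# Minimal sketch: calling a model through an OpenAI-compatible gateway.
# The base_url, api_key, and model name are placeholders -- substitute
# whatever your ML API Gateway deployment actually provides.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.example.internal/v1",  # hypothetical gateway URL
    api_key="YOUR_GATEWAY_TOKEN",                    # issued by your platform team
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model your admins published
    messages=[{"role": "user", "content": "Summarize our VM snapshot policy."}],
)
print(response.choices[0].message.content)
```

Because every request flows through the gateway, the serving team can scale or upgrade the backing model replicas without client code changing at all.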
Data Indexing and Retrieval
With your development teams able to access Models as a Service to power their various workloads, the most popular GenAI application pattern in the enterprise today goes a step further: Retrieval Augmented Generation (RAG) applications. In this deployment pattern, you instruct your model to answer questions by searching your enterprise’s documentation, which you provide to the model by loading it into a vector database (and running an embedding model, which is fully supported by our Model Runtime).
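The query-time half of that pattern is simple enough to show in a few lines. This sketch assumes the same hypothetical OpenAI-compatible gateway as above serves both an embedding model and a chat model (both model names are placeholders), and it uses an in-memory list in place of a real vector database to stay self-contained:

```python
# Minimal RAG sketch: embed the question, find the closest chunk,
# and hand that chunk to the chat model as context.
import numpy as np
from openai import OpenAI

client = OpenAI(base_url="https://gateway.example.internal/v1",
                api_key="YOUR_GATEWAY_TOKEN")

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="nomic-embed-text", input=text)
    return np.array(resp.data[0].embedding)

# In production these chunks live in a vector database; a list keeps the sketch small.
docs = ["Snapshots are retained for 7 days.", "GPU hosts run the vGPU driver."]
index = [(d, embed(d)) for d in docs]

def answer(question: str) -> str:
    q = embed(question)
    # Rank chunks by cosine similarity; take the single best match for brevity.
    best = max(index, key=lambda pair: float(pair[1] @ q) /
               (np.linalg.norm(pair[1]) * np.linalg.norm(q)))
    prompt = f"Answer using only this context:\n{best[0]}\n\nQuestion: {question}"
    resp = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(answer("How long are snapshots retained?"))
```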
However, in talking to customers we found that connecting to Data Sources (e.g., Confluence, Google Drive, SharePoint, S3), generating document chunks and vector embeddings, storing them in a vector database, and then regularly refreshing that data to keep the documents in your vector database current is a real challenge. So we’ve created the Data Indexing and Retrieval Service, which provides data connectors for popular document repositories and lets you set a document refresh policy matched to how frequently that data changes.
Once Data Sources are configured, they can be combined into Knowledge Bases: consumable collections of indexed documents within a vector database.
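For a sense of the work the indexing pipeline does for you, here is an illustration of the chunking step that typically precedes embedding, using fixed-size character windows with overlap (the sizes and the sample inputs are arbitrary; this is the general technique, not the service’s actual implementation):

```python
# Illustrative chunking step: fixed-size windows with overlap so context
# isn't lost at chunk boundaries. The Data Indexing and Retrieval Service
# automates this (plus connectors and refresh); this only shows the idea.
def chunk(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks

pages = ["...long Confluence page text...", "...long SharePoint doc text..."]
all_chunks = [c for page in pages for c in chunk(page)]
```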
Additionally, Private AI services include entitlements to VCF’s Data Services Manager (DSM), offering database-as-a-service for PostgreSQL with pgvector. This gives you a built-in vector database ready to feed your RAG workloads.
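If you are curious what the pgvector side looks like underneath, here is a hand-rolled sketch against a PostgreSQL instance. The connection string, table name, and vector dimension are placeholders, and the Data Indexing and Retrieval Service normally manages this schema for you:

```python
# Sketch of direct pgvector usage against a DSM-provisioned PostgreSQL database.
# Connection details and schema are placeholders for illustration only.
import psycopg2

conn = psycopg2.connect("postgresql://app:secret@dsm-pg.example.internal/rag")
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("CREATE TABLE IF NOT EXISTS chunks "
            "(id serial PRIMARY KEY, body text, embedding vector(768))")

# Insert a chunk with its embedding (a 768-dim list from your embedding model).
embedding = [0.01] * 768  # stand-in; use real embedding-model output
cur.execute(
    "INSERT INTO chunks (body, embedding) VALUES (%s, %s::vector)",
    ("Snapshots are retained for 7 days.", str(embedding)),
)

# Retrieve the 3 nearest chunks by cosine distance (pgvector's <=> operator).
cur.execute(
    "SELECT body FROM chunks ORDER BY embedding <=> %s::vector LIMIT 3",
    (str(embedding),),
)
print([row[0] for row in cur.fetchall()])
conn.commit()
```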
Agent Builder
At this point, you’ve got Models as a Service and Knowledge Bases indexed, and you’re ready to build chatbots. The next step is enabling your users to build their own AI experiences. Agent Builder is a one-stop shop where those users can log in and see which models are available to them and which Knowledge Bases have been created for them. From there, they can compose AI Agents from models and Knowledge Bases, adding specific prompt instructions. A playground in the UI enables a quick development loop to try out different configurations, tools, models, and prompts. Once users are happy with the result, they can save it and use the Agent they’ve created as a backend for their AI application.
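The exact interface a saved Agent exposes is product-specific, but as one illustration: if an Agent is presented behind an OpenAI-compatible chat endpoint, wiring an existing app to it could look like the snippet below. The URL and agent identifier are hypothetical, not the product API:

```python
# Hypothetical wiring: if a saved Agent sits behind an OpenAI-compatible
# endpoint, an existing app can target it like any other chat model.
from openai import OpenAI

agent = OpenAI(base_url="https://gateway.example.internal/v1",
               api_key="YOUR_GATEWAY_TOKEN")

reply = agent.chat.completions.create(
    model="vcf-helpdesk-agent",  # hypothetical: the Agent saved in Agent Builder
    messages=[{"role": "user", "content": "How do I expand my workload domain?"}],
)
print(reply.choices[0].message.content)
```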
Want to do this yourself? Here are the docs!
- Model Import Flow
- Run a model on Model Runtime
- Index a doc
- Compose in Agent Builder
- Stand up a front end (or connect it to your existing app to add new AI workflows)
If you have feedback or want to chat more, check out our Private AI booth and these conference events!
Tuesday August 26
- Creating an AI platform of the future in the Mining Sector [INVB1153LV]
- 11:45 AM – 12:30 PM PDT
- Level 3, Lido 3103
- Unlock Innovation with VMware Private AI Foundation with NVIDIA [INVB1446LV]
- 1:00 PM – 1:45 PM PDT
- Level 3, Lido 3103
- Unlocking your data with Private AI – Retrieval Augmented Generation Deep Dive [INVB1070LV]
- 2:15 PM – 3:00 PM PDT
- Level 3, Lido 3103
- What’s New with VMware Private AI Foundation with NVIDIA [INVB1779LV]
- 3:30 PM – 4:15 PM PDT
- Level 3, Lido 3105
- Ask Me Anything About Private AI Foundation [CLOM1931LV]
- 4:00 PM – 4:30 PM PDT
- Meet the Experts, Level 2, Venetian Ballroom G, Table 2
Wednesday August 27
- Real-World Lessons in Rightsizing VMware Cloud Foundation for On-Premises AI Workloads [INVB1300LV]
- 10:15 AM – 11:00 AM PDT
- Level 2, Bellini 2003
- Building Secure Private AI Deep Dive [INVB1432LV]
- 2:00 PM – 2:45 PM PDT
- Level 3, Murano 3205