
Model Onboarding
A data scientist or LLM Ops engineer, supported with suitable infrastructure by the operations team, chooses a model from the expanding set that is available in the public domain (e.g. the Llama 3 family of models from Meta, or an optimized model from the NVIDIA GPU Cloud). With this choice comes a need to test the model in a safe environment with specific company data: for its accuracy of answers, bias handling, vulnerabilities, and performance characteristics, among other considerations. This model choice and testing is needed both for “completion” models like Llama 3 70B and for “embedding” models like BAAI BGE-M3, and it is best done in an AI Workstation configured in the VMware Private AI Foundation tooling.

A DevOps engineer or data scientist provisions the “AI Workstation” tile seen below in VCF Automation, and this results in a deep learning VM (DLVM) being created. All the technical details of that DLVM provisioning are handled by VCF Automation. The “AI Workstation” catalog item is seen on the far right in the VCF Automation Catalog below. This is one example of the set of items that can be made available for deployment in VCF Automation.
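As an illustration of the kind of smoke test that might run inside the DLVM, here is a minimal sketch using the Hugging Face transformers library. This assumes transformers and a GPU-enabled PyTorch are available in the deep learning VM; the model name and prompts are placeholders for whichever candidate model and company data are actually under evaluation.

from transformers import pipeline

# Load the candidate completion model for evaluation. The model name is
# a placeholder for whatever model was pulled into the DLVM for testing.
generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",
)

# A handful of checks: one company-specific question and one factual
# sanity check. A real evaluation would use a much larger test suite
# covering accuracy, bias, and adversarial prompts.
test_prompts = [
    "Summarize our product returns policy in one sentence.",
    "What is the capital of France?",
]
for prompt in test_prompts:
    result = generator(prompt, max_new_tokens=64)
    print(prompt, "->", result[0]["generated_text"])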
Model Store
The data scientist decides to retain and store the tested model for further application use in their enterprise. This is done using a model store/repository implemented on an OCI-compatible registry called Harbor. The LLM Ops person provisions this model store and grants write access to the data scientist or other users for certain named projects and repositories within the Model Gallery. The newly tested model is pushed up from the DLVM to the Harbor-based Model Gallery using a VMware Private AI-specific command line interface called “pais”, which comes as part of the deep learning VM.
Model Endpoints via the Model Runtime Service
The data scientist, application developer, or LLM testing engineer requires access to one or more models to serve their application; a completion model and an embedding model are shown as two examples here. Often, the requirement is for a model setup that answers users’ questions in a chatbot style. For building and testing this application, they need an access point to the model and a means of sending it input data and receiving responses. For this, the user creates an endpoint, backed by the Model Runtime service, that allows them to use the model. Below you see a model endpoint being created using a reference to the model itself in the Harbor Model Gallery; this reference appears in the Model URL field.
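Once the endpoint exists, a client program can call it over HTTP. Here is a minimal sketch, assuming the endpoint exposes the OpenAI-compatible API that inference servers such as vLLM provide; the URL, API key, and model name are placeholders, not actual Private AI Services values.

import requests

# Placeholder endpoint URL and credentials; substitute the values shown
# for your model endpoint in the Model Runtime service.
ENDPOINT = "https://paif.example.com/v1/chat/completions"
HEADERS = {"Authorization": "Bearer <api-key>"}

# Send a chat completion request using the OpenAI-compatible schema.
resp = requests.post(
    ENDPOINT,
    headers=HEADERS,
    json={
        "model": "llama-3-70b-instruct",
        "messages": [{"role": "user", "content": "What is RAG?"}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])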


Data Indexing and Retrieval Service
Private company data will be used by the completion model we deployed above to augment its behavior when answering an end user’s question. There is a separate workstream, therefore, to organize that private data for use. The LLM Ops engineer or data scientist provides access to the data sources that need to be supplied to the model in a RAG design. This means they create data source objects, using the Data Indexing and Retrieval service in Private AI Services (PAIS), that refer to the various repositories holding the private data.
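To make the indexing step concrete, here is a sketch of the kind of work the Data Indexing and Retrieval service automates: chunking a document, embedding each chunk via the embedding model endpoint, and preparing the vectors for the vector database. The URL, model name, and file are placeholders, and an OpenAI-compatible embeddings API (as served by engines such as Infinity or vLLM) is assumed.

import requests

# Placeholder embedding endpoint; substitute your embedding model endpoint.
EMBED_ENDPOINT = "https://paif.example.com/v1/embeddings"
HEADERS = {"Authorization": "Bearer <api-key>"}

def chunk(text, size=500, overlap=50):
    # Naive fixed-size chunking with overlap; real pipelines often split
    # on sentence or section boundaries instead.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(texts):
    # Embed a batch of texts using the OpenAI-compatible embeddings schema.
    resp = requests.post(
        EMBED_ENDPOINT,
        headers=HEADERS,
        json={"model": "bge-m3", "input": texts},
        timeout=60,
    )
    resp.raise_for_status()
    return [item["embedding"] for item in resp.json()["data"]]

doc = open("returns_policy.txt").read()   # placeholder private document
vectors = embed(chunk(doc))
# The service would then index these vectors in the vector database
# so they can be retrieved by similarity search at query time.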


Agent Builder – Create a Component of a RAG Application
One of the most popular design patterns for AI applications is the Retrieval Augmented Generation (RAG) approach. In a RAG design, the user’s question, entered in a chatbot UI, is first converted to embedding form by the same embedding model that we used earlier for storing the data. That converted question is then sent as a query to the vector database to find similar, semantically meaningful entries that match its contents. The application developer makes use of the Private AI Agent Builder to combine the knowledge base with the completion model endpoint to design their RAG application. An example interaction supported by the Agent Builder is shown below, with the “Model endpoint” field showing the completion model and the knowledge base being used to access the private data from the vector database. This is a test environment for the model and data indexing functionality; developers can now use this setup to create, for example, a customer service application.
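For reference, here is a self-contained sketch of the query-time RAG flow that the Agent Builder wires together: embed the question, find the closest stored chunk, and send the augmented prompt to the completion model. The endpoint URLs and model names are placeholders, and the tiny in-memory index stands in for the knowledge base that actually lives in the vector database.

import requests

BASE = "https://paif.example.com/v1"   # placeholder endpoint base URL
HEADERS = {"Authorization": "Bearer <api-key>"}

def embed(texts):
    # Embed texts via the embedding model endpoint (OpenAI-compatible schema).
    r = requests.post(f"{BASE}/embeddings", headers=HEADERS,
                      json={"model": "bge-m3", "input": texts}, timeout=60)
    r.raise_for_status()
    return [d["embedding"] for d in r.json()["data"]]

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5))

# Toy in-memory "knowledge base"; the real one is indexed in the vector DB.
chunks = ["Customers may return items within 30 days of delivery.",
          "Refunds are issued to the original payment method."]
index = list(zip(chunks, embed(chunks)))

question = "How long do customers have to return an item?"
q_vec = embed([question])[0]
best_chunk = max(index, key=lambda item: cosine(q_vec, item[1]))[0]

# Augment the prompt with the retrieved context and ask the completion model.
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {question}"
r = requests.post(f"{BASE}/chat/completions", headers=HEADERS,
                  json={"model": "llama-3-70b-instruct",
                        "messages": [{"role": "user", "content": prompt}]},
                  timeout=60)
r.raise_for_status()
print(r.json()["choices"][0]["message"]["content"])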
Summary of the Private AI Services in VCF 9.0
In this article, we have outlined the new Private AI Services functionality in VCF 9.0. This set of tools is designed to make building AI applications based on models and data sources much simpler for the user. The new Private AI Services (within VMware Private AI Foundation with NVIDIA) include:

- Model Store – for maintaining control over models and their versions, a step towards the goal of model governance.
- Model Endpoints – based on a choice of model inference servers, such as vLLM, Infinity, etc. A model endpoint provides a URL and an API for accessing a model from a client program.
- Data Indexing and Retrieval – a process and tools for organizing data (chunking, creating embeddings, indexing) based on using an embedding model, which runs on a model runtime with an endpoint of its own. Within this area, we define a Knowledge Base as an object that encapsulates a set of data stores that can be indexed for access in queries.
- Agent Builder – a tool that makes use of indexed Knowledge Bases and Model Endpoints to build an application.

Taken together, these services give the operations and data science staff who want to deploy private AI on-premises a highly productive experience in organizing models and data for new AI applications. The set of services described here is integrated with VMware Cloud Foundation and VMware Private AI Foundation with NVIDIA to provide a full-featured platform for the next generation of AI applications.

***

Ready to get hands-on with VMware Cloud Foundation 9.0? Dive into the newest features in a live environment with Hands-on Labs that cover platform fundamentals, automation workflows, operational best practices, and the latest vSphere functionality for VCF 9.0.