
Deploy VMware Private AI Services in Minimal VMware Cloud Foundation Environments

VMware Cloud Foundation (VCF) offers a comprehensive suite of software-defined services, enabling enterprises to build reliable and efficient cloud infrastructure with consistent operations across diverse environments. The latest addition to this platform is VCF Private AI Services, a secure set of services for deploying AI applications using models and data sources.

VCF Private AI Services integrates with VCF Automation to provide a simplified, cloud-like experience that allows users to deploy models into production easily, often within minutes. For instance, a guided workflow for deploying model endpoints is available directly in the VCF Automation UI.

Without VCF Automation, deploying model endpoints to a namespace would require using VCF Consumption Command Line Interface (VCF Consumption CLI) and kubectl.

Although VCF Automation greatly simplifies the user experience for VCF Private AI Services, the platform remains accessible even in environments that have not yet fully adopted VCF Automation. This blog guides you through deploying VCF Private AI Services in a VCF environment where VCF Automation has not yet been configured.

Deployment Workflow Overview

The steps below outline the deployment process for VCF Private AI Services in a VCF environment where VCF Automation is not yet in use:

  1. Install Private AI Services on the Supervisor.
  2. Create a namespace through the vSphere Client.
  3. Prepare NVIDIA configmap and secret.
  4. Prepare trust bundles for Private AI Services.
  5. Prepare Private AI Services configuration YAML file. 
  6. Create a context for the namespace using VCF Consumption CLI.
  7. Activate Private AI Services on the namespace. 

Once VCF Private AI Services is activated on a namespace, you can push models to the Model Store and deploy Model Endpoints via kubectl. 

Prerequisites

To follow along with this guide, you will need a VCF environment with a Supervisor enabled. For the full list of prerequisites for deploying VCF Private AI Services, please refer to the official documentation.

Deploy VCF Private AI Services

1. Install Private AI Services on the Supervisor

We first need to download the YAML definition file for Private AI Services from the Broadcom Support Portal. Find VMware Private AI Services and click on the release version you want. 

To get your OCI registry credentials, click the green badge icon next to the release version. Follow the instructions given to add the registry to the Supervisor via vSphere Client. 

Once the registry is added, you can follow the official documentation to install Private AI Services by leveraging the Private AI Foundation workflow available in the vSphere Client. 

Alternatively, you can install Private AI Services directly using Supervisor Management with the following steps:

  1. Navigate to Supervisor Management in the vSphere Client. Under Services, click Add under Add New Service.
  2. Upload the Private AI Services YAML file to register the service, then click Finish.
  3. Once the new Private AI Services card appears, click Actions, then Manage Service.
  4. Select the Supervisor, then click Next. Leave the YAML Service Config blank and click Finish.
  5. To verify that the installation was successful, navigate to the Supervisor. Under Configure, select Overview under Supervisor Services. Private AI Services should display “Configured” under the Status.

2. Create a namespace through the vSphere Client

VCF Private AI Services is activated at the namespace level. So, let’s create a vSphere namespace by navigating to Supervisor Management > Namespaces and selecting New Namespace. Proceed through the setup wizard. For more details, you can refer to the official documentation.

After creating a namespace, you need to add the storage policies and VM classes that will be accessible to the resources within it. You can do this by going to the namespace and using the Storage and VM Service tiles under the Summary tab. 

For VM Service, you should include CPU-only VM classes for control plane and worker nodes, and VM classes with GPUs for model endpoints.

3. Prepare NVIDIA configmap and secret

In order to use NVIDIA vGPUs for the Private AI resources, you need to create a ConfigMap for the NVIDIA license and a Secret for the NVIDIA GPU Cloud (NGC) API token for authentication.

The NVIDIA license ConfigMap needs the client_configuration_token.tok in the data.

The Secret needs the API token in the data.dockerconfigjson.

Please note that while I have used licensing-config for the ConfigMap and ngc-secret for the Secret, you are free to choose different names. Just be sure to record these names, as they must be referenced in the Private AI Services configuration YAML file.
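As an illustrative sketch, the two objects might look like the following; the object names match the ones used in this guide, but the token, API key, and registry values are placeholders you must fill in from your own NVIDIA licensing portal and NGC account:

```yaml
# NVIDIA license ConfigMap: carries the client configuration token
apiVersion: v1
kind: ConfigMap
metadata:
  name: licensing-config
data:
  client_configuration_token.tok: "<contents-of-client_configuration_token.tok>"
---
# NGC pull Secret: a standard Kubernetes docker-registry Secret whose
# base64 payload encodes {"auths":{"nvcr.io":{"username":"$oauthtoken","password":"<NGC_API_KEY>"}}}
apiVersion: v1
kind: Secret
metadata:
  name: ngc-secret
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: "<base64-encoded-docker-config-json>"
```

Note that the Secret follows the standard kubernetes.io/dockerconfigjson format, so the data key is .dockerconfigjson.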

You can also find clean YAML templates for the NVIDIA ConfigMap and Secret in the official documentation.

4. Prepare trust bundles for Private AI Services

You need trust bundles, provisioned as ConfigMaps, for VCF Private AI Services to establish secure HTTPS connections with various components, such as the OIDC provider, the Harbor registry, and the PostgreSQL database. The specific trust bundles necessary depend on your environment and the components integrated with Private AI Services. In most environments, you will need separate trust bundles for the OIDC provider and the Harbor registry.

In my environment, I use VMware Data Services Manager to provision a PostgreSQL database for Private AI Services, so below is an example of a trust bundle for VMware Data Services Manager.
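As a sketch, such a trust bundle is a plain ConfigMap carrying the CA certificate in PEM form; the ConfigMap name, data key, and certificate body below are placeholders for your own values:

```yaml
# Trust bundle for the VMware Data Services Manager CA certificate;
# name and key are illustrative -- match them to your PAIS configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: dsm-trust-bundle
data:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    <PEM-encoded CA certificate body>
    -----END CERTIFICATE-----
```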

You can refer to the official documentation for additional examples of trust bundles. 

5. Prepare Private AI Services configuration YAML file

Finally, we prepare the Private AI Services configuration YAML file. You can refer to the official documentation for the template configuration YAML file that is in compliance with the PAISConfiguration Custom Resource Definition (CRD). Make sure to pay attention to the NVIDIA vGPU host driver version in your environment – you may need to override the gpu-operator version with a configmap, as detailed in the instructions in the template configuration YAML file.

Below is the Private AI Services configuration file for my deployment as an example.
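To give a feel for the shape of this file, here is a heavily abridged skeleton. The apiVersion and spec fields shown are placeholders, not the authoritative schema, so always start from the template in the official documentation:

```yaml
# Abridged, illustrative PAISConfiguration skeleton -- field names are
# examples only; the official template is the source of truth
apiVersion: <pais-api-group>/<version>
kind: PAISConfiguration
metadata:
  name: default
spec:
  # Typically references the objects created in the earlier steps:
  # - the NVIDIA license ConfigMap (licensing-config)
  # - the NGC pull Secret (ngc-secret)
  # - the trust-bundle ConfigMaps for OIDC, Harbor, and PostgreSQL
  ...
```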

6. Create a context for the namespace using VCF Consumption CLI

We now have all the files required to activate Private AI Services, but we first need to gain access to the namespace. You can use any machine that has the VCF Consumption CLI installed; the CLI can be downloaded from the Supervisor cluster landing page (https://<supervisor-cluster-ip>).

  1. Create a Kubernetes context by using basic vSphere authentication with access to the Supervisor. Enter the password when prompted.

    vcf context create <supervisor_context_name> --endpoint <supervisor_ip_address> --auth-type 'basic' --username 'administrator@vsphere.local' 

Optionally, you can use the --insecure-skip-tls-verify flag to bypass the certificate check, but this is not recommended in production.

  2. All the namespaces configured on the Supervisor will be listed. Switch to the context of the namespace where you want to activate PAIS.

    vcf context use <namespace_context_name>

7. Activate Private AI Services on the namespace

Once you’ve switched to the context where you want to activate PAIS, navigate to the directory containing all the files prepared in the previous steps. Apply each file using kubectl apply -f <file_name>. Make sure to apply the Private AI Services configuration YAML file last because it depends on the Secrets and ConfigMaps.
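Assuming illustrative file names for the objects created in the earlier steps, the apply sequence might look like this:

```shell
# Apply the supporting objects first; file names here are placeholders
kubectl apply -f nvidia-licensing-configmap.yaml
kubectl apply -f ngc-secret.yaml
kubectl apply -f oidc-trust-bundle.yaml
kubectl apply -f harbor-trust-bundle.yaml
kubectl apply -f dsm-trust-bundle.yaml

# Apply the PAISConfiguration last, since it references the objects above
kubectl apply -f pais-configuration.yaml
```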

You can verify the PAIS configuration deployment was successful with the following commands:

    kubectl describe paisconfiguration <pais_configuration_name>

    kubectl get paisconfiguration
    NAME      SERVICE                READY   REASON
    default   pais-ingress-default   True    paisAvailable

You can also refer to the official documentation for more details. 

Use VCF Private AI Services

Use Model Store

You can download models from sites such as NVIDIA NGC or Hugging Face locally, then use the VCF Consumption CLI to push them to the Model Store. You can use any OCI-compliant registry as your Model Store, such as Artifactory or Harbor.

Below is an example where Harbor is used as the Model Store. The sequence of commands demonstrates how to download a model (llama-3.2-1b-instruct) from Hugging Face and upload it to Harbor:
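As a sketch of the workflow: the Hugging Face download step uses the real huggingface-cli tool, but the vcf subcommand and flags shown for the push are illustrative, so check vcf --help and the official documentation for the exact syntax of your CLI version. The Harbor project path is a placeholder:

```shell
# Download the model locally from Hugging Face
# (huggingface-cli ships with the huggingface_hub Python package)
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct \
  --local-dir ./llama-3.2-1b-instruct

# Push the model to the Harbor-backed Model Store; subcommand and
# flags below are illustrative -- verify against `vcf --help`
vcf model push \
  --model-store harbor.example.com/models \
  --model-name llama-3.2-1b-instruct \
  --model-path ./llama-3.2-1b-instruct
```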

You can refer to the official documentation for detailed instructions.

Deploy Model Endpoints

The UI for deploying model endpoints is available through VCF Automation. If VCF Automation is not yet configured, you can instead deploy model endpoints by applying a Kubernetes resource YAML with kubectl.

Below is a sample YAML file for a model endpoint that deploys a Mistral completion model.
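As an abridged sketch only: the kind matches the modelendpoints resource queried later in this guide, but the apiVersion and spec fields are placeholders, so start from the sample manifest in the official documentation:

```yaml
# Illustrative ModelEndpoint skeleton -- apiVersion and spec fields are
# placeholders; use the official documentation's sample as the reference
apiVersion: <pais-api-group>/<version>
kind: ModelEndpoint
metadata:
  name: mistral-completion
spec:
  # Typically includes a reference to the model in the Model Store,
  # the GPU-enabled VM class to run on, and runtime settings
  ...
```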

Once you have a model endpoint YAML file prepared, you deploy the model endpoint by applying this file in the namespace. 

You can verify the model endpoint deployments with the following command:

    kubectl get modelendpoints

To view a specific model endpoint, you can use the following command:

    kubectl get modelendpoints/<model-endpoint-name>

Deliver RAG Applications by Using VCF Private AI Services

VCF Private AI Services has a standalone UI where you can create knowledge bases with linked data sources to automatically collect and index data updates over time. You can also build an agent with Agent Builder in this standalone UI to facilitate prompt tuning and testing. The agent uses a completions model running in Model Runtime and, for RAG applications, integrates it with a knowledge base from Data Indexing and Retrieval.

To access the UI, locate the external IP address assigned to the Private AI Services instance by the ingress service (pais-ingress-default) using kubectl get services, then navigate to that external IP address over HTTPS in a web browser.
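For example, assuming the default ingress service name used earlier in this guide:

```shell
# Show the ingress service and note the EXTERNAL-IP column
kubectl get services pais-ingress-default

# Then open https://<EXTERNAL-IP>/ in a web browser
```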

Please refer to the official documentation for more details on how to use these Private AI Services through the standalone UI. You can also interact with deployed models through the OpenAI-compatible APIs using standard OpenAI clients.

