Developing a Generative AI (Gen AI) application using Retrieval-Augmented Generation (RAG) offers immense opportunities for customers. RAG enhances the generative capabilities of AI by combining it with a document retrieval system that pulls relevant information from external document sources. This ensures more contextually relevant, accurate, and up-to-date responses.
However, while RAG technology holds significant promise, it also introduces several complexities, especially around data sensitivity, security, and storage. In this blog, we’ll walk through the components of a RAG-powered Gen AI application, highlighting key concerns around data management and privacy compliance, and show how VMware Cloud Service Providers can help solve these challenges.
What is RAG and How Does It Work?
RAG (Retrieval-Augmented Generation) combines two powerful approaches:
- Generative AI: A large pre-trained model that generates new content, typically based on vast datasets, like OpenAI’s GPT or Meta’s LLaMA.
- Document Retrieval System: A system that stores and retrieves external data to supplement the generative model. Instead of relying solely on the model’s internal knowledge, RAG allows it to pull information from real-time databases, documents, or web sources. These sources are often internal, and hence need further consideration.
The goal of RAG is to enhance the accuracy and relevance of AI-generated responses by feeding the generative model with external data as needed. This is highly beneficial for industries like customer service, healthcare, and finance, where up-to-date and precise information is crucial.
Components of a RAG-Powered Gen AI Application
The following components are often found in a RAG application:
- Foundation Model: A pre-trained generative AI model (e.g., GPT, PaLM) that generates the responses. NVIDIA provides a range of models through the NVIDIA Global Catalog (NGC), tuned for NVIDIA GPUs for best performance and backed by NVIDIA production support (versus open source). VMware Private AI automates, via simple catalog choices, the deployment and lifecycle management of your model into a deep learning VM.
- Document Store: A database or knowledge repository to store and retrieve relevant documents, which augment the generative AI’s capabilities. VMware Private AI provides a vector database (Postgres+pgvector) and automates its deployment and lifecycle management, as well as the connection between the NVIDIA NeMo Retriever, the model, and the database.
- Embedding Model: Used to encode documents and user queries into vectors for similarity search.
- Retrieval Mechanism: A system (e.g., ElasticSearch, Pinecone) to retrieve the most relevant data based on the user’s input query.
- Integration Pipeline: A system that connects the query, retrieval, and generation steps, seamlessly providing the final response to the user.
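To make the flow between these components concrete, here is a minimal sketch of an integration pipeline. The function names are hypothetical, and a toy keyword-overlap ranking stands in for the embedding model and vector search a real system would use:

```python
# Minimal RAG pipeline sketch: retrieve context, then assemble the
# augmented prompt that would be sent to the generative model.
# Keyword overlap here is a stand-in for real vector similarity search.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q_terms & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Augment the user query with retrieved context for the model."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{ctx}\n\nQuestion: {query}"

docs = ["VMware Private AI runs on VMware Cloud Foundation.",
        "pgvector adds vector similarity search to Postgres.",
        "RAG augments a generative model with retrieved documents."]
prompt = build_prompt("What does RAG do?", retrieve("What does RAG do?", docs))
```

In production, `retrieve` would embed the query, run a similarity search against the vector store, and the assembled prompt would be passed to the foundation model for generation.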
Enhancing RAG-Powered Gen AI Applications with VMware Cloud and VMware Private AI Foundation with NVIDIA
While the technical components of RAG are straightforward, there are serious implications for data security and storage, especially given the increasing scrutiny around data privacy regulations like the General Data Protection Regulation (GDPR), Financial Data Access (FIDA), the California Consumer Privacy Act (CCPA), and industry-specific rules (e.g., the Health Insurance Portability and Accountability Act (HIPAA) for healthcare). There are also plenty of regional regulations that customers in specific verticals must adhere to; combined with wider regulations for national entities, the regulatory landscape is very complex for all customers.
Sensitive Data Management
One of the most critical challenges in developing a RAG application is ensuring the system does not inadvertently expose sensitive data. RAG systems often work with large document stores containing a variety of data types. If any of this data is sensitive—such as customer information, personal identifiers, or proprietary company information (legal should be pulled into this conversation)—it must be carefully handled to prevent breaches.
Whilst models can be trained using synthetic data, real data is always preferred, to avoid the inaccuracies synthetic data can embed in training. Real data, sourced from human interactions like online comments, social media posts, and chat messages, is considered the highest quality for AI training, particularly for large language models (LLMs), because it reflects authentic human behavior. However, despite its value, real data poses significant challenges. Gathering and preparing real data is both costly and time-consuming. It requires extensive cleaning to comply with privacy regulations (e.g., GDPR, HIPAA), remove errors, and standardize the data. Data labeling also demands trained human annotators. Due to the complexity of this process, data scientists spend up to 80% of their time on data preparation, which many find to be the most frustrating part of their work.
Concerns to Address:
- Data Classification: Ensure proper classification of sensitive data within your document repository. Embedding models may inadvertently access and retrieve confidential data if documents aren’t carefully tagged or segregated. All data should be classified (including metadata and accounting data), with strict rules in place to prevent unauthorized access. This is a particular consideration when using public hyperscale cloud solutions and may involve substantial complexity that can only be avoided with non-hyperscale solutions.
- Pseudonymization and Anonymization: Sensitive data should be either pseudonymized (where identifiers are replaced with artificial ones) or anonymized (where identifiers are completely removed) to reduce risk. This renders the data useless to anyone trying to read it, though it involves substantial complexity and adds costs for additional services, software, and architecture.
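As a simple illustration of the difference, the sketch below pseudonymizes email addresses by replacing them with stable salted-hash tokens (so records stay linkable), and anonymizes them by removing the identifier entirely. The regex and salt are illustrative, not production-grade PII detection:

```python
import hashlib
import re

# Illustrative only: real PII detection needs far more than one regex,
# and the salt should be managed and rotated per deployment.
SALT = b"rotate-me-per-deployment"
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def pseudonymize(text: str) -> str:
    """Replace each email with a stable artificial token (linkable)."""
    def repl(match: re.Match) -> str:
        token = hashlib.sha256(SALT + match.group().encode()).hexdigest()[:10]
        return f"<user-{token}>"
    return EMAIL_RE.sub(repl, text)

def anonymize(text: str) -> str:
    """Remove the identifier entirely (not reversible or linkable)."""
    return EMAIL_RE.sub("<redacted>", text)

record = "Complaint filed by jane.doe@example.com about invoice 42."
```

Because the pseudonymous token is deterministic, the same person maps to the same token across documents, which preserves utility for retrieval while keeping the raw identifier out of the document store.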
How a VMware Cloud Service Provider with VMware Private AI Foundation with NVIDIA supports sensitive data for AI?
VMware’s approach ensures data remains on-premises or within a private cloud under the customer’s control at a VMware Cloud Service Provider, offering the flexibility of AI model training and inference while maintaining the stringent security measures implied by the existing provider’s regulatory controls and security capabilities. With services and functions that segment data and apply fine-grained access control over all data (often a daunting task with ever-growing data), customers can ensure that sensitive information is only accessed by authorized personnel, reducing exposure risks in comparison to public hyperscale clouds.
Document Store Security
Your document store will be the heart of your RAG system, and storing large amounts of data requires robust security measures to protect against unauthorized access. Your data is your responsibility, and leaks or unauthorized access can result in large lawsuits, bad press, and reputational damage. This is why VMware believes Private AI is best suited for organizations seeking AI solutions: the data stays with you, on your premises, under your control.
Concerns to Address:
- Encryption at Rest and in Transit: Ensure all documents and vectors (numerical representation of data, typically an array of numbers, used to represent text, images, or other data for processing by ML models) in your data store are encrypted both at rest and in transit. This is especially important when storing highly sensitive information, such as financial documents or health records.
- Access Control: Set up strong access controls to restrict who can access the document store. Implement role-based access control (RBAC) to limit access based on API users, job roles and requirements.
- Audit Logging: Maintain detailed logs of who / what accessed data and when. In the case of a breach or accidental exposure, these logs will help in tracing back the root cause. Log management in this context can be extremely complex and as a best practice, developers of AI applications should build in management ‘hooks’ to call out events when specific data types are accessed.
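The audit-logging ‘hooks’ mentioned above can be as simple as a structured event emitted on every data access. The sketch below is a minimal illustration; the data class names and in-memory log are hypothetical, and a real system would ship events to a SIEM rather than a Python list:

```python
import json
import time

# Illustrative audit hook for a RAG document store: record who or what
# accessed which document class and when, without logging the document
# contents themselves.
SENSITIVE_CLASSES = {"pii", "financial", "health"}
AUDIT_LOG: list[str] = []

def log_access(principal: str, doc_id: str, data_class: str) -> dict:
    """Append a structured audit event; flag sensitive data classes."""
    event = {
        "ts": time.time(),
        "principal": principal,
        "doc": doc_id,
        "class": data_class,
        "sensitive": data_class in SENSITIVE_CLASSES,
    }
    AUDIT_LOG.append(json.dumps(event))  # ship to a SIEM in a real system
    return event

event = log_access("svc-rag-api", "doc-001", "pii")
```

Emitting these events at the retrieval layer, rather than only at the database, means the log captures which application identity triggered the access, which is what breach investigations usually need.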
How a VMware Cloud Service Provider with VMware Private AI Foundation with NVIDIA supports document security?
VMware Private AI Foundation with NVIDIA solutions are uniquely suited for organizations needing to maintain control over their document store security within the Cloud Provider’s data center. Zero Trust security and role-based access control add-ons in VMware Cloud Foundation, vSAN Express Storage Architecture (ESA), and the availability of encryption technologies, access controls and extensive audit logging provide regulatory and operational support. vSAN provides encryption at rest and in transit, data protection (in or out of region or data center) and multiple key rotation support. With Private AI, the organization can limit access to critical data to internal systems only, both physically and virtually, ensuring greater control over sensitive data than would be possible with shared public cloud infrastructure.
Compliance with Data Privacy Regulations
With stringent data privacy regulations like GDPR (Europe) and CCPA (California), compliance is non-negotiable for any application handling user data. Non-compliance could result in lawsuits, fines and damage to your company’s reputation.
Concerns to Address:
- Right to be Forgotten: GDPR grants users the right to have their data erased upon request. If your system retrieves personal data, you need to ensure that it can also remove that data from all storage systems and backups, when necessary.
- Data Residency: Depending on the jurisdiction, certain types of data may be required to remain within a particular region or country. When setting up cloud infrastructure, ensure that your data storage complies with these residency requirements.
- Data Minimization: Store only the data necessary for the operation of the model. By reducing the amount of data stored, you reduce the risk of data breaches.
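The ‘right to be forgotten’ in particular is easy to get wrong in a RAG system, because a data subject’s information can live in the document store, the vector store, and retrievable logs at the same time. The sketch below illustrates an erasure handler over in-memory dicts that stand in for those real stores; all names are hypothetical:

```python
# Illustrative 'right to be forgotten' handler: erasure must cover every
# store that references the data subject, not just the source documents.

class RagStores:
    def __init__(self) -> None:
        self.documents: dict = {}   # doc_id -> {"subject": ..., "text": ...}
        self.embeddings: dict = {}  # doc_id -> vector
        self.access_log: list = []  # events referencing a subject

    def erase_subject(self, subject_id: str) -> int:
        """Remove all traces of a data subject; return items removed."""
        doomed = [d for d, meta in self.documents.items()
                  if meta["subject"] == subject_id]
        for doc_id in doomed:
            del self.documents[doc_id]
            self.embeddings.pop(doc_id, None)  # erase the vector too
        before = len(self.access_log)
        self.access_log = [e for e in self.access_log
                           if e["subject"] != subject_id]
        return len(doomed) + (before - len(self.access_log))
```

Note that backups are not covered by this sketch; in practice, erasure requests also need a documented process for expiring the subject’s data out of backup retention.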
How a VMware Cloud Service Provider with VMware Private AI Foundation with NVIDIA supports compliance?
Sovereign VMware Cloud Service Providers allow organizations to deploy AI applications that comply with complex regional and industry-specific data residency rules. VMware Cloud Service Providers implement the necessary security requirements, multi-factor authentication, certifications, audits and infrastructure reliability within the region. If the provider is a Sovereign provider, this is under regional jurisdictional control and only accessed and supported by nationally approved employees; no accounting or metadata is handled or accessed by foreign jurisdictions, unlike hyperscale solutions. Together, these ensure that sensitive data can be stored and processed within the required jurisdictions without risking non-compliance, positioning VMware Private AI Foundation with NVIDIA as an ideal solution for regulated industries such as healthcare, finance, and government.
Training and Fine-Tuning AI Models
Training is crucial in making sure your RAG system not only generates accurate and relevant responses but also aligns with the domain and user needs, all while safeguarding sensitive data. Training comes in two basic forms: either pre-trained out of the box with the option to fine-tune, which is the fastest time to market for a solution, or custom.
There are many pre-trained models available, and most likely you will be able to use an existing model with some fine-tuning on smaller, domain-relevant datasets for specific use cases or industries, for instance if your application needs to generate responses in a niche domain (e.g., healthcare or finance). Fine-tuning on domain-specific data ensures more accurate outputs by adjusting the weights of the model to improve performance in a specific context without losing general language generation capabilities.
Depending on the complexity of the application, you might want to train a custom embedding model. This is especially useful if your domain contains industry-specific jargon (e.g., science-based industries like healthcare), technical language, or unique patterns of text that general models might not handle well. Custom training helps improve the retrieval system by generating more precise vectors, but incurs additional costs that may be prohibitive.
To keep things simple, I will avoid going into the last training type, continuous training, where user feedback is used to refine both the generative model and the retrieval system, as this type of training requires that the model and retrieval system are realistically secure and private in the first place.
Concerns to address:
- Data Sensitivity: When training models, particularly using real-world data, you need to ensure that sensitive data (e.g., personal information or proprietary knowledge) is either anonymized or excluded to avoid potential privacy risks.
- Compliance: Ensure that any data used for training, especially fine-tuning or continuous learning, complies with data privacy regulations like GDPR or CCPA. It is important to note that whilst synthetic data can be used to create a model, the accuracy of a model is dictated by the information it learns from, and the best information will be real data in 99% of cases.
How a VMware Cloud Service Provider with VMware Private AI Foundation with NVIDIA supports training models?
VMware’s environment is well suited for fine-tuning models on domain-specific data while safeguarding sensitive information. Organizations can train models internally without exposing data to public environments, enabling them to harness the benefits of AI without compromising security. Customers retain full control over their data, ensuring that sensitive information remains within their cloud infrastructure. This control helps organizations comply with strict privacy regulations like GDPR, HIPAA, and others. By keeping AI workloads on-premises or within secure cloud environments, organizations minimize the risk of data exposure, particularly when dealing with documents containing personally identifiable information (PII).
Embedding and Vector Storage Security
In a RAG system, text documents are encoded into vector embeddings, which are dense numeric representations of the data. However, these embeddings can still carry information from the original text, which means sensitive data could potentially be reconstructed.
Concerns to Address:
- Embedding Sanitization: Apply sanitization techniques to ensure that sensitive or identifiable information does not carry over into vector embeddings.
- Secure Storage of Embeddings: Like the original documents, embeddings should be encrypted and stored securely. Ensure that access to embeddings is restricted and monitored.
How a VMware Cloud Service Provider with VMware Private AI Foundation with NVIDIA supports storage security?
VMware offers a controlled environment with VMware Cloud Foundation, Private AI deep learning VMs, and vector (Postgres pgvector) RAG databases, with operational services already understood and acknowledged as enterprise and national grade. Within this environment, organizations can securely manage embeddings and vectors, ensuring that even the encoded representations of sensitive data are protected.
Vector store data is typically encrypted at rest, ensuring that sensitive information processed by AI/ML models is protected. VMware Private AI supports industry-standard encryption protocols (such as AES-256) to safeguard stored vectors and relational data. Any data in transit between the client and the database is encrypted using SSL/TLS, protecting vectors and other data from interception during transmission.
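As an illustration of what the Postgres+pgvector layer looks like, a sketch of an embedding table and a nearest-neighbour query follows. Table and column names are hypothetical, and encryption at rest and TLS are configured at the vSAN and Postgres layers rather than in the schema itself:

```sql
-- Illustrative pgvector schema for a RAG document store.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE doc_embeddings (
    doc_id     text PRIMARY KEY,
    data_class text NOT NULL,   -- e.g. 'public', 'pii', 'financial'
    embedding  vector(768)      -- dimension depends on your embedding model
);

-- Nearest-neighbour retrieval by L2 distance (pgvector's <-> operator),
-- restricted by data classification to support access control.
SELECT doc_id
FROM doc_embeddings
WHERE data_class = 'public'
ORDER BY embedding <-> $1       -- $1: the query embedding, passed as a parameter
LIMIT 5;
```

Filtering on a data-classification column in the retrieval query is one simple way to keep sensitive embeddings out of responses for callers who are not entitled to them.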
With real-time continuous monitoring and auditing services from Cloud Providers, vector data is scanned and monitored as it is ingested, processed, and used. This ensures that new security vulnerabilities or threats can be detected and addressed without delay. Equally, logging services track any changes to embeddings or their metadata to help maintain a secure system. This includes tracking when embeddings are added, modified, or removed, and who performed the action.
Inference Privacy
Once the application is live, user interactions with the system (queries and responses) may (and in many cases probably will) contain sensitive data. These queries should not be stored or used for future model improvements without the user’s explicit consent.
Concerns to Address:
- Data Logging: Avoid logging full queries or responses unless necessary, as this creates a lot of extra data, and ensure that any sensitive data that is logged is redacted or anonymized.
- User Consent: Implement mechanisms to inform users about how their data will be used and obtain consent before logging any data for improvements.
- Differential Privacy: Consider implementing differential privacy techniques in the AI model training process to add noise to user data, ensuring individual privacy while still improving the model.
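To make the differential privacy idea concrete, the sketch below adds calibrated Laplace noise to an aggregate statistic before it is released. The epsilon value is illustrative; real deployments tune it against a privacy budget:

```python
import random

# Illustrative differential privacy: release an aggregate (here a count)
# with calibrated Laplace noise, so no individual's presence in the data
# can be confidently inferred from the released value.

def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def private_count(true_count: int, epsilon: float = 0.5) -> float:
    """Release a count with epsilon-DP; counting queries have sensitivity 1."""
    return true_count + laplace_noise(1 / epsilon)

noisy = private_count(1000)
```

The same mechanism underlies DP model training (e.g., noisy gradient updates), where the noise is added per training step rather than per released statistic.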
Best Practices for Securing Your RAG Application
- Data Encryption: Use strong encryption methods for data at rest and in transit to protect sensitive information from unauthorized access.
- Access Control: Ensure proper authentication and authorization mechanisms for accessing both the generative model and the document store.
- Anonymization and Redaction: Before feeding any document into your system, anonymize or redact any sensitive data to prevent leakage.
- Periodic Audits: Regularly audit your document store, retrieval system, and application code to ensure compliance with security and privacy standards.
- Scalable Monitoring: Implement real-time monitoring to detect and respond to any security breaches or data leakage incidents quickly.
Conclusion: Why VMware Cloud Service Providers with VMware Private AI Foundation with NVIDIA?
VMware Cloud Service Providers, leveraging Private AI and existing proven reference architectures for NVIDIA & VMware Cloud Foundation, present a powerful solution for organizations aiming to develop AI applications while maintaining control over sensitive data. Whether it’s ensuring data residency compliance, securing document stores, or protecting data during AI model training, VMware’s platform allows organizations to balance innovation with the highest levels of sovereign security and privacy. The inherent flexibility of VMware Cloud Foundation, combined with the localized control provided by NVIDIA and VMware Private AI Foundation with NVIDIA, makes it the go-to choice for companies in highly regulated sectors looking to build AI-powered solutions confidently.
Existing VMware Cloud Service Providers who are NVIDIA Partner Network CSPs include: IONOS, UniServer, ProAct, OVHCloud, NxtGen, ComputaCenter, Yotta, SoftBank, NEC and Hitachi; see more here.
Integrating VMware Private AI Foundation with NVIDIA into the development of RAG-powered applications ensures that data-sensitive industries can harness the transformative power of AI while maintaining full compliance with regulatory standards, making VMware Cloud Service Providers a superior solution for building secure, next-generation AI applications.