3D Rendering futuristic robot technology development, artificial intelligence AI, and machine learning concept. Global robotic bionic science research for future of human life.
Technical Private AI VMware Private AI

Introducing Summarize-and-Chat service for VMware Private AI

GenAI text summarization is becoming mainstream given its ability to quickly generate accurate and coherent summaries.  While publicly available tools for summarization exist, companies may opt for internal solutions for data privacy, security, and governance reasons.  So, there is a need for an on-premises solution that can adapt to the organization’s needs and data governance guidelines.  

Teams often face significant hurdles when building their own ML solutions.  What summarization technique should be used to summarize large documents that exceed the context window size of LLMs?  What are the best parsing libraries to parse large documents like PDFs which can be challenging due to their complex structures such as tables, charts and images?  What LLM is suitable for summarizing large meeting transcripts which have multiple swings in the dialog that make it harder to understand what contextual information is valuable for the summary?  What are the effective prompts for the selected models? 

Summarize-and-Chat service

Summarize-and-Chat, an open source project for VMware Private AI addresses the above challenges to help teams get started with their use case. This project can be deployed on VMware Private AI Foundation with NVIDIA for customers to get started with GenAI on their private data. This capability provides a versatile and scalable approach for common summarization tasks while facilitating natural language interactions through chat interfaces.  The integration of document summarization and chat interaction within a unified framework presents several advantages. Firstly, it enables users to obtain concise summaries of diverse content including articles, customer feedback, bugs/issues or a meeting that you missed. 

Secondly, by leveraging LLMs for chat interaction, this capability enables more engaging and contextually aware conversations, enhancing user experience and satisfaction.  

Key Features

Summarize-and-Chat provides the following features:

  • Support a range of document lengths and formats (PDF, DOCX, PPTX, TXT, VTT and popular audio file types (mp3, mp4, mpeg, mpga, m4a, wav, and webm) 
  • Support open source LLMs on OpenAI-compatible LLM inference engine
  • An intuitive user interface for file upload, summary generation, and chat.  
  • Summarization:
    • Insert, paste or upload your files & preview files. 
    • Pick the way you want to summarize (allow user to provide custom prompts, chunk size, page range for docs or time range for audio)
    • Adjust your summary length
    • Get your summary in seconds and download your summary
  • Chat with your document:
    • Auto-generated questions from the doc
    • Get the answer with the source in seconds
  • Insight Analysis
    • Select two or more docs
    • Write the prompt to compare or identify the insights from the selected docs
  • Speech-to-text convention 
  • Support various PDF parsers: PyPDF, PDFMiner, PyMUPDF
  • APIs  

Deployment Steps

Setup for Summarize-and-Chat is easy with a few configuration steps for each component:

Summarize-and-Chat includes three components:

  • Summarization-client: Angular/Clarity web application 
  • Summarization-server: FastAPI gateway server to manage core application functions including
    • Access control
    • Document ingestion pipeline: document ingestion, metadata extraction to vector (text embeddings) index population.
    • Summarization with LangChain Map Reduce. This approach enables us to summarize large documents that exceed the model’s input token limit.
    • Improved Retrieval Augmented Generation (RAG) by leveraging the power of LlamaIndex reranking and pgvector for enhanced performance in question-answering systems.
  • Speed-to-text (STT): Speed-to-text to convert audio to text using OpenAI’s faster-whisper

Please follow the quick install and configuration steps from the README and you’ll be up and running in a few minutes.

Using Summarize-and-Chat

Now, let’s dive in to see how you can use Summarize-and-Chat to summarize a long PDF document and chat with it end-to-end.

To get started, you first login to the summarization client using your Okta credentials.

  • Upload file and add metadata (date, version) 
  • Select the QUICK executive summaries option or DETAILED option to select your summarization preference.
  • Click the SUMMARIZE button and the summary is generated instantly. For a longer document, you will see the time estimation and receive a notification when the summary is ready for your download. 

Chat with your Document

You can click the CHAT icon on the top menu to start chatting with your document. You can pick one of the auto-generated questions or enter your own question and get the answer with the source in seconds.

What‘s Next

We’re excited to open source Summarize-and-Chat to support your data ML projects on VMware Private AI. 

If you want to get involved in the project, please see our contribution guide. Try Summarize-and-Chat today. We look forward to your feedback to help shape the long term roadmap for this project.