What is the RAG service?

The Retrieval-Augmented Generation (RAG) service of AnalyticDB for PostgreSQL is an AI service that integrates retrieval and generation technologies to provide a more accurate, flexible, and cost-efficient search experience. The RAG service uses data processing and semantic search to generate contextual and informative answers or content. It is used in a variety of fields, such as intelligent customer service, content creation, and knowledge management.

Overall architecture

The RAG service of AnalyticDB for PostgreSQL provides advanced hybrid search capabilities and consists of three modules: augmented data processing, augmented semantic search, and augmented retrieval.

The augmented data processing module provides in-depth pre-understanding of multimodal data to ensure high-quality data splitting and vector conversion.
The augmented semantic search module uses advanced semantic understanding technologies to perform in-depth analysis of user requirements and retrieve the most relevant information from the preprocessed data in a fast and accurate manner.
The augmented retrieval module uses fine-grained ranking and filtering algorithms to improve the relevance and diversity of search results and provide more informative and accurate context for subsequent generation steps. Together, these steps help the generated content stay high in quality and closely matched to the query.

The following figure shows the overall process.

Key benefits and capabilities

Key benefits

Quality of augmented generation: The RAG service combines search mechanisms with language model generation. Before it generates an answer, the RAG service searches large databases for relevant information, so that the generated content is more accurate and closer to actual data. This improves the quality and credibility of answers.
Large-scale knowledge fusion: The RAG service accesses and incorporates a variety of data sources, such as enterprise knowledge bases and public network resources. This allows the generated content to cover a wider range of knowledge, meet information requirements for different scenarios, and support the efficient reuse and personalized customization of knowledge.
Flexible APIs: The RAG service provides easy-to-use APIs that can be integrated into existing systems without the need to understand complex AI technologies. This allows the RAG service to easily handle scenarios such as the building of intelligent chatbots, automatic report generation, and content creation.
Continuous optimization and learning: Based on the computing capabilities and continuously optimized algorithm models of Alibaba Cloud, the RAG service can learn from the latest data and customer feedback and automatically adjust optimization policies to adapt to changing requirements and environments. This helps maintain service quality and user experience over the long term.

Key capabilities

Compatibility

The RAG service allows you to read and write vector data directly, upload documents that you have already split, or upload unsplit documents.
The RAG service supports SDKs for various programming languages, such as Python, Java, Go, Node.js, PHP, and C#.
The RAG service supports mainstream RAG frameworks, such as LlamaIndex and LangChain.

Document processing

The RAG service of AnalyticDB for PostgreSQL allows you to extract and split text from documents in multiple formats. The RAG service uses different extractors based on document formats to extract text and metadata such as page numbers and titles. The following document formats are supported:

You can use the Optical Character Recognition (OCR) extractor to extract text from images in the PNG, JPG, JPEG, and BMP formats.
You can use the OCR extractor to extract text from images or scanned PDF documents and add the information about images that have text relevance to the metadata.
You can use Python bindings such as PyMuPDF to extract text from PDF documents.
You can extract text from documents in other formats, such as HTML, Markdown, JSON, CSV, DOCX, PPTX, and TXT.

After text extraction, the RAG service splits the extracted text into chunks. Take note of the following items:

To prevent inefficient embedding caused by chunks that contain too many tokens, you can control how the text is split by using the ChunkSize and ChunkOverlap parameters.
You can specify separators for the text that you want to split.
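The effect of a chunk size, a chunk overlap, and a separator can be illustrated with a minimal Python sketch. This is not the service's implementation: the function name split_into_chunks is hypothetical, and the sketch counts separator-delimited units (words by default) rather than model tokens, which is a simplifying assumption.

```python
def split_into_chunks(text, chunk_size, chunk_overlap, separator=None):
    """Split text into overlapping chunks of at most chunk_size units.

    Units are the pieces produced by splitting on `separator`
    (whitespace when separator is None). Real services usually count
    model tokens instead; units are a stand-in for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    units = [u for u in text.split(separator) if u]
    joiner = " " if separator is None else separator
    step = chunk_size - chunk_overlap  # how far the window advances

    chunks = []
    for start in range(0, len(units), step):
        chunks.append(joiner.join(units[start:start + chunk_size]))
        if start + chunk_size >= len(units):
            break  # the last window already reached the end of the text
    return chunks


words = "a b c d e f g h i j"
print(split_into_chunks(words, chunk_size=4, chunk_overlap=2))
```

A larger overlap repeats more context between neighboring chunks, at the cost of producing and embedding more chunks for the same document.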

Embedding

The RAG service supports various text embedding models, such as M3E, Text2Vec, and Tongyi, with 512, 768, 1024, or 1536 dimensions.

The RAG service supports various multimodal image embedding models, such as CLIP and Tongyi, with 512, 640, 768, 1024, or 1536 dimensions.

Search

The RAG service supports the following basic search capabilities: hybrid search, two-way retrieval, and fusion query. Hybrid search queries dense vectors and sparse vectors at the same time. Two-way retrieval runs full-text search and vector search at the same time. Fusion query applies condition-based filtering and then runs vector search. The following multi-way search algorithms are supported:

Reciprocal rank fusion (RRF): This algorithm ranks results by their positions in each result list, not by their scores.
Weight: This algorithm ranks results by a weighted combination of their scores, not by their positions.
Cascaded: This algorithm uses full-text search as a filter and then performs top K vector search on the filtered results.
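The three algorithms can be sketched in a few lines of Python. The function names, the input shapes (ranked ID lists and ID-to-score dictionaries), and the RRF constant k=60 are assumptions made for illustration. The value 60 is a commonly used default for RRF and is not confirmed as the value that the RAG service uses.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: only positions matter, scores are ignored."""
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda d: (-scores[d], d))


def weighted_fuse(score_maps, weights):
    """Weighted fusion: only scores matter, positions are ignored."""
    fused = {}
    for scores, weight in zip(score_maps, weights):
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    return sorted(fused, key=lambda d: (-fused[d], d))


def cascaded_search(fulltext_hits, vector_scores, top_k):
    """Cascaded: full-text hits act as a filter, then take the top K by vector score."""
    candidates = [d for d in fulltext_hits if d in vector_scores]
    return sorted(candidates, key=lambda d: (-vector_scores[d], d))[:top_k]


# Hypothetical results from a full-text search and a vector search.
fulltext_ranking = ["a", "b", "c"]
vector_ranking = ["b", "c", "a"]
print(rrf_fuse([fulltext_ranking, vector_ranking]))

fulltext_scores = {"a": 0.9, "b": 0.2}
vector_scores = {"a": 0.1, "b": 0.8}
print(weighted_fuse([fulltext_scores, vector_scores], weights=[0.3, 0.7]))

print(cascaded_search(["a", "c"], {"a": 0.5, "b": 0.9, "c": 0.7}, top_k=1))
```

RRF is convenient when the scores of the two search paths are not on comparable scales, whereas weighted fusion assumes that the scores have been normalized so that the weights are meaningful.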

The RAG service supports the following augmented search capabilities:

Fine-grained ranking: The RAG service uses vector search to retrieve more candidate chunks than the requested top K and then uses BAAI General Embedding (BGE) models or large language models (LLMs) to score the candidates and re-rank them.
Window retrieval: To prevent a document from being split into context-missing segments, the RAG service can return several chunks before and after the matched chunk.
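Both capabilities can be sketched as follows, assuming that chunks are stored in document order and that a scoring function is available. The names rerank, window_retrieve, and score_fn are hypothetical, and score_fn is only a stand-in for a BGE model or an LLM scorer.

```python
def rerank(candidates, score_fn, top_k):
    """Fine-grained ranking: re-score an over-fetched candidate list, keep the best top_k."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]


def window_retrieve(chunks, matched_index, window=1):
    """Window retrieval: return the matched chunk plus `window` neighbors on each side."""
    start = max(0, matched_index - window)
    end = min(len(chunks), matched_index + window + 1)
    return chunks[start:end]


# Toy scorer: counts query words that appear in the chunk.
# A real deployment would call a reranking model here instead.
query_words = {"refund", "policy"}

def toy_score(chunk):
    return len(query_words & set(chunk.lower().split()))

chunks = [
    "Shipping takes five days",
    "Our refund policy lasts 30 days",
    "Contact support for a refund",
    "Opening hours are 9 to 5",
]

best = rerank(chunks, toy_score, top_k=1)
print(best)
print(window_retrieve(chunks, chunks.index(best[0]), window=1))
```

The window size trades context for prompt length: a wider window restores more of the surrounding text but sends more tokens to the generation step.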

Security

Data privacy: The RAG service stores document data in the AnalyticDB for PostgreSQL instance that you create. You can destroy the document data or disable access to the document data based on your business requirements. You can configure disk encryption, SSL encryption, or IP address whitelists to protect data privacy.
Multi-tenancy isolation: The RAG service allows you to use namespaces, which are similar to database schemas, to isolate document collections. This lets multiple organizational units keep their data isolated within the same instance.
Secure authentication: The RAG service uses Resource Access Management (RAM) authentication and instance username and password authentication to improve data access security.

Use scenarios

Intelligent customer service: Professional and personalized answers are generated based on the accurate understanding of customer questions and enterprise knowledge bases to improve customer satisfaction.
Content creation: A variety of content, such as articles, news, and product descriptions, is generated based on preset themes or styles to improve content production efficiency.
Knowledge management: Easy-to-learn knowledge points are generated based on automatically organized and summarized documents to accelerate knowledge sharing and learning.
Education and training: Custom teaching materials such as exercises and case studies are generated based on student requirements and course content to provide an interactive and personalized teaching experience.
Patent search: A text processing optimizer is provided for patent documents to implement high-quality patent similarity search.