Introduction to RAG
What is RAG?
RAG stands for Retrieval-Augmented Generation, a technique that enhances the capabilities of language models by incorporating external information. Compared to using a large language model (LLM) on its own, RAG offers several advantages:
- More Precise Answers: By retrieving relevant information from a knowledge base, RAG can provide more accurate and contextually appropriate responses.
- Reduced Hallucinations: RAG reduces the likelihood of generating false or irrelevant information by grounding its outputs in real-world data.
- Access to Latest Information: RAG can utilize up-to-date external sources, ensuring that the generated content is current.
RAG Structure
The typical structure of a RAG system includes:
- Vector Database: Stores the knowledge base as vectors for efficient similarity search.
- Embedding Model: Converts text from the knowledge base into vector representations.
- Language Model (LLM): Generates responses based on retrieved context and its own understanding.
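To make the division of labor concrete, here is a minimal sketch of the three components as Python protocols. The interface names and method signatures are illustrative, not taken from any particular library.

```python
# Illustrative interfaces for the three RAG components; names and
# signatures are assumptions for this sketch, not a real library's API.
from typing import Protocol

class EmbeddingModel(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]:
        """Turn each text into a fixed-length vector."""
        ...

class VectorStore(Protocol):
    def add(self, ids: list[str], vectors: list[list[float]], texts: list[str]) -> None:
        """Store vectors alongside the original texts."""
        ...

    def search(self, vector: list[float], top_k: int) -> list[str]:
        """Return the texts whose vectors are closest to the query vector."""
        ...

class LanguageModel(Protocol):
    def generate(self, prompt: str) -> str:
        """Produce an answer from a prompt that includes retrieved context."""
        ...
```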
RAG Process
The process of RAG involves two main steps:
- Indexing: The knowledge base is indexed using an embedding model, converting textual information into vectors for storage in a vector database.
- Query and Response Generation: When a query is received, the system retrieves relevant context from the vector database and uses it to generate a response with the help of the LLM.
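The two steps can be illustrated with a self-contained toy pipeline. The character-bigram embedding below is a deliberately crude stand-in for a real embedding model, and the final prompt would normally be sent to an LLM rather than printed.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: character-bigram counts hashed into a fixed-size
    vector. A real system would use a trained embedding model instead."""
    vec = np.zeros(256)
    for a, b in zip(text, text[1:]):
        vec[hash(a + b) % 256] += 1.0
    return vec

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# 1) Indexing: embed each knowledge-base document and store the vectors.
docs = [
    "RAG grounds LLM answers in text retrieved from a knowledge base.",
    "Paris is the capital of France.",
]
index = [(doc, embed(doc)) for doc in docs]

# 2) Query and response generation: embed the query, retrieve the most
#    similar document, and hand it to the LLM as context.
query = "What does RAG do?"
q_vec = embed(query)
context, _ = max(index, key=lambda pair: cosine_sim(q_vec, pair[1]))
prompt = f"Context: {context}\n\nQuestion: {query}\nAnswer using only the context."
print(prompt)  # in a real pipeline, this prompt goes to the LLM
```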
Open-Source Solutions
Several open-source frameworks are available for implementing RAG:
- LlamaIndex: Connects to various data sources and converts data into vectors using an embedding model (see the sketch after this list).
- Haystack: An open-source framework by deepset for building search and RAG pipelines.
- LangChain: A framework for composing LLM applications from reusable components, with built-in support for retrieval.
- GraphRAG: A graph-based RAG framework that supports multi-hop reasoning over factual questions.
- KAG: An OpenSPG engine-based framework for building domain-specific knowledge bases.
- AnythingLLM: Provides a chat interface to convert documents into context for LLMs.
- MaxKB: An open-source knowledge-base question-answering system used in customer service, internal knowledge bases, and education.
- RAGFlow: A deep document understanding-based RAG engine that provides reliable Q&A services.
- FastGPT: A knowledge-base question-answering system with data processing and model-calling capabilities.
- Langchain-Chatchat: Local knowledge base Q&A based on Langchain and ChatGLM.
- FlashRAG: A Python toolset for reproducing and developing RAG research, including 36 preprocessed benchmark datasets and 15 advanced algorithms.
- Open WebUI (formerly Ollama WebUI): An extensible, feature-rich, and user-friendly self-hosted web interface designed to run fully offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.
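As an example of how little code a framework can require, here is a minimal LlamaIndex sketch. It assumes documents in a local data/ directory and a configured OpenAI API key for the default embedding model and LLM; the import paths reflect the llama-index 0.10+ package layout, so check the docs for your installed version.

```python
# Minimal LlamaIndex pipeline: load files, build a vector index, query it.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()  # read source files
index = VectorStoreIndex.from_documents(documents)     # embed and index them
query_engine = index.as_query_engine()
print(query_engine.query("What does RAG stand for?"))
```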
Vector Database
- Qdrant: A fast, Rust-based vector database focusing on performance and efficiency.
- Chroma: Popular for its simplicity and ease of use.
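Chroma's simplicity shows in a short sketch: an ephemeral in-memory client with Chroma's default embedding function (downloaded on first use). The collection name and documents are illustrative.

```python
import chromadb

client = chromadb.Client()  # ephemeral, in-memory client
collection = client.create_collection("rag_demo")
collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "RAG retrieves context from a knowledge base before generation.",
        "Paris is the capital of France.",
    ],
)
# Query text is embedded with the same default embedding function.
results = collection.query(query_texts=["What does RAG do?"], n_results=1)
print(results["documents"])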
Embedding Model
- Sentence Transformers: A library for generating high-quality sentence embeddings.
- BERT (Bidirectional Encoder Representations from Transformers): BERT is a transformer-based model known for its ability to understand context by pre-training on a large corpus of text.
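A short Sentence Transformers sketch; all-MiniLM-L6-v2 is a small, widely used checkpoint fetched from the Hugging Face Hub on first run.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["RAG grounds answers in retrieved text.", "Paris is in France."]
embeddings = model.encode(sentences)  # shape: (2, 384)
# Cosine similarity between the two sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))
```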
Language Models (LLMs)
- llama3.3
- Gemma2
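Both models can be served locally with Ollama, which Open WebUI builds on. The sketch below assumes `ollama serve` is running and the llama3.3 model has been pulled; swap in "gemma2" to use Gemma 2 instead.

```python
import requests

# Call Ollama's local REST API for a single, non-streaming completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3.3", "prompt": "Explain RAG in one sentence.", "stream": False},
)
print(resp.json()["response"])
```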
Cloud Solutions
Cloud-based services that support RAG include:
- Google Vertex AI Matching Engine: Provides vector search and a managed RAG solution for enterprise search.
- AWS Kendra + SageMaker/Bedrock: Combines Kendra for enterprise search with SageMaker or Bedrock for LLMs to build RAG solutions.
- Azure AI Search + Azure OpenAI Service: Offers vector search and integrates well with Azure OpenAI Service for building RAG applications.
- 火山方舟大模型服务平台 (Volcengine Ark): A managed service platform for large language models.
- 腾讯云ES (Tencent Cloud ES): A hosted search service built on the Elasticsearch ecosystem.
- 阿里云PAI (Alibaba Cloud PAI): A machine learning platform with a published guide for deploying a RAG-based dialogue system.
References
- 火山方舟大模型服务平台 (Volcengine Ark large-model service platform)
- Learn how to deploy a RAG-based dialogue system on Aliyun PAI
- 详解几种常见本地大模型个人知识库工具部署、微调及对比选型 (A detailed guide to deploying, fine-tuning, and comparing common local LLM personal knowledge-base tools)
- RAG框架总结主流框架推荐 (A summary of mainstream RAG frameworks and recommendations)
- 搭建个人 RAG 推理服务 (Building a personal RAG inference service)