
Advanced RAG Techniques

⚠️ Timeliness note: this chapter references cutting-edge models, pricing, leaderboards, and similar information that can change quickly between versions; always defer to the original papers, official release pages, and API documentation.

[Figure: overview of advanced RAG techniques]

📌 Scope: this chapter focuses on hands-on advanced RAG (GraphRAG, Agentic RAG, and related techniques).
  • 📖 For the NLP foundations of RAG (retrieval theory, embedding techniques, evaluation methods), see 自然语言处理/14-RAG系统设计
  • 📖 For RAG research frontiers and long-context techniques, see LLM学习/04-前沿探索/03-RAG与长文本

📌 RAG has evolved from a simple "retrieve + generate" pipeline into complex systems that combine knowledge graphs, agent-driven adaptive retrieval, and multi-step reasoning. Mastering GraphRAG, Agentic RAG, advanced retrieval strategies, and evaluation methods is the core competency for building production-grade RAG systems.

Prerequisites: RAG system construction (Chapter 5), vector databases (Chapter 6), the LangChain framework (Chapter 8)

🎯 Learning Objectives

  • Understand the evolution of RAG from Naive to Agentic
  • Master knowledge graph construction and retrieval in GraphRAG
  • Understand the adaptive retrieval strategies of Agentic RAG (Self-RAG, CRAG)
  • Use advanced retrieval strategies fluently (hybrid retrieval, query transformation, context compression)
  • Master RAG evaluation frameworks (RAGAS, TruLens)
  • Be able to design and implement a production-grade advanced RAG pipeline
  • Master the high-frequency RAG interview topics

18.1 The Evolution of RAG

18.1.1 Four Generations of RAG Architecture

Text Only
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│  Naive RAG  │───>│ Advanced RAG │───>│ Modular RAG │───>│ Agentic RAG  │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────────┘
   early 2023          mid 2023           early 2024          2024–2025

Stage | Key characteristics | Representative techniques | Limitations
Naive RAG | Simple retrieve-then-generate pipeline | Basic vector retrieval + LLM generation | Poor retrieval quality, no reasoning
Advanced RAG | Pre-/post-retrieval optimization | Query rewriting, reranking, HyDE | Static pipeline, not adaptive
Modular RAG | Modular and composable | Pluggable modules, routing, fusion | Pipeline still designed by hand
Agentic RAG | Agent-driven adaptive retrieval | Self-RAG, CRAG, multi-step reasoning | Higher latency and cost

18.1.2 Core Problems of Naive RAG

Python
"""Naive RAG: a basic retrieve-then-generate pipeline"""
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# 1. Document loading and chunking
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
# docs = text_splitter.split_documents(raw_docs)

# 2. Embedding and storage
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# vectorstore = Chroma.from_documents(docs, embeddings)
# retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# 3. Generation
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("""
Answer the question based on the context below. If the context does not contain the answer, say so.

Context: {context}
Question: {question}
Answer:""")

def format_docs(docs):
    """Join retrieved documents into a single context string."""
    return "\n\n".join(doc.page_content for doc in docs)

# chain = (
#     {"context": retriever | format_docs, "question": RunnablePassthrough()}
#     | prompt
#     | llm
#     | StrOutputParser()
# )

Core problems of Naive RAG:
1. Coarse retrieval granularity: a fixed chunk size loses surrounding context
2. Semantic gap: query and document embeddings do not always align
3. Irrelevant-content noise: retrieved passages may contain distractors
4. No reasoning: cannot handle questions that require multi-step reasoning
5. No adaptivity: the same pipeline runs regardless of question complexity


18.2 GraphRAG (Microsoft)

18.2.1 Core Idea of GraphRAG

GraphRAG, proposed by Microsoft Research, combines knowledge graphs with RAG and is particularly strong on questions that require a global understanding of the corpus.

Core pipeline

Text Only
Raw documents → entity extraction → relation extraction → knowledge graph construction → community detection → community summarization → query
                                                                                         (Leiden algorithm)

Comparison with conventional RAG

Feature | Conventional vector RAG | GraphRAG
Data structure | Vector database | Knowledge graph + community summaries
Retrieval | Similarity search | Graph traversal + community index
Global questions | Weak | Strong
Local questions | Strong | Strong
Build cost | Low | High (many LLM calls)
Update cost | Low | Medium-high

18.2.2 Knowledge Graph Construction

Python
"""GraphRAG knowledge graph construction (using the graphrag library)"""
# pip install graphrag

# Option 1: the Microsoft graphrag CLI
# graphrag init --root ./ragtest
# graphrag index --root ./ragtest

# Option 2: build the knowledge graph with LangChain + an LLM
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class Entity(BaseModel):  # Pydantic BaseModel: automatic validation and serialization
    name: str = Field(description="Entity name")
    type: str = Field(description="Entity type, e.g. person/organization/technology/concept")
    description: str = Field(description="Entity description")

class Relationship(BaseModel):
    source: str = Field(description="Source entity")
    target: str = Field(description="Target entity")
    relation: str = Field(description="Relation type")
    description: str = Field(description="Relation description")

class KnowledgeGraph(BaseModel):
    entities: list[Entity] = Field(description="List of entities")
    relationships: list[Relationship] = Field(description="List of relationships")

llm = ChatOpenAI(model="gpt-4o", temperature=0)

kg_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a knowledge graph construction expert. Extract entities and relationships from the given text.
Requirements:
1. Extract all important entities (people, organizations, technologies, concepts, etc.)
2. Extract the relationships between entities
3. Give every entity and relationship a description"""),
    ("human", "Extract a knowledge graph from the following text:\n\n{text}")
])

kg_chain = kg_prompt | llm.with_structured_output(KnowledgeGraph)

# Sample text
text = """
OpenAI released the GPT-4o model in 2024, supporting multimodal input and output.
Sam Altman is the CEO of OpenAI and drove the commercialization of ChatGPT.
Google DeepMind released the Gemini model to compete with GPT-4o.
LangChain is a popular LLM application development framework that supports RAG and agent development.
"""

kg = kg_chain.invoke({"text": text})
print("Number of entities:", len(kg.entities))
for entity in kg.entities:
    print(f"  [{entity.type}] {entity.name}: {entity.description}")

print("\nNumber of relationships:", len(kg.relationships))
for rel in kg.relationships:
    print(f"  {rel.source} --[{rel.relation}]--> {rel.target}")

18.2.3 Community Detection and Summarization (Leiden Algorithm)

Python
"""Community detection and community summaries"""
import networkx as nx
# pip install graspologic
from graspologic.partition import hierarchical_leiden

def build_graph_and_detect_communities(kg: KnowledgeGraph):
    """Build a NetworkX graph and run community detection"""
    G = nx.Graph()

    # Add nodes
    for entity in kg.entities:
        G.add_node(entity.name, type=entity.type, description=entity.description)

    # Add edges
    for rel in kg.relationships:
        G.add_edge(rel.source, rel.target, relation=rel.relation, description=rel.description)

    # Leiden community detection
    # hierarchical_leiden returns a list of HierarchicalCluster items with node, cluster, level attributes
    community_mapping = hierarchical_leiden(G, max_cluster_size=10)

    # Group nodes by community
    communities = {}
    for item in community_mapping:
        community_id = item.cluster
        node_id = item.node
        if community_id not in communities:
            communities[community_id] = []
        communities[community_id].append(node_id)

    return G, communities

def generate_community_summaries(communities: dict, G: nx.Graph, llm) -> dict:
    """Generate a summary for each community"""
    summaries = {}
    for community_id, nodes in communities.items():
        # Collect the entity and relationship information within the community
        community_info = []
        for node in nodes:
            node_data = G.nodes[node]
            community_info.append(f"Entity: {node} ({node_data.get('type', 'unknown')})")
            for neighbor in G.neighbors(node):
                if neighbor in nodes:
                    edge_data = G.edges[node, neighbor]
                    community_info.append(f"  Relation: {node} --{edge_data.get('relation', '')}--> {neighbor}")

        # Generate the community summary with the LLM
        info_text = "\n".join(community_info)
        summary = llm.invoke(f"Write a short summary of the following knowledge graph community:\n{info_text}")
        summaries[community_id] = summary.content

    return summaries
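
Putting the pieces together — a minimal end-to-end usage sketch, assuming the kg_chain, text, and llm objects from the snippet in 18.2.2:

Python
# Extract the graph, detect communities, then summarize each one
kg = kg_chain.invoke({"text": text})
G, communities = build_graph_and_detect_communities(kg)
summaries = generate_community_summaries(communities, G, llm)
for community_id, summary in summaries.items():
    print(f"Community {community_id}: {summary[:100]}...")
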
18.2.4 Query Modes: Global Search and Local Search

Python
"""GraphRAG查询模式"""

class GraphRAGQueryEngine:
    """GraphRAG查询引擎"""

    def __init__(self, community_summaries, graph, vectorstore, llm):
        self.community_summaries = community_summaries
        self.graph = graph
        self.vectorstore = vectorstore
        self.llm = llm

    def global_search(self, query: str) -> str:
        """
        Global Search:适合全局性/总结性问题
        策略:使用社区摘要进行Map-Reduce
        例如:"数据集中有哪些主要主题?"
        """
        # Map阶段:对每个社区摘要生成部分答案
        partial_answers = []
        for community_id, summary in self.community_summaries.items():
            response = self.llm.invoke(
                f"基于以下社区信息回答问题(如无关则回复N/A):\n"
                f"社区信息:{summary}\n问题:{query}"
            )
            if "N/A" not in response.content:
                partial_answers.append(response.content)

        # Reduce阶段:合并部分答案
        combined = "\n\n".join(partial_answers)
        final = self.llm.invoke(
            f"请综合以下信息回答问题:\n\n{combined}\n\n问题:{query}"
        )
        return final.content

    def local_search(self, query: str, top_k: int = 5) -> str:
        """
        Local Search:适合具体/局部问题
        策略:结合向量检索 + 图谱邻居遍历
        例如:"OpenAI的CEO是谁?"
        """
        # 向量检索相关实体
        relevant_docs = self.vectorstore.similarity_search(query, k=top_k)

        # 图扩展:获取相关实体的邻居信息
        context_parts = []
        for doc in relevant_docs:
            entity_name = doc.metadata.get("entity_name", "")
            if entity_name in self.graph:
                neighbors = list(self.graph.neighbors(entity_name))
                for neighbor in neighbors:
                    edge_data = self.graph.edges[entity_name, neighbor]
                    context_parts.append(
                        f"{entity_name} --[{edge_data.get('relation', '')}]--> {neighbor}"
                    )

        context = "\n".join(context_parts)
        response = self.llm.invoke(
            f"基于以下知识图谱信息回答问题:\n{context}\n\n问题:{query}"
        )
        return response.content
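
A hedged usage sketch: the engine needs a vector store of entity descriptions with an entity_name metadata key. The wiring below improvises one from the graph built earlier (G, summaries, embeddings, llm are assumed from the previous snippets; this is illustrative, not the official graphrag indexing):

Python
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document

entity_docs = [
    Document(page_content=f"{name}: {data.get('description', '')}",
             metadata={"entity_name": name})
    for name, data in G.nodes(data=True)
]
entity_store = Chroma.from_documents(entity_docs, embeddings)

engine = GraphRAGQueryEngine(summaries, G, entity_store, llm)
print(engine.global_search("What are the main themes in this corpus?"))
print(engine.local_search("Who is the CEO of OpenAI?"))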

18.3 Agentic RAG

18.3.1 Agent-Driven Adaptive Retrieval

Agentic RAG lets an agent decide the retrieval strategy dynamically based on the query, instead of running a fixed pipeline.

Text Only
User query → agent decides ─┬→ answer directly with the LLM (simple questions)
                            ├→ vector retrieval (factual questions)
                            ├→ graph retrieval (relational questions)
                            ├→ SQL query (structured data)
                            └→ multi-step retrieval (complex questions)

18.3.2 Query Routing (Router)

Python
"""Query router: pick a retrieval strategy based on the question type"""
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from typing import Literal

class RouteDecision(BaseModel):
    """Routing decision"""
    datasource: Literal["vectorstore", "knowledge_graph", "web_search", "direct_llm"] = Field(  # Literal restricts the allowed values
        description="Chosen data source"
    )
    reasoning: str = Field(description="Reason for the choice")

llm = ChatOpenAI(model="gpt-4o", temperature=0)

router_prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a query routing expert. Choose the best data source for the user's question:

- vectorstore: factual questions answered from the enterprise knowledge base
- knowledge_graph: questions about entity relationships, "who is who", "how are X and Y connected"
- web_search: questions that need up-to-date or real-time information
- direct_llm: general-knowledge questions that need no external retrieval"""),
    ("human", "User question: {question}")
])

router_chain = router_prompt | llm.with_structured_output(RouteDecision)

# Routing in action
questions = [
    "What is the company's refund policy?",                               # → vectorstore
    "What is the competitive relationship between OpenAI and Google in AI?",  # → knowledge_graph
    "How is the stock market doing today?",                               # → web_search
    "What is the quicksort algorithm?",                                   # → direct_llm
]

for q in questions:
    decision = router_chain.invoke({"question": q})
    print(f"Q: {q}")
    print(f"→ {decision.datasource} ({decision.reasoning})\n")

18.3.3 Self-RAG (Self-Reflective Retrieval)

Self-RAG uses "reflection tokens" to let the model decide whether retrieval is needed, whether the retrieved results are relevant, and whether the generated answer is grounded.

Python
"""A Self-RAG implementation"""
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

llm = ChatOpenAI(model="gpt-4o", temperature=0)

class RetrievalDecision(BaseModel):
    """Whether retrieval is needed"""
    need_retrieval: bool = Field(description="Whether external retrieval is needed")
    reason: str = Field(description="Reason")

class RelevanceCheck(BaseModel):
    """Relevance of the retrieved results"""
    is_relevant: bool = Field(description="Whether the retrieved content is relevant to the question")
    relevant_parts: str = Field(description="Summary of the relevant parts")

class GroundednessCheck(BaseModel):
    """Groundedness of the generated answer"""
    is_grounded: bool = Field(description="Whether the answer is supported by the retrieved content")
    score: float = Field(description="Groundedness score, 0-1")

def self_rag(question: str, retriever) -> str:
    """The full Self-RAG loop"""

    # Step 1: decide whether retrieval is needed
    decision = (
        ChatPromptTemplate.from_template("Question: {question}\nIs external retrieval needed to answer this?")
        | llm.with_structured_output(RetrievalDecision)
    ).invoke({"question": question})

    if not decision.need_retrieval:
        # Generate directly
        return llm.invoke(f"Please answer: {question}").content

    # Step 2: retrieve
    docs = retriever.invoke(question)
    context = "\n".join([doc.page_content for doc in docs])

    # Step 3: check the relevance of the retrieved results
    relevance = (
        ChatPromptTemplate.from_template(
            "Question: {question}\nRetrieved content: {context}\nIs the retrieved content relevant to the question?"
        )
        | llm.with_structured_output(RelevanceCheck)
    ).invoke({"question": question, "context": context})

    if not relevance.is_relevant:
        # Retrieval irrelevant: retry retrieval or generate directly
        return llm.invoke(f"Please answer (note: no relevant reference material was found): {question}").content

    # Step 4: generate an answer from the retrieved content
    answer = llm.invoke(
        f"Answer the question using the content below:\n\nReference: {context}\n\nQuestion: {question}"
    ).content

    # Step 5: check the groundedness of the generated answer
    groundedness = (
        ChatPromptTemplate.from_template(
            "Reference: {context}\nGenerated answer: {answer}\nIs the answer supported by the reference?"
        )
        | llm.with_structured_output(GroundednessCheck)
    ).invoke({"context": context, "answer": answer})

    if not groundedness.is_grounded or groundedness.score < 0.7:
        # Insufficient grounding: regenerate
        answer = llm.invoke(
            f"Answer strictly from the content below, adding nothing extra:\n{context}\n\nQuestion: {question}"
        ).content

    return answer
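
A minimal usage sketch over a toy corpus (the documents and question here are purely illustrative):

Python
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

toy_docs = [
    Document(page_content="LangChain is an open-source framework for building LLM applications."),
    Document(page_content="RAG reduces hallucinations by grounding answers in retrieved documents."),
]
toy_retriever = Chroma.from_documents(
    toy_docs, OpenAIEmbeddings(model="text-embedding-3-small")
).as_retriever(search_kwargs={"k": 2})

print(self_rag("What is LangChain?", toy_retriever))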

18.3.4 CRAG (Corrective RAG)

Python
"""CRAG: corrective RAG"""
from typing import Literal
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

class DocumentGrade(BaseModel):
    """Document grade"""
    grade: Literal["relevant", "ambiguous", "irrelevant"] = Field(
        description="How relevant the document is to the question"
    )

def corrective_rag(question: str, retriever, llm) -> str:
    """
    CRAG flow:
    1. Retrieve documents
    2. Grade each document's relevance
    3. Act on the grades:
       - all relevant → generate directly
       - partially relevant → supplement with web search
       - none relevant → rely entirely on web search
    """
    # Step 1: initial retrieval
    docs = retriever.invoke(question)

    # Step 2: grade the documents
    grader_prompt = ChatPromptTemplate.from_template(
        "Question: {question}\nDocument: {document}\nGrade how relevant this document is to the question."
    )
    grader = grader_prompt | llm.with_structured_output(DocumentGrade)

    relevant_docs = []
    for doc in docs:
        grade = grader.invoke({"question": question, "document": doc.page_content})
        if grade.grade == "relevant":
            relevant_docs.append(doc)

    # Step 3: choose a strategy based on the grades
    relevance_ratio = len(relevant_docs) / len(docs) if docs else 0

    if relevance_ratio >= 0.8:
        # Highly relevant: use the retrieved results directly
        context = "\n".join([d.page_content for d in relevant_docs])
        action = "direct_generate"
    elif relevance_ratio >= 0.3:
        # Partially relevant: supplement with web search
        context = "\n".join([d.page_content for d in relevant_docs])
        web_results = web_search(question)  # supplementary search
        context += f"\n\nSupplementary information: {web_results}"
        action = "augmented_generate"
    else:
        # Low relevance: rely entirely on web search
        context = web_search(question)
        action = "web_generate"

    # Step 4: generate the answer
    answer = llm.invoke(
        f"[Strategy: {action}]\nAnswer using the content below:\n{context}\n\nQuestion: {question}"
    ).content

    return answer

def web_search(query: str) -> str:
    """Stubbed web search"""
    return f"Web search results for '{query}'..."

18.4 Advanced Retrieval Strategies

18.4.1 Multi-Vector Retrieval

Python
"""Multi-vector retrieval: the parent-document strategy"""
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
import uuid

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(collection_name="multi_vector", embedding_function=embeddings)
doc_store = InMemoryStore()  # stores the original full documents (InMemoryStore holds Document objects)

retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=doc_store,
    id_key="doc_id"
)

# Strategy 1: parent documents + child chunks
# Retrieve with small chunks, return the larger parent document
parent_docs = [
    Document(page_content="This is a very long article..." * 100, metadata={"source": "doc1"})
]

child_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)

for parent_doc in parent_docs:
    doc_id = str(uuid.uuid4())
    child_docs = child_splitter.split_documents([parent_doc])
    for child in child_docs:
        child.metadata["doc_id"] = doc_id

    retriever.vectorstore.add_documents(child_docs)
    retriever.docstore.mset([(doc_id, parent_doc)])  # store the parent document

# Strategy 2: summary-based retrieval
# Retrieve with the summary's embedding, return the original document
llm = ChatOpenAI(model="gpt-4o")

for doc in parent_docs:
    doc_id = str(uuid.uuid4())
    summary = llm.invoke(f"Summarize in one sentence: {doc.page_content[:1000]}").content
    summary_doc = Document(page_content=summary, metadata={"doc_id": doc_id})

    retriever.vectorstore.add_documents([summary_doc])
    retriever.docstore.mset([(doc_id, doc)])

18.4.2 Hybrid Retrieval

Python
"""Hybrid retrieval: dense + sparse + reranking"""
from langchain_community.retrievers import BM25Retriever
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.retrievers import EnsembleRetriever
from langchain_core.documents import Document

# Prepare documents
docs = [
    Document(page_content="LangChain is a framework for building LLM applications"),
    Document(page_content="Vector databases store and retrieve embedding vectors"),
    Document(page_content="RAG stands for retrieval-augmented generation"),
    Document(page_content="The Transformer architecture underpins modern large models"),
]

# Sparse retriever (BM25 keyword matching)
bm25_retriever = BM25Retriever.from_documents(docs, k=3)

# Dense retriever (vector semantic search)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(docs, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Hybrid retrieval (weighted fusion)
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.4, 0.6]  # 40% BM25, 60% vector
)

results = hybrid_retriever.invoke("What is RAG?")

# Reranking (with a BGE reranker or the Cohere Reranker)
# pip install langchain-cohere
# from langchain_cohere import CohereRerank
# from langchain.retrievers import ContextualCompressionRetriever

# reranker = CohereRerank(model="rerank-v3.5", top_n=3)
# reranking_retriever = ContextualCompressionRetriever(
#     base_compressor=reranker,
#     base_retriever=hybrid_retriever
# )

18.4.3 Query Transformation

Python
"""Query transformation strategies"""
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field

llm = ChatOpenAI(model="gpt-4o", temperature=0)

# --- HyDE (Hypothetical Document Embedding) ---
def hyde_retrieval(query: str, retriever):
    """
    HyDE: first have the LLM write a hypothetical answer document,
    then retrieve with that document's embedding (often better than the raw query)
    """
    hyde_prompt = ChatPromptTemplate.from_template(
        "Write a plausible answer to the following question (accuracy not required):\n{query}"
    )
    hypothetical_doc = (hyde_prompt | llm).invoke({"query": query}).content

    # Retrieve with the hypothetical document's embedding
    results = retriever.invoke(hypothetical_doc)
    return results

# --- Multi-Query (query expansion) ---
class MultiQuery(BaseModel):
    queries: list[str] = Field(description="Query rewrites from different angles")

def multi_query_retrieval(query: str, retriever):
    """
    Multi-Query: rewrite the original query from several angles,
    then merge the retrieval results to improve recall
    """
    expand_prompt = ChatPromptTemplate.from_template(
        """Rewrite the following question from 3 different angles to improve retrieval recall:
Original question: {query}
Requirement: keep the meaning the same but vary the phrasing."""
    )

    multi = (expand_prompt | llm.with_structured_output(MultiQuery)).invoke({"query": query})

    # Retrieve for each query and de-duplicate
    all_docs = []
    seen_contents = set()
    for q in [query] + multi.queries:
        docs = retriever.invoke(q)
        for doc in docs:
            if doc.page_content not in seen_contents:
                all_docs.append(doc)
                seen_contents.add(doc.page_content)

    return all_docs

# --- Step-Back Prompting ---
def step_back_retrieval(query: str, retriever):
    """
    Step-Back: first ask a higher-level question,
    gather background knowledge, then answer the original question
    """
    step_back_prompt = ChatPromptTemplate.from_template(
        """Generate a more general, more abstract question that would help answer the original question:
Original question: {query}
Higher-level question:"""
    )

    abstract_query = (step_back_prompt | llm).invoke({"query": query}).content

    # Retrieve for both the original and the abstract question
    original_docs = retriever.invoke(query)
    abstract_docs = retriever.invoke(abstract_query)

    return original_docs + abstract_docs
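
A usage sketch calling all three transforms against a toy corpus (documents and questions are illustrative):

Python
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings

corpus = [
    Document(page_content="RAG augments LLM answers with retrieved external knowledge."),
    Document(page_content="HyDE retrieves with the embedding of a hypothetical answer."),
]
retriever = Chroma.from_documents(
    corpus, OpenAIEmbeddings(model="text-embedding-3-small")
).as_retriever()

print(len(hyde_retrieval("How does HyDE work?", retriever)))
print(len(multi_query_retrieval("What is RAG?", retriever)))
print(len(step_back_retrieval("Why does HyDE help retrieval?", retriever)))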

18.4.4 Context Compression

Python
"""Context compression: cut noise and extract the key information"""
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# LLM extraction compressor: keep only the parts relevant to the query
compressor = LLMChainExtractor.from_llm(llm)

# base_retriever can be any retriever
# compressed_retriever = ContextualCompressionRetriever(
#     base_compressor=compressor,
#     base_retriever=base_retriever
# )

# LongContextReorder
# Research suggests LLMs attend most to content at the beginning and end of the context
from langchain_community.document_transformers import LongContextReorder

reordering = LongContextReorder()
# reordered_docs = reordering.transform_documents(docs)
# Places the most relevant documents at the beginning and end

18.5 RAG Evaluation Frameworks

18.5.1 RAGAS Metrics

Python
"""The RAGAS evaluation framework"""
# pip install ragas

from ragas import evaluate
from ragas.metrics import (
    faithfulness,          # faithfulness: is the answer grounded in the retrieved content
    answer_relevancy,      # answer relevancy: does the answer address the question
    context_precision,     # context precision: is the retrieved content on target
    context_recall,        # context recall: was all the needed information retrieved
    answer_correctness     # answer correctness: agreement with the reference answer
)
from datasets import Dataset

# Prepare the evaluation data
eval_data = {
    "question": [
        "What is LangChain?",
        "What is the core RAG workflow?"
    ],
    "answer": [
        "LangChain is an open-source framework for building LLM applications",
        "The RAG workflow retrieves relevant documents and generates an answer grounded in them"
    ],
    "contexts": [
        ["LangChain is a popular open-source framework designed for building large language model applications"],
        ["The core workflow of RAG (retrieval-augmented generation): 1. retrieve 2. augment 3. generate"]
    ],
    "ground_truth": [
        "LangChain is an open-source framework for developing LLM-powered applications",
        "RAG consists of three steps: document retrieval, context augmentation, and LLM generation"
    ]
}

dataset = Dataset.from_dict(eval_data)

# Run the evaluation
results = evaluate(
    dataset=dataset,
    metrics=[
        faithfulness,
        answer_relevancy,
        context_precision,
        context_recall,
        answer_correctness
    ]
)

print(results)
# Prints each metric's score (0-1)

Interpreting the core RAGAS metrics

Metric | Meaning | What it evaluates | Target
Faithfulness | Groundedness | Whether the answer is supported by the retrieved content (no hallucination) | >0.9
Answer Relevancy | Answer relevance | Whether the answer addresses the question | >0.85
Context Precision | Retrieval precision | Share of useful information in the retrieved content | >0.8
Context Recall | Retrieval recall | Whether all the needed information was retrieved | >0.8
Answer Correctness | Answer correctness | Agreement with the reference answer | >0.85
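
As a rough intuition (not RAGAS's exact implementation), faithfulness and context precision reduce to simple ratios over LLM-judged items; a toy sketch:

Python
def naive_faithfulness(supported_claims: int, total_claims: int) -> float:
    """Toy approximation: fraction of answer claims entailed by the retrieved context."""
    return supported_claims / total_claims if total_claims else 0.0

def naive_context_precision(relevant_chunks: int, retrieved_chunks: int) -> float:
    """Toy approximation: fraction of retrieved chunks that are actually useful."""
    return relevant_chunks / retrieved_chunks if retrieved_chunks else 0.0

print(naive_faithfulness(4, 5))       # 0.8 → one of five claims is unsupported
print(naive_context_precision(3, 5))  # 0.6 → two of five retrieved chunks are noise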

18.5.2 Building Your Own Evaluation Pipeline

Python
"""A self-built RAG evaluation pipeline"""
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class RAGEvalResult(BaseModel):
    """RAG evaluation result"""
    faithfulness_score: float = Field(description="Faithfulness (0-1)")
    relevance_score: float = Field(description="Relevance (0-1)")
    completeness_score: float = Field(description="Completeness (0-1)")
    overall_score: float = Field(description="Overall score (0-1)")
    explanation: str = Field(description="Explanation of the scores")

llm = ChatOpenAI(model="gpt-4o", temperature=0)

def evaluate_rag_response(
    question: str,
    context: str,
    answer: str,
    ground_truth: str = ""
) -> RAGEvalResult:
    """Evaluate a single answer from the RAG system"""

    eval_prompt = f"""Evaluate the quality of the following RAG system answer:

## Question
{question}

## Retrieved context
{context}

## System answer
{answer}

## Reference answer (if any)
{ground_truth or "None"}

Scoring criteria:
- faithfulness_score: is the answer fully grounded in the context, with nothing made up (0-1)
- relevance_score: is the answer relevant to the question (0-1)
- completeness_score: does the answer fully cover what the question asks (0-1)
- overall_score: overall score (0-1)"""

    result = llm.with_structured_output(RAGEvalResult).invoke(eval_prompt)
    return result

# Batch evaluation
def batch_evaluate(test_cases: list[dict]) -> dict:
    """Evaluate the RAG system over a batch of test cases"""
    results = []
    for case in test_cases:
        result = evaluate_rag_response(
            question=case["question"],
            context=case["context"],
            answer=case["answer"],
            ground_truth=case.get("ground_truth", "")
        )
        results.append(result)

    if not results:
        return {}

    # Aggregate statistics
    avg_scores = {
        "avg_faithfulness": sum(r.faithfulness_score for r in results) / len(results),
        "avg_relevance": sum(r.relevance_score for r in results) / len(results),
        "avg_completeness": sum(r.completeness_score for r in results) / len(results),
        "avg_overall": sum(r.overall_score for r in results) / len(results),
    }

    return avg_scores

18.6 Production RAG Optimization

18.6.1 Chunking Strategies

Python
"""Advanced chunking strategies"""
from langchain_text_splitters import (
    RecursiveCharacterTextSplitter,
    MarkdownHeaderTextSplitter,
)
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Strategy 1: semantic chunking (dynamic splits based on semantic similarity)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
semantic_splitter = SemanticChunker(
    embeddings,
    breakpoint_threshold_type="percentile",  # percentile/standard_deviation/interquartile
    breakpoint_threshold_amount=95
)
# semantic_chunks = semantic_splitter.split_documents(docs)

# Strategy 2: Markdown header splitting (preserves structural information)
md_splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[
        ("#", "Header 1"),
        ("##", "Header 2"),
        ("###", "Header 3"),
    ]
)
# md_chunks = md_splitter.split_text(markdown_text)

# Strategy 3: recursive splitting + metadata enrichment
# (the separators include CJK punctuation so the splitter also handles Chinese text)
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    separators=["\n\n", "\n", "。", "!", "?", ";", " "]
)
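
A quick sketch comparing the three splitters on the same text (the sample text is illustrative; the semantic splitter makes embedding API calls):

Python
markdown_text = (
    "# RAG\n\nRAG combines retrieval and generation.\n\n"
    "## Evaluation\n\nRAGAS measures faithfulness and relevancy."
)

md_chunks = md_splitter.split_text(markdown_text)            # Documents with header metadata
recursive_chunks = text_splitter.split_text(markdown_text)   # plain size-based strings
semantic_chunks = semantic_splitter.split_text(markdown_text)  # similarity-based strings

print(len(md_chunks), len(recursive_chunks), len(semantic_chunks))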

18.6.2 Metadata Filtering and Caching Strategies

Python
"""Metadata filtering and semantic caching"""

# Metadata-filtered retrieval
# vectorstore.as_retriever(
#     search_kwargs={
#         "k": 5,
#         "filter": {
#             "source": "technical_doc",
#             "date": {"$gte": "2024-01-01"},
#             "department": "engineering"
#         }
#     }
# )

# Semantic caching (GPTCache / self-built)
from langchain_openai import OpenAIEmbeddings
import numpy as np
import hashlib

class SemanticCache:
    """A simple semantic cache"""

    def __init__(self, embeddings, threshold: float = 0.95):
        self.embeddings = embeddings
        self.threshold = threshold
        self.cache: dict[str, dict] = {}  # key: hash, value: {embedding, response}

    def _get_embedding(self, text: str) -> list[float]:
        return self.embeddings.embed_query(text)

    def _cosine_similarity(self, a: list[float], b: list[float]) -> float:
        a, b = np.array(a), np.array(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def get(self, query: str) -> str | None:
        """Look up a semantically similar cached entry"""
        query_emb = self._get_embedding(query)
        for item in self.cache.values():
            sim = self._cosine_similarity(query_emb, item["embedding"])
            if sim >= self.threshold:
                return item["response"]
        return None

    def set(self, query: str, response: str):
        """Add an entry to the cache"""
        key = hashlib.md5(query.encode()).hexdigest()
        self.cache[key] = {
            "embedding": self._get_embedding(query),
            "response": response
        }

    def rag_with_cache(self, query: str, rag_chain) -> str:
        """RAG call with caching"""
        cached = self.get(query)
        if cached:
            print("🎯 Cache hit")
            return cached

        response = rag_chain.invoke(query)
        self.set(query, response)
        return response
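
A short usage sketch (illustrative): the cache wraps any object whose .invoke(query) returns a string.

Python
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
cache = SemanticCache(embeddings, threshold=0.9)

# The first call misses the cache and runs the chain; a close paraphrase may then hit it
# answer1 = cache.rag_with_cache("What is the refund policy?", rag_chain)
# answer2 = cache.rag_with_cache("Tell me about the refund policy", rag_chain)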

18.7 Complete Code Example: An Advanced RAG Pipeline

Python
"""Complete example: an advanced RAG pipeline built with LangChain"""
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever, ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda
from langchain_core.documents import Document
from pydantic import BaseModel, Field
from typing import Literal

# ============ Configuration ============
llm = ChatOpenAI(model="gpt-4o", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# ============ Document preparation ============
sample_docs = [
    Document(page_content="LangChain is an open-source framework for building applications on large language models. It provides chains, agents, RAG, and other core components.", metadata={"source": "langchain_intro", "category": "framework"}),
    Document(page_content="LlamaIndex focuses on RAG scenarios and provides the standard Documents→Nodes→Index→QueryEngine pipeline.", metadata={"source": "llamaindex_intro", "category": "framework"}),
    Document(page_content="Vector databases such as Chroma, Pinecone, and Weaviate store and retrieve document embedding vectors.", metadata={"source": "vector_db", "category": "infrastructure"}),
    Document(page_content="RAG (retrieval-augmented generation) improves LLM answer quality and reduces hallucinations by retrieving external knowledge.", metadata={"source": "rag_intro", "category": "technique"}),
    Document(page_content="RAGAS is a RAG evaluation framework providing metrics such as faithfulness, relevancy, and precision.", metadata={"source": "ragas_intro", "category": "evaluation"}),
]

# ============ Build the retrievers ============
# Dense retrieval
vectorstore = Chroma.from_documents(sample_docs, embeddings)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Sparse retrieval (BM25)
bm25_retriever = BM25Retriever.from_documents(sample_docs, k=3)

# Hybrid retrieval
hybrid_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, dense_retriever],
    weights=[0.4, 0.6]
)

# Context compression
compressor = LLMChainExtractor.from_llm(ChatOpenAI(model="gpt-4o-mini"))
compressed_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=hybrid_retriever
)

# ============ Query routing ============
class RouteDecision(BaseModel):
    route: Literal["simple", "advanced"] = Field(description="Query route")

router_prompt = ChatPromptTemplate.from_template(
    "Judge the question's complexity: choose simple for plain factual questions, advanced for questions that need synthesis.\nQuestion: {question}"
)
router = router_prompt | llm.with_structured_output(RouteDecision)

# ============ Generation chain ============
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the retrieved content below. If the content is insufficient, say so.

Retrieved content:
{context}

Question: {question}

Requirements:
1. Be accurate; make nothing up
2. Cite the information sources
3. If unsure, say so honestly
""")

def format_docs(docs):
    return "\n\n".join(f"[{doc.metadata.get('source', 'unknown')}] {doc.page_content}" for doc in docs)

def route_and_retrieve(question: str):
    """Route, then retrieve"""
    decision = router.invoke({"question": question})
    if decision.route == "simple":
        docs = hybrid_retriever.invoke(question)
    else:
        docs = compressed_retriever.invoke(question)
    return format_docs(docs)

# The full RAG chain
advanced_rag_chain = (
    {
        "context": RunnableLambda(lambda x: route_and_retrieve(x["question"])),  # lambda: anonymous function
        "question": RunnableLambda(lambda x: x["question"])
    }
    | rag_prompt
    | llm
    | StrOutputParser()
)

# ============ Run it ============
answer = advanced_rag_chain.invoke({"question": "What is RAG, and what are its advantages?"})
print(answer)

📋 Interview Notes

High-frequency interview questions

Q1: Explain the differences between Naive RAG, Advanced RAG, Modular RAG, and Agentic RAG.

A: Naive RAG is a simple retrieve-then-generate pipeline. Advanced RAG adds pre-retrieval optimization (query rewriting) and post-retrieval optimization (reranking). Modular RAG breaks each stage into modules that can be composed flexibly. Agentic RAG has an agent dynamically choose the retrieval strategy, supporting multi-step reasoning and self-reflection. The trajectory runs from static pipelines to intelligent adaptivity.

Q2: How do GraphRAG's Global Search and Local Search differ?

A: Global Search runs Map-Reduce over community summaries and suits global questions (e.g. "what are the main themes in the dataset"). Local Search combines vector retrieval with graph neighbor traversal and suits specific questions (e.g. "what are entity X's attributes"). Global Search costs more but can answer cross-document questions.

Q3: What is Self-RAG, and how does it differ from conventional RAG?

A: Self-RAG introduces "reflection tokens": the model decides before retrieval whether retrieval is needed, assesses relevance after retrieval, and checks groundedness after generation. Unlike the fixed pipeline of conventional RAG, Self-RAG is adaptive and can cut unnecessary retrievals and hallucinations.

Q4: How do you evaluate a RAG system? What are RAGAS's core metrics?

A: RAGAS core metrics: ① Faithfulness (is the answer grounded) ② Answer Relevancy (does the answer address the question) ③ Context Precision (retrieval precision) ④ Context Recall (retrieval recall). Evaluation needs both automatic metrics and human review; in production, monitor these metrics continuously.

Q5: Why does hybrid retrieval beat pure vector retrieval?

A: Vector retrieval excels at semantic matching but can miss exact keyword matches; BM25 excels at keyword matching but does not understand semantics. Hybrid retrieval (e.g. BM25 at 0.4 + dense at 0.6) combines both strengths, and a reranker then reorders the results, markedly improving retrieval quality. In practice, Top-5 relevance can improve by 10-20%.


✏️ Exercises

Exercise 1: GraphRAG in practice

Build a simple knowledge graph with LangChain + an LLM: extract entities and relationships from a passage of text and answer questions over the graph.

Exercise 2: Implement Self-RAG

Implement the full Self-RAG loop: retrieval-necessity decision → document retrieval → relevance assessment → generation → groundedness check. Compare Self-RAG with Naive RAG on factual questions.

Exercise 3: A hybrid retrieval system

Build a hybrid retrieval system with BM25 + vector retrieval + the Cohere Reranker, and compare retrieval quality under different weight configurations.

Exercise 4: RAG evaluation

Evaluate your RAG system with the RAGAS framework, analyze the Faithfulness and Relevancy metrics, identify the system's bottleneck, and propose optimizations.


📚 References
  • GraphRAG paper: "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (2024)
  • Self-RAG paper: "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" (2024)
  • CRAG paper: "Corrective Retrieval Augmented Generation" (2024)
  • RAGAS documentation
  • LangChain RAG tutorials


Last updated: 2026-02-12 · Applies to: LLM Application Guide v2026