Advanced RAG Techniques
⚠️ Timeliness note: this chapter touches on frontier models, pricing, leaderboards, and similar information that can change quickly between releases; always defer to the original papers, official release pages, and API documentation.
📌 Scope note: this chapter focuses on hands-on advanced RAG (GraphRAG, Agentic RAG, etc.).
- 📖 For the NLP foundations of RAG (retrieval principles, embedding techniques, evaluation methods), see 自然语言处理/14-RAG系统设计
- 📖 For RAG research frontiers and long-context techniques, see LLM学习/04-前沿探索/03-RAG与长文本
📌 RAG has evolved from a simple "retrieve + generate" pipeline into a complex system that combines knowledge graphs, agent-driven adaptive retrieval, and multi-step reasoning. Mastering GraphRAG, Agentic RAG, advanced retrieval strategies, and evaluation methods is a core competency for building production-grade RAG systems.
Prerequisites: RAG system construction (Chapter 5), vector databases (Chapter 6), the LangChain framework (Chapter 8)
🎯 Learning Objectives
- Understand how RAG evolved from Naive to Agentic
- Master knowledge graph construction and retrieval in GraphRAG
- Understand adaptive retrieval strategies in Agentic RAG (Self-RAG, CRAG)
- Use advanced retrieval strategies fluently (hybrid retrieval, query transformation, context compression)
- Know how to use RAG evaluation frameworks (RAGAS, TruLens)
- Design and implement production-grade advanced RAG pipelines
- Be ready for common RAG interview questions
18.1 The Evolution of RAG
18.1.1 Four Generations of RAG Architecture
┌─────────────┐    ┌──────────────┐    ┌─────────────┐    ┌──────────────┐
│  Naive RAG  │───>│ Advanced RAG │───>│ Modular RAG │───>│ Agentic RAG  │
└─────────────┘    └──────────────┘    └─────────────┘    └──────────────┘
  early 2023          mid 2023            early 2024         2024-2025
| Stage | Key traits | Representative techniques | Limitations |
|---|---|---|---|
| Naive RAG | Simple retrieve-then-generate pipeline | Basic vector retrieval + LLM generation | Poor retrieval quality, no reasoning |
| Advanced RAG | Pre-/post-retrieval optimization | Query rewriting, reranking, HyDE | Static pipeline, no adaptivity |
| Modular RAG | Modular, composable | Pluggable modules, routing, fusion | Pipeline must be hand-designed |
| Agentic RAG | Agent-driven adaptive retrieval | Self-RAG, CRAG, multi-step reasoning | Higher latency and cost |
18.1.2 Core Problems of Naive RAG
"""Naive RAG:基础检索-生成管线"""
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
# 1. 文档加载与分块
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=500,
chunk_overlap=50
)
# docs = text_splitter.split_documents(raw_docs)
# 2. 向量化与存储
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# vectorstore = Chroma.from_documents(docs, embeddings)
# retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
# 3. 生成
llm = ChatOpenAI(model="gpt-4o")
prompt = ChatPromptTemplate.from_template("""
基于以下上下文回答问题。如果上下文不包含答案,请说明。
上下文:{context}
问题:{question}
答案:""")
# chain = (
# {"context": retriever | format_docs, "question": RunnablePassthrough()}
# | prompt
# | llm
# | StrOutputParser()
# )
Core problems of Naive RAG:
1. Coarse retrieval granularity: a fixed chunk size loses surrounding context
2. Semantic gap: query and document embeddings do not always align
3. Irrelevant-content interference: retrieved passages can contain noise
4. No reasoning: questions that require multi-step reasoning are out of reach
5. No adaptivity: the same pipeline runs regardless of question complexity
18.2 GraphRAG (Microsoft)
18.2.1 The Core Idea
GraphRAG, from Microsoft Research, combines knowledge graphs with RAG and is especially strong on questions that require a global understanding of the corpus.
Core pipeline: documents → entity/relation extraction (LLM) → knowledge graph → community detection (Leiden) → community summaries → query time (Global/Local Search).
Comparison with traditional RAG:
| Aspect | Traditional vector RAG | GraphRAG |
|---|---|---|
| Data structure | Vector database | Knowledge graph + community summaries |
| Retrieval | Similarity search | Graph traversal + community index |
| Global questions | Weak | Strong |
| Local questions | Strong | Strong |
| Build cost | Low | High (many LLM calls) |
| Update cost | Low | Medium-high |
18.2.2 Knowledge Graph Construction
"""GraphRAG知识图谱构建(使用graphrag库)"""
# pip install graphrag
# 方式1: 使用微软graphrag CLI
# graphrag init --root ./ragtest
# graphrag index --root ./ragtest
# 方式2: 使用LangChain + LLM构建知识图谱
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
class Entity(BaseModel): # Pydantic BaseModel:自动数据验证和序列化
name: str = Field(description="实体名称")
type: str = Field(description="实体类型,如:人物/组织/技术/概念")
description: str = Field(description="实体描述")
class Relationship(BaseModel):
source: str = Field(description="源实体")
target: str = Field(description="目标实体")
relation: str = Field(description="关系类型")
description: str = Field(description="关系描述")
class KnowledgeGraph(BaseModel):
entities: list[Entity] = Field(description="实体列表")
relationships: list[Relationship] = Field(description="关系列表")
llm = ChatOpenAI(model="gpt-4o", temperature=0)
kg_prompt = ChatPromptTemplate.from_messages([
("system", """你是一个知识图谱构建专家。从给定文本中提取实体和关系。
要求:
1. 提取所有重要实体(人物、组织、技术、概念等)
2. 提取实体间的关系
3. 每个实体和关系都需要有描述"""),
("human", "请从以下文本中提取知识图谱:\n\n{text}")
])
kg_chain = kg_prompt | llm.with_structured_output(KnowledgeGraph)
# 示例文本
text = """
OpenAI于2024年发布了GPT-4o模型,支持多模态输入输出。
Sam Altman是OpenAI的CEO,他推动了ChatGPT的商业化。
Google DeepMind发布了Gemini模型与GPT-4o竞争。
LangChain是一个流行的LLM应用开发框架,支持RAG和Agent开发。
"""
kg = kg_chain.invoke({"text": text})
print("实体数量:", len(kg.entities))
for entity in kg.entities:
print(f" [{entity.type}] {entity.name}: {entity.description}")
print("\n关系数量:", len(kg.relationships))
for rel in kg.relationships:
print(f" {rel.source} --[{rel.relation}]--> {rel.target}")
18.2.3 Community Detection and Summaries (Leiden Algorithm)
"""社区检测与社区摘要"""
import networkx as nx
# pip install graspologic
from graspologic.partition import hierarchical_leiden
def build_graph_and_detect_communities(kg: KnowledgeGraph):
"""构建NetworkX图并进行社区检测"""
G = nx.Graph()
# 添加节点
for entity in kg.entities:
G.add_node(entity.name, type=entity.type, description=entity.description)
# 添加边
for rel in kg.relationships:
G.add_edge(rel.source, rel.target, relation=rel.relation, description=rel.description)
# Leiden算法社区检测
# hierarchical_leiden 返回 HierarchicalCluster 列表,每项有 node, cluster, level 等属性
community_mapping = hierarchical_leiden(G, max_cluster_size=10)
# 社区分组
communities = {}
for item in community_mapping:
community_id = item.cluster
node_id = item.node
if community_id not in communities:
communities[community_id] = []
communities[community_id].append(node_id)
return G, communities
def generate_community_summaries(communities: dict, G: nx.Graph, llm) -> dict:
"""为每个社区生成摘要"""
summaries = {}
for community_id, nodes in communities.items():
# 收集社区内的实体和关系信息
community_info = []
for node in nodes:
node_data = G.nodes[node]
community_info.append(f"实体: {node} ({node_data.get('type', 'unknown')})")
for neighbor in G.neighbors(node):
if neighbor in nodes:
edge_data = G.edges[node, neighbor]
community_info.append(f" 关系: {node} --{edge_data.get('relation', '')}--> {neighbor}")
# LLM生成社区摘要
info_text = "\n".join(community_info)
summary = llm.invoke(f"请为以下知识图谱社区生成一段摘要:\n{info_text}")
summaries[community_id] = summary.content
return summaries
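If graspologic is unavailable, a BFS over connected components already groups entities that share edges — a crude stand-in for Leiden (a real community algorithm additionally splits dense subgraphs into smaller clusters). A stdlib-only sketch:

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Crude community proxy: BFS connected components over an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            if node in comp:
                continue
            comp.add(node)
            queue.extend(adj[node] - comp)  # only enqueue unvisited neighbors
        seen |= comp
        components.append(comp)
    return components

edges = [("OpenAI", "GPT-4o"), ("OpenAI", "Sam Altman"), ("Google", "Gemini")]
comps = connected_components(edges)
print(len(comps))  # 2
```

This is only for intuition; for real use, `hierarchical_leiden` (or `nx.community` algorithms) gives far better-shaped communities.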
18.2.4 Global Search vs Local Search
"""GraphRAG查询模式"""
class GraphRAGQueryEngine:
"""GraphRAG查询引擎"""
def __init__(self, community_summaries, graph, vectorstore, llm):
self.community_summaries = community_summaries
self.graph = graph
self.vectorstore = vectorstore
self.llm = llm
def global_search(self, query: str) -> str:
"""
Global Search:适合全局性/总结性问题
策略:使用社区摘要进行Map-Reduce
例如:"数据集中有哪些主要主题?"
"""
# Map阶段:对每个社区摘要生成部分答案
partial_answers = []
for community_id, summary in self.community_summaries.items():
response = self.llm.invoke(
f"基于以下社区信息回答问题(如无关则回复N/A):\n"
f"社区信息:{summary}\n问题:{query}"
)
if "N/A" not in response.content:
partial_answers.append(response.content)
# Reduce阶段:合并部分答案
combined = "\n\n".join(partial_answers)
final = self.llm.invoke(
f"请综合以下信息回答问题:\n\n{combined}\n\n问题:{query}"
)
return final.content
def local_search(self, query: str, top_k: int = 5) -> str:
"""
Local Search:适合具体/局部问题
策略:结合向量检索 + 图谱邻居遍历
例如:"OpenAI的CEO是谁?"
"""
# 向量检索相关实体
relevant_docs = self.vectorstore.similarity_search(query, k=top_k)
# 图扩展:获取相关实体的邻居信息
context_parts = []
for doc in relevant_docs:
entity_name = doc.metadata.get("entity_name", "")
if entity_name in self.graph:
neighbors = list(self.graph.neighbors(entity_name))
for neighbor in neighbors:
edge_data = self.graph.edges[entity_name, neighbor]
context_parts.append(
f"{entity_name} --[{edge_data.get('relation', '')}]--> {neighbor}"
)
context = "\n".join(context_parts)
response = self.llm.invoke(
f"基于以下知识图谱信息回答问题:\n{context}\n\n问题:{query}"
)
return response.content
18.3 Agentic RAG
18.3.1 Agent-Driven Adaptive Retrieval
Agentic RAG lets an agent decide the retrieval strategy dynamically based on the query, instead of running a fixed pipeline.
User query → Agent decides ─→ answer directly with the LLM (simple questions)
                          ├→ vector retrieval (factual questions)
                          ├→ graph retrieval (relational questions)
                          ├→ SQL query (structured data)
                          └→ multi-step retrieval (complex questions)
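To make the routing contract concrete before bringing in an LLM classifier, here is a deliberately toy keyword-rule router (entirely hypothetical — the keyword lists are made up, and real systems use an LLM or a trained classifier as in the next section):

```python
def heuristic_route(question: str) -> str:
    """Toy router: keyword rules standing in for an LLM classifier."""
    q = question.lower()
    if any(w in q for w in ("today", "latest", "current price")):
        return "web_search"            # fresh / real-time information
    if any(w in q for w in ("relationship", "connected", "between")):
        return "knowledge_graph"       # entity-relation questions
    if any(w in q for w in ("policy", "our company", "internal")):
        return "vectorstore"           # enterprise knowledge base
    return "direct_llm"                # general knowledge

print(heuristic_route("What is today's stock price?"))  # web_search
print(heuristic_route("What is quicksort?"))            # direct_llm
```

The interface — a string in, a datasource label out — is exactly what the structured-output router below produces, just with reasoning attached.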
18.3.2 Query Routing (Router)
"""查询路由器:根据问题类型选择检索策略"""
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
from typing import Literal
class RouteDecision(BaseModel):
"""路由决策"""
datasource: Literal["vectorstore", "knowledge_graph", "web_search", "direct_llm"] = Field( # Literal限定取值范围
description="选择数据源"
)
reasoning: str = Field(description="选择理由")
llm = ChatOpenAI(model="gpt-4o", temperature=0)
router_prompt = ChatPromptTemplate.from_messages([
("system", """你是一个查询路由专家。根据用户问题选择最佳数据源:
- vectorstore: 需要从企业知识库检索的事实性问题
- knowledge_graph: 涉及实体关系、「谁是谁」、「之间有什么联系」的问题
- web_search: 需要最新信息、实时数据的问题
- direct_llm: 通用知识、不需要外部检索的问题"""),
("human", "用户问题:{question}")
])
router_chain = router_prompt | llm.with_structured_output(RouteDecision)
# 路由使用
questions = [
"公司的退款政策是什么?", # → vectorstore
"OpenAI和Google在AI领域的竞争关系?", # → knowledge_graph
"今天的股票行情怎样?", # → web_search
"什么是快速排序算法?", # → direct_llm
]
for q in questions:
decision = router_chain.invoke({"question": q})
print(f"Q: {q}")
print(f"→ {decision.datasource} ({decision.reasoning})\n")
18.3.3 Self-RAG (Self-Reflective Retrieval)
Self-RAG uses "reflection tokens" to let the model decide whether retrieval is needed, whether the retrieved results are relevant, and whether the generation is grounded in them.
"""Self-RAG实现"""
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
llm = ChatOpenAI(model="gpt-4o", temperature=0)
class RetrievalDecision(BaseModel):
"""是否需要检索"""
need_retrieval: bool = Field(description="是否需要外部检索")
reason: str = Field(description="原因")
class RelevanceCheck(BaseModel):
"""检索结果相关性"""
is_relevant: bool = Field(description="检索结果是否与问题相关")
relevant_parts: str = Field(description="相关部分摘要")
class GroundednessCheck(BaseModel):
"""生成结果的依据性"""
is_grounded: bool = Field(description="回答是否有检索内容支撑")
score: float = Field(description="依据性评分0-1")
def self_rag(question: str, retriever) -> str:
"""Self-RAG完整流程"""
# Step 1: 判断是否需要检索
decision = (
ChatPromptTemplate.from_template("问题:{question}\n是否需要外部检索来回答?")
| llm.with_structured_output(RetrievalDecision)
).invoke({"question": question})
if not decision.need_retrieval:
# 直接生成
return llm.invoke(f"请回答:{question}").content
# Step 2: 检索
docs = retriever.invoke(question)
context = "\n".join([doc.page_content for doc in docs])
# Step 3: 检查检索结果相关性
relevance = (
ChatPromptTemplate.from_template(
"问题:{question}\n检索内容:{context}\n检索内容是否与问题相关?"
)
| llm.with_structured_output(RelevanceCheck)
).invoke({"question": question, "context": context})
if not relevance.is_relevant:
# 检索不相关,重新检索或直接生成
return llm.invoke(f"请回答(注意:没有找到相关参考资料):{question}").content
# Step 4: 基于检索内容生成回答
answer = llm.invoke(
f"基于以下内容回答问题:\n\n参考内容:{context}\n\n问题:{question}"
).content
# Step 5: 检查生成结果的依据性
groundedness = (
ChatPromptTemplate.from_template(
"参考内容:{context}\n生成回答:{answer}\n回答是否有参考内容支撑?"
)
| llm.with_structured_output(GroundednessCheck)
).invoke({"context": context, "answer": answer})
if not groundedness.is_grounded or groundedness.score < 0.7:
# 依据性不足,重新生成
answer = llm.invoke(
f"请严格基于以下内容回答,不要加入额外信息:\n{context}\n\n问题:{question}"
).content
return answer
18.3.4 CRAG (Corrective RAG)
"""CRAG: 修正性RAG"""
from typing import Literal
from pydantic import BaseModel, Field
class DocumentGrade(BaseModel):
"""文档评分"""
grade: Literal["relevant", "ambiguous", "irrelevant"] = Field(
description="文档与问题的相关程度"
)
def corrective_rag(question: str, retriever, llm) -> str:
"""
CRAG流程:
1. 检索文档
2. 评估每个文档的相关性
3. 根据评估结果采取不同策略:
- 所有相关 → 直接生成
- 部分相关 → 补充网络搜索
- 都不相关 → 完全使用网络搜索
"""
# Step 1: 初始检索
docs = retriever.invoke(question)
# Step 2: 文档评分
grader_prompt = ChatPromptTemplate.from_template(
"问题:{question}\n文档:{document}\n请评估该文档与问题的相关程度。"
)
grader = grader_prompt | llm.with_structured_output(DocumentGrade)
relevant_docs = []
for doc in docs:
grade = grader.invoke({"question": question, "document": doc.page_content})
if grade.grade == "relevant":
relevant_docs.append(doc)
# Step 3: 根据评分采取策略
relevance_ratio = len(relevant_docs) / len(docs) if docs else 0
if relevance_ratio >= 0.8:
# 高相关:直接使用检索结果
context = "\n".join([d.page_content for d in relevant_docs])
action = "direct_generate"
elif relevance_ratio >= 0.3:
# 部分相关:补充网络搜索
context = "\n".join([d.page_content for d in relevant_docs])
web_results = web_search(question) # 补充搜索
context += f"\n\n补充信息:{web_results}"
action = "augmented_generate"
else:
# 低相关:完全依赖网络搜索
context = web_search(question)
action = "web_generate"
# Step 4: 生成回答
answer = llm.invoke(
f"[策略: {action}]\n基于以下内容回答:\n{context}\n\n问题:{question}"
).content
return answer
def web_search(query: str) -> str:
"""模拟网络搜索"""
return f"网络搜索'{query}'的结果..."
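The branching in Step 3 is worth isolating as a pure function — the 0.8 / 0.3 thresholds come from the code above, while the helper name is ours:

```python
def choose_crag_action(n_relevant: int, n_total: int) -> str:
    """Map the share of relevant retrieved documents to a CRAG strategy."""
    ratio = n_relevant / n_total if n_total else 0.0
    if ratio >= 0.8:
        return "direct_generate"      # mostly relevant: use retrieval as-is
    if ratio >= 0.3:
        return "augmented_generate"   # partly relevant: supplement with web search
    return "web_generate"             # mostly irrelevant: fall back to web search

print(choose_crag_action(4, 5))  # direct_generate
print(choose_crag_action(2, 5))  # augmented_generate
print(choose_crag_action(0, 5))  # web_generate
```

Keeping the decision rule pure makes it trivial to unit-test and to tune the thresholds against an offline evaluation set.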
18.4 Advanced Retrieval Strategies
18.4.1 Multi-Vector Retrieval
"""多向量检索:父文档检索策略"""
from langchain.retrievers.multi_vector import MultiVectorRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document
import uuid
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma(collection_name="multi_vector", embedding_function=embeddings)
doc_store = InMemoryStore()  # holds the original full documents (InMemoryStore can store Document objects)
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=doc_store,  # use docstore for Document objects; byte_store expects raw bytes
    id_key="doc_id"
)
# Strategy 1: parent documents + child chunks
# retrieve over small chunks, return the larger parent document
parent_docs = [
Document(page_content="这是一篇很长的文章..." * 100, metadata={"source": "doc1"})
]
child_splitter = RecursiveCharacterTextSplitter(chunk_size=200, chunk_overlap=20)
for parent_doc in parent_docs:
doc_id = str(uuid.uuid4())
child_docs = child_splitter.split_documents([parent_doc])
for child in child_docs:
child.metadata["doc_id"] = doc_id
retriever.vectorstore.add_documents(child_docs)
retriever.docstore.mset([(doc_id, parent_doc)]) # 存储父文档
# Strategy 2: summary-based retrieval
# retrieve on the summary's embedding, return the original document
llm = ChatOpenAI(model="gpt-4o")
for doc in parent_docs:
doc_id = str(uuid.uuid4())
summary = llm.invoke(f"请用一句话总结:{doc.page_content[:1000]}").content
summary_doc = Document(page_content=summary, metadata={"doc_id": doc_id})
retriever.vectorstore.add_documents([summary_doc])
retriever.docstore.mset([(doc_id, doc)])
18.4.2 Hybrid Retrieval
"""混合检索:Dense + Sparse + Reranking"""
from langchain_community.retrievers import BM25Retriever
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.retrievers import EnsembleRetriever
from langchain_core.documents import Document
# 准备文档
docs = [
Document(page_content="LangChain是一个用于构建LLM应用的框架"),
Document(page_content="向量数据库用于存储和检索嵌入向量"),
Document(page_content="RAG是检索增强生成的缩写"),
Document(page_content="Transformer架构是现代大模型的基础"),
]
# Sparse检索器(BM25关键词匹配)
bm25_retriever = BM25Retriever.from_documents(docs, k=3)
# Dense检索器(向量语义检索)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(docs, embeddings)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# 混合检索(加权融合)
hybrid_retriever = EnsembleRetriever(
retrievers=[bm25_retriever, vector_retriever],
weights=[0.4, 0.6] # BM25权重40%, 向量权重60%
)
results = hybrid_retriever.invoke("什么是RAG?")
# Reranking(使用BGE-reranker或Cohere Reranker)
# pip install langchain-cohere
# from langchain_cohere import CohereRerank
# from langchain.retrievers import ContextualCompressionRetriever
# reranker = CohereRerank(model="rerank-v3.5", top_n=3)
# reranking_retriever = ContextualCompressionRetriever(
# base_compressor=reranker,
# base_retriever=hybrid_retriever
# )
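EnsembleRetriever fuses its sub-retrievers with weighted Reciprocal Rank Fusion (RRF). The core idea fits in a few lines — an illustrative stdlib-only sketch with made-up document IDs, not LangChain's exact implementation:

```python
def reciprocal_rank_fusion(rankings, k: int = 60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1/(k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_rank  = ["d3", "d1", "d2"]   # keyword ranking
dense_rank = ["d1", "d4", "d3"]   # semantic ranking
fused = reciprocal_rank_fusion([bm25_rank, dense_rank])
print(fused)  # ['d1', 'd3', 'd4', 'd2']
```

Documents that rank well in both lists (like d1) float to the top even if neither retriever ranked them first everywhere; the constant k dampens the advantage of the very top ranks.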
18.4.3 Query Transformation
"""查询转换策略"""
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# --- HyDE (Hypothetical Document Embedding) ---
def hyde_retrieval(query: str, retriever):
"""
HyDE: 先让LLM生成假设性答案文档,
用该文档的embedding去检索(比原始query效果更好)
"""
hyde_prompt = ChatPromptTemplate.from_template(
"请对以下问题写一段可能的答案(不需要保证准确):\n{query}"
)
hypothetical_doc = (hyde_prompt | llm).invoke({"query": query}).content
# 用假设文档的embedding进行检索
results = retriever.invoke(hypothetical_doc)
return results
# --- Multi-Query(多查询扩展)---
class MultiQuery(BaseModel):
queries: list[str] = Field(description="从不同角度重写的查询列表")
def multi_query_retrieval(query: str, retriever):
"""
Multi-Query: 将原始查询改写为多个不同角度的查询,
合并检索结果,增加召回率
"""
expand_prompt = ChatPromptTemplate.from_template(
"""请将以下问题从3个不同角度进行改写,以提高检索召回率:
原始问题:{query}
要求:保持语义一致,但使用不同的表述方式。"""
)
multi = (expand_prompt | llm.with_structured_output(MultiQuery)).invoke({"query": query})
# 对每个查询检索并去重
all_docs = []
seen_contents = set()
for q in [query] + multi.queries:
docs = retriever.invoke(q)
for doc in docs:
if doc.page_content not in seen_contents:
all_docs.append(doc)
seen_contents.add(doc.page_content)
return all_docs
# --- Step-Back Prompting ---
def step_back_retrieval(query: str, retriever):
"""
Step-Back: 先退一步问更高层次的问题,
获取背景知识后再回答原始问题
"""
step_back_prompt = ChatPromptTemplate.from_template(
"""请生成一个更高层次、更抽象的问题,帮助回答原始问题:
原始问题:{query}
更高层次的问题:"""
)
abstract_query = (step_back_prompt | llm).invoke({"query": query}).content
# 同时检索原始问题和抽象问题
original_docs = retriever.invoke(query)
abstract_docs = retriever.invoke(abstract_query)
return original_docs + abstract_docs
18.4.4 Context Compression
"""上下文压缩:减少噪声,提取关键信息"""
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# LLM提取压缩器:只提取与查询相关的部分
compressor = LLMChainExtractor.from_llm(llm)
# base_retriever 是任意检索器
# compressed_retriever = ContextualCompressionRetriever(
# base_compressor=compressor,
# base_retriever=base_retriever
# )
# LongContextReorder (long-context reordering)
# Research ("lost in the middle") shows LLMs attend most to the start and end of the context
from langchain_community.document_transformers import LongContextReorder
reordering = LongContextReorder()
# reordered_docs = reordering.transform_documents(docs)
# places the most relevant documents at the beginning and end
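The reordering idea is simple enough to write out. This is a sketch of the "lost in the middle" trick under one assumption — input documents sorted most-relevant-first — and not LangChain's exact permutation:

```python
def ends_first_reorder(docs):
    """Docs sorted most-relevant-first -> best at the edges, worst in the middle."""
    left = docs[0::2]            # ranks 0, 2, 4, ... fill the front
    right = docs[1::2][::-1]     # ranks 1, 3, 5, ... fill the back, worst innermost
    return left + right

ranked = ["doc_a", "doc_b", "doc_c", "doc_d"]  # doc_a is most relevant
print(ends_first_reorder(ranked))  # ['doc_a', 'doc_c', 'doc_d', 'doc_b']
```

The top-ranked document lands at the very start and the second-ranked at the very end — the two positions the model attends to most.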
18.5 RAG Evaluation Frameworks
18.5.1 RAGAS Metrics
"""RAGAS评估框架"""
# pip install ragas
from ragas import evaluate
from ragas.metrics import (
faithfulness, # 忠实度:回答是否基于检索内容
answer_relevancy, # 答案相关性:回答是否切题
context_precision, # 上下文精度:检索内容是否精准
context_recall, # 上下文召回:是否检索到所有需要的信息
answer_correctness # 答案正确性:与标准答案的吻合度
)
from datasets import Dataset
# 准备评估数据
eval_data = {
"question": [
"LangChain是什么?",
"RAG的核心流程是什么?"
],
"answer": [
"LangChain是一个用于构建LLM应用的开源框架",
"RAG的流程包括检索相关文档和基于文档生成回答"
],
"contexts": [
["LangChain是一个流行的开源框架,专为构建大语言模型应用而设计"],
["RAG(检索增强生成)的核心流程:1.检索 2.增强 3.生成"]
],
"ground_truth": [
"LangChain是用于开发LLM驱动应用的开源框架",
"RAG包括文档检索、上下文增强和LLM生成三个步骤"
]
}
dataset = Dataset.from_dict(eval_data)
# 执行评估
results = evaluate(
dataset=dataset,
metrics=[
faithfulness,
answer_relevancy,
context_precision,
context_recall,
answer_correctness
]
)
print(results)
# 输出各指标评分 (0-1)
RAGAS core metrics at a glance:
| Metric | Meaning | What it measures | Target |
|---|---|---|---|
| Faithfulness | Groundedness | Whether the answer is supported by the retrieved context (no hallucination) | >0.9 |
| Answer Relevancy | Answer relevance | Whether the answer actually addresses the question | >0.85 |
| Context Precision | Retrieval precision | Share of useful information in the retrieved context | >0.8 |
| Context Recall | Retrieval recall | Whether all information needed for the answer was retrieved | >0.8 |
| Answer Correctness | Answer correctness | Agreement with the reference answer | >0.85 |
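To build intuition for what Faithfulness measures, here is a deliberately crude, LLM-free proxy based on token overlap (RAGAS itself decomposes the answer into claims and verifies each one with an LLM, which is far more robust):

```python
def support_ratio(answer: str, context: str) -> float:
    """Share of distinct answer tokens that also appear in the retrieved context."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

print(support_ratio("rag retrieves documents", "rag retrieves external documents"))  # 1.0
print(support_ratio("rag invents facts", "rag retrieves external documents"))        # ~0.33
```

An answer full of tokens absent from the context is a hallucination warning sign; a claim-level LLM check then confirms or clears it.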
18.5.2 Building Your Own Evaluation Pipeline
"""自建RAG评估Pipeline"""
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from typing import Literal
import json
class RAGEvalResult(BaseModel):
"""RAG评估结果"""
faithfulness_score: float = Field(description="忠实度(0-1)")
relevance_score: float = Field(description="相关性(0-1)")
completeness_score: float = Field(description="完整性(0-1)")
overall_score: float = Field(description="综合评分(0-1)")
explanation: str = Field(description="评价说明")
llm = ChatOpenAI(model="gpt-4o", temperature=0)
def evaluate_rag_response(
question: str,
context: str,
answer: str,
ground_truth: str = ""
) -> RAGEvalResult:
"""评估RAG系统的单次回答"""
eval_prompt = f"""请评估以下RAG系统的回答质量:
## 问题
{question}
## 检索到的上下文
{context}
## 系统回答
{answer}
## 参考答案(如有)
{ground_truth or "无"}
评分标准:
- faithfulness_score: 回答是否完全基于上下文,无编造(0-1)
- relevance_score: 回答是否与问题相关(0-1)
- completeness_score: 回答是否完整覆盖了问题要求(0-1)
- overall_score: 综合评分(0-1)"""
result = llm.with_structured_output(RAGEvalResult).invoke(eval_prompt)
return result
# 批量评估
def batch_evaluate(test_cases: list[dict]) -> dict:
"""批量评估RAG系统"""
results = []
for case in test_cases:
result = evaluate_rag_response(
question=case["question"],
context=case["context"],
answer=case["answer"],
ground_truth=case.get("ground_truth", "")
)
results.append(result)
# 汇总统计
avg_scores = {
"avg_faithfulness": sum(r.faithfulness_score for r in results) / len(results),
"avg_relevance": sum(r.relevance_score for r in results) / len(results),
"avg_completeness": sum(r.completeness_score for r in results) / len(results),
"avg_overall": sum(r.overall_score for r in results) / len(results),
}
return avg_scores
18.6 Production RAG Optimization
18.6.1 Chunking Strategies
"""高级分块策略"""
from langchain_text_splitters import (
RecursiveCharacterTextSplitter,
MarkdownHeaderTextSplitter,
)
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings
# Strategy 1: semantic chunking (split dynamically by embedding similarity)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
semantic_splitter = SemanticChunker(
embeddings,
breakpoint_threshold_type="percentile", # percentile/standard_deviation/interquartile
breakpoint_threshold_amount=95
)
# semantic_chunks = semantic_splitter.split_documents(docs)
# Strategy 2: Markdown header splitting (preserves document structure)
md_splitter = MarkdownHeaderTextSplitter(
headers_to_split_on=[
("#", "Header 1"),
("##", "Header 2"),
("###", "Header 3"),
]
)
# md_chunks = md_splitter.split_text(markdown_text)
# Strategy 3: recursive splitting + metadata enrichment
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=512,
chunk_overlap=50,
separators=["\n\n", "\n", "。", "!", "?", ";", " "]
)
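Under the hood, size-based splitting is a sliding window. A minimal character-level version (RecursiveCharacterTextSplitter is smarter — it prefers breaking at the separators listed above rather than mid-word):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50):
    """Fixed-size chunks; each window starts chunk_size - overlap after the last."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap ensures a sentence straddling a boundary appears whole in at least one chunk, at the cost of some index redundancy.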
18.6.2 Metadata Filtering and Caching
"""元数据过滤与语义缓存"""
# 元数据过滤检索
# vectorstore.as_retriever(
# search_kwargs={
# "k": 5,
# "filter": {
# "source": "technical_doc",
# "date": {"$gte": "2024-01-01"},
# "department": "engineering"
# }
# }
# )
# Semantic cache (GPTCache, or self-built as below)
from langchain_openai import OpenAIEmbeddings
import numpy as np
import hashlib
class SemanticCache:
"""简易语义缓存"""
def __init__(self, embeddings, threshold: float = 0.95):
self.embeddings = embeddings
self.threshold = threshold
self.cache: dict[str, dict] = {} # key: hash, value: {embedding, response}
def _get_embedding(self, text: str) -> list[float]:
return self.embeddings.embed_query(text)
def _cosine_similarity(self, a: list[float], b: list[float]) -> float:
a, b = np.array(a), np.array(b)
return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
def get(self, query: str) -> str | None:
"""查找语义相似的缓存"""
query_emb = self._get_embedding(query)
for key, item in self.cache.items():
sim = self._cosine_similarity(query_emb, item["embedding"])
if sim >= self.threshold:
return item["response"]
return None
def set(self, query: str, response: str):
"""添加缓存"""
key = hashlib.md5(query.encode()).hexdigest()
self.cache[key] = {
"embedding": self._get_embedding(query),
"response": response
}
def rag_with_cache(self, query: str, rag_chain) -> str:
"""带缓存的RAG调用"""
cached = self.get(query)
if cached:
print("🎯 缓存命中")
return cached
response = rag_chain.invoke(query)
self.set(query, response)
return response
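The cache-hit condition above boils down to a cosine check against stored embeddings. A stdlib-only version with hand-made vectors (no API calls; the vectors and responses are invented for illustration) shows the threshold at work:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cache_lookup(query_emb, cache, threshold: float = 0.95):
    """Return the first cached response whose embedding clears the threshold."""
    for item in cache:
        if cosine(query_emb, item["embedding"]) >= threshold:
            return item["response"]
    return None

cache = [{"embedding": [1.0, 0.0], "response": "RAG = retrieval-augmented generation"}]
print(cache_lookup([0.99, 0.05], cache))  # hit: near-parallel vectors
print(cache_lookup([0.0, 1.0], cache))    # None: orthogonal query
```

Note the linear scan: a production cache would index the stored embeddings in a vector store so lookups stay fast as the cache grows.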
18.7 Full Example: An Advanced RAG Pipeline
"""完整示例:使用LangChain构建高级RAG管线"""
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.retrievers import BM25Retriever
from langchain.retrievers import EnsembleRetriever, ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from pydantic import BaseModel, Field
from typing import Literal
# ============ 配置 ============
llm = ChatOpenAI(model="gpt-4o", temperature=0)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# ============ 文档准备 ============
sample_docs = [
Document(page_content="LangChain是一个开源框架,用于构建基于大语言模型的应用。它提供了链、Agent、RAG等核心组件。", metadata={"source": "langchain_intro", "category": "framework"}),
Document(page_content="LlamaIndex专注于RAG场景,提供了Documents→Nodes→Index→QueryEngine的标准管线。", metadata={"source": "llamaindex_intro", "category": "framework"}),
Document(page_content="向量数据库如Chroma、Pinecone、Weaviate用于存储和检索文档的嵌入向量。", metadata={"source": "vector_db", "category": "infrastructure"}),
Document(page_content="RAG(检索增强生成)通过检索外部知识来增强LLM的回答质量,减少幻觉。", metadata={"source": "rag_intro", "category": "technique"}),
Document(page_content="RAGAS是一个RAG评估框架,提供faithfulness、relevancy、precision等指标。", metadata={"source": "ragas_intro", "category": "evaluation"}),
]
# ============ 构建检索器 ============
# Dense检索
vectorstore = Chroma.from_documents(sample_docs, embeddings)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# Sparse检索(BM25)
bm25_retriever = BM25Retriever.from_documents(sample_docs, k=3)
# 混合检索
hybrid_retriever = EnsembleRetriever(
retrievers=[bm25_retriever, dense_retriever],
weights=[0.4, 0.6]
)
# 上下文压缩
compressor = LLMChainExtractor.from_llm(ChatOpenAI(model="gpt-4o-mini"))
compressed_retriever = ContextualCompressionRetriever(
base_compressor=compressor,
base_retriever=hybrid_retriever
)
# ============ 查询路由 ============
class RouteDecision(BaseModel):
route: Literal["simple", "advanced"] = Field(description="查询路由")
router_prompt = ChatPromptTemplate.from_template(
"判断问题复杂度:简单事实问题选simple,需要综合分析选advanced。\n问题:{question}"
)
router = router_prompt | llm.with_structured_output(RouteDecision)
# ============ 生成链 ============
rag_prompt = ChatPromptTemplate.from_template("""
基于以下检索内容回答问题。如果内容不足以回答,请说明。
检索内容:
{context}
问题:{question}
要求:
1. 回答要准确,不要编造
2. 引用信息来源
3. 如果不确定,坦诚说明
""")
def format_docs(docs):
return "\n\n".join(f"[{doc.metadata.get('source', 'unknown')}] {doc.page_content}" for doc in docs)
def route_and_retrieve(question: str):
"""路由 + 检索"""
decision = router.invoke({"question": question})
if decision.route == "simple":
docs = hybrid_retriever.invoke(question)
else:
docs = compressed_retriever.invoke(question)
return format_docs(docs)
# Full RAG chain
advanced_rag_chain = (
    {
        "context": RunnableLambda(lambda x: route_and_retrieve(x["question"])),
        "question": RunnableLambda(lambda x: x["question"])
    }
    | rag_prompt
    | llm
    | StrOutputParser()
)
# ============ 执行 ============
answer = advanced_rag_chain.invoke({"question": "什么是RAG?它有什么优势?"})
print(answer)
📋 Interview Essentials
Frequently Asked Questions
Q1: Explain the differences between Naive RAG, Advanced RAG, Modular RAG, and Agentic RAG.
A: Naive RAG is a simple retrieve-then-generate pipeline. Advanced RAG adds pre-retrieval optimization (query rewriting) and post-retrieval optimization (reranking). Modular RAG turns each stage into composable, pluggable modules. Agentic RAG lets an agent choose the retrieval strategy dynamically, with multi-step reasoning and self-reflection. The overall trend is from static pipelines toward intelligent adaptivity.
Q2: How do GraphRAG's Global Search and Local Search differ?
A: Global Search runs Map-Reduce over community summaries and suits corpus-wide questions (e.g. "what are the main themes in this dataset?"). Local Search combines vector retrieval with graph-neighbor traversal and suits specific questions (e.g. "what are this entity's attributes?"). Global Search costs more but can answer cross-document questions.
Q3: What is Self-RAG, and how does it differ from traditional RAG?
A: Self-RAG introduces reflection tokens: the model decides before retrieval whether retrieval is needed, assesses relevance after retrieval, and checks groundedness after generation. Unlike the fixed pipeline of traditional RAG, Self-RAG is adaptive, cutting unnecessary retrieval calls and hallucination.
Q4: How do you evaluate a RAG system? What are RAGAS's core metrics?
A: RAGAS core metrics: ① Faithfulness (is the answer grounded in the context?) ② Answer Relevancy (does it address the question?) ③ Context Precision ④ Context Recall. Evaluation needs both automated metrics and human review; in production, monitor these metrics continuously.
Q5: Why does hybrid retrieval beat pure vector retrieval?
A: Vector retrieval is strong at semantic matching but can miss exact keyword matches; BM25 matches keywords well but does not understand semantics. Hybrid retrieval (e.g. BM25 at 0.4 + dense at 0.6) combines both, and a reranker then reorders the candidates, markedly improving retrieval quality. In practice Top-5 relevance can improve by 10-20%.
✏️ Exercises
Exercise 1: GraphRAG in practice
Build a simple knowledge graph with LangChain + an LLM: extract entities and relations from a passage of text, then answer questions over the graph.
Exercise 2: Implementing Self-RAG
Implement the full Self-RAG loop: retrieval-necessity check → document retrieval → relevance grading → generation → groundedness check. Compare Self-RAG with Naive RAG on factual questions.
Exercise 3: A hybrid retrieval system
Build a hybrid retriever with BM25 + vector search + a Cohere reranker, and compare retrieval quality under different weight configurations.
Exercise 4: RAG evaluation
Evaluate your RAG system with RAGAS, analyze the Faithfulness and Relevancy scores, identify the system's bottleneck, and propose optimizations.
📚 References
- GraphRAG paper: "From Local to Global: A Graph RAG Approach to Query-Focused Summarization" (2024)
- Self-RAG paper: "Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" (2023)
- CRAG paper: "Corrective Retrieval Augmented Generation" (2024)
- RAGAS documentation
- LangChain RAG tutorials
Last updated: 2026-02-12 · Applies to: LLM Application Guide v2026