LlamaIndex框架¶
⚠️ 时效性说明:本章涉及前沿模型/价格/榜单等信息,可能随版本快速变化;请以论文原文、官方发布页和 API 文档为准。
📌 LlamaIndex是专为RAG(检索增强生成)设计的数据框架,通过Documents→Nodes→Index→QueryEngine的管道,将私有数据与大模型无缝连接。
🎯 学习目标¶
- 理解LlamaIndex的核心架构与设计哲学
- 掌握数据加载、索引构建、查询引擎的完整流程
- 熟练使用多种索引类型与检索策略
- 掌握高级RAG技术:句子窗口检索、自动合并检索、递归检索
- 了解重排序与响应合成器的使用
- 学会LlamaIndex评估框架的使用
- 能够构建生产级企业知识库问答系统
- 掌握LlamaIndex面试高频考点
16.1 LlamaIndex核心概念与架构¶
16.1.1 什么是LlamaIndex¶
LlamaIndex(原名GPT Index)是一个专为大语言模型应用设计的数据框架,核心目标是让LLM与私有数据高效交互。
核心价值:
- 数据连接:支持多种数据源的连接和加载
- 数据索引:提供多种索引结构进行数据组织
- 查询接口:提供自然语言查询引擎
- RAG优化:内置多种高级RAG策略
- 评估框架:提供完善的质量评估工具
与直接调用LLM API的区别:
| 特性 | 直接API调用 | LlamaIndex |
|---|---|---|
| 数据加载 | 手动处理 | 自动化加载器 |
| 分块策略 | 手动实现 | 内置多种策略 |
| 索引管理 | 无 | 多种索引类型 |
| 检索优化 | 手动实现 | 内置高级策略 |
| 评估工具 | 无 | 内置评估框架 |
16.1.2 核心架构:Documents → Nodes → Index → Query Engine¶
LlamaIndex的数据处理管道遵循清晰的四阶段架构:
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐
│Documents │───>│ Nodes │───>│ Index │───>│ Query Engine │
│(原始文档) │ │(文档节点) │ │(索引结构) │ │ (查询引擎) │
└──────────┘ └──────────┘ └──────────┘ └──────────────┘
▲ ▲ ▲ ▲
│ │ │ │
数据加载器 节点解析器 索引构建器 查询处理器
(Readers) (NodeParser) (IndexBuilder) (QueryEngine)
1. Documents(文档)
Document是LlamaIndex中最基础的数据抽象,代表一个完整的数据源:
from llama_index.core import Document
# 手动创建Document
doc = Document(
text="LlamaIndex是一个数据框架...",
metadata={
"source": "tutorial.md",
"author": "AI Team",
"created_at": "2025-01-01"
}
)
print(f"文档ID: {doc.doc_id}")
print(f"文档内容: {doc.text[:100]}")
print(f"元数据: {doc.metadata}")
2. Nodes(节点)
Node是Document的更细粒度表示,通常是Document经过分块后的片段:
from llama_index.core.node_parser import SentenceSplitter
# 创建节点解析器
parser = SentenceSplitter(
chunk_size=1024,
chunk_overlap=200
)
# 将Document转换为Nodes
nodes = parser.get_nodes_from_documents([doc])
for i, node in enumerate(nodes): # enumerate同时获取索引和元素
print(f"Node {i}: {node.text[:80]}...")
print(f" - Node ID: {node.node_id}")
print(f" - 元数据: {node.metadata}")
print(f" - 关系: {node.relationships}")
3. Index(索引)
Index将Nodes组织成可高效检索的数据结构:
from llama_index.core import VectorStoreIndex
# 从Nodes创建索引
index = VectorStoreIndex(nodes)
# 也可以直接从Documents创建(内部自动分块)
index = VectorStoreIndex.from_documents(documents)
4. Query Engine(查询引擎)
QueryEngine接收用户查询并返回响应:
# 创建查询引擎
query_engine = index.as_query_engine()
# 执行查询
response = query_engine.query("LlamaIndex是什么?")
print(response)
16.1.3 LlamaIndex设计哲学¶
核心设计原则:
- 数据优先:以数据连接和索引为核心
- 模块化:每个组件可独立替换
- 可组合:组件可灵活组合
- 渐进式复杂度:从简单到高级逐步深入
- 评估驱动:内置评估机制验证质量
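其中"模块化、可组合"可以用一个简短示意来体现:分块器、索引、检索器、响应合成器都是独立组件,可单独替换后再组装成查询引擎(组装方式与16.11节的RetrieverQueryEngine用法一致;假设已按16.2.2完成模型配置):
from llama_index.core import VectorStoreIndex, Document, get_response_synthesizer
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.query_engine import RetrieverQueryEngine
# 各组件独立创建、互不耦合
docs = [Document(text="LlamaIndex是一个数据框架...")]
nodes = SentenceSplitter(chunk_size=256, chunk_overlap=50).get_nodes_from_documents(docs)
index = VectorStoreIndex(nodes)
retriever = index.as_retriever(similarity_top_k=3)               # 可替换为混合检索器等
synthesizer = get_response_synthesizer(response_mode="compact")  # 可替换为refine等模式
# 自由组合为查询引擎
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=synthesizer,
)
print(query_engine.query("LlamaIndex是什么?"))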
16.2 安装与快速开始¶
16.2.1 安装LlamaIndex(v0.11+)¶
# 核心包安装
pip install llama-index
# 这会安装以下核心包:
# llama-index-core: 核心库
# llama-index-llms-openai: OpenAI LLM集成
# llama-index-embeddings-openai: OpenAI Embeddings集成
# llama-index-readers-file: 文件读取器
# 单独安装特定集成
pip install llama-index-llms-anthropic # Anthropic Claude
pip install llama-index-llms-huggingface # HuggingFace模型
pip install llama-index-embeddings-huggingface # HuggingFace Embeddings
pip install llama-index-vector-stores-chroma # Chroma向量数据库
pip install llama-index-vector-stores-pinecone # Pinecone
pip install llama-index-vector-stores-weaviate # Weaviate
pip install llama-index-vector-stores-milvus # Milvus
v0.11+ 包结构说明:
llama-index (元包)
├── llama-index-core # 核心抽象和接口
├── llama-index-llms-* # LLM集成包
├── llama-index-embeddings-* # Embedding集成包
├── llama-index-readers-* # 数据读取器包
├── llama-index-vector-stores-* # 向量存储集成包
└── llama-index-callbacks-* # 回调集成包
16.2.2 环境配置¶
import os
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# 设置API密钥
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
# 全局设置(推荐方式)
Settings.llm = OpenAI(model="gpt-4o", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.chunk_size = 1024
Settings.chunk_overlap = 200
print("LlamaIndex配置完成!")
print(f"LLM: {Settings.llm.model}")
print(f"Embed Model: {Settings.embed_model.model_name}")
16.2.3 五分钟快速入门¶
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# Step 1: 加载数据
documents = SimpleDirectoryReader("./data").load_data()
print(f"加载了 {len(documents)} 个文档")
# Step 2: 构建索引
index = VectorStoreIndex.from_documents(documents)
# Step 3: 创建查询引擎
query_engine = index.as_query_engine()
# Step 4: 进行查询
response = query_engine.query("请总结这些文档的主要内容")
print(response)
# 查看引用的源文档
for node in response.source_nodes:
print(f"\n来源: {node.metadata.get('file_name', 'Unknown')}")
print(f"相关度: {node.score:.4f}")
print(f"内容: {node.text[:200]}...")
16.2.4 使用本地模型¶
from llama_index.core import Settings
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# 使用本地HuggingFace模型
Settings.llm = HuggingFaceLLM(
model_name="Qwen/Qwen2.5-7B-Instruct",
tokenizer_name="Qwen/Qwen2.5-7B-Instruct",
context_window=4096,
max_new_tokens=512,
device_map="auto",
model_kwargs={"torch_dtype": "float16"}
)
# 使用本地Embedding模型
Settings.embed_model = HuggingFaceEmbedding(
model_name="BAAI/bge-small-zh-v1.5"
)
print("本地模型配置完成!")
16.2.5 使用Ollama本地模型¶
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core import Settings
# 使用Ollama运行的本地模型
Settings.llm = Ollama(
model="qwen2.5:7b",
base_url="http://localhost:11434",
request_timeout=120.0
)
Settings.embed_model = OllamaEmbedding(
model_name="nomic-embed-text",
base_url="http://localhost:11434"
)
16.3 数据加载器(Data Loaders)¶
16.3.1 SimpleDirectoryReader¶
最常用的数据加载器,支持自动识别文件类型:
from llama_index.core import SimpleDirectoryReader
# 基础用法 - 加载目录下所有支持格式的文件
documents = SimpleDirectoryReader("./data").load_data()
# 指定文件类型
documents = SimpleDirectoryReader(
input_dir="./data",
required_exts=[".pdf", ".txt", ".md", ".docx"],
recursive=True, # 递归加载子目录
filename_as_id=True # 使用文件名作为文档ID
).load_data()
# 加载单个文件
documents = SimpleDirectoryReader(
input_files=["./data/report.pdf", "./data/readme.md"]
).load_data()
# 排除特定文件
documents = SimpleDirectoryReader(
input_dir="./data",
exclude=["*.tmp", "*.log"],
recursive=True
).load_data()
for doc in documents:
print(f"文件: {doc.metadata['file_name']}")
print(f"大小: {len(doc.text)} 字符")
print(f"类型: {doc.metadata.get('file_type', 'Unknown')}")
print("---")
16.3.2 PDF加载器¶
from llama_index.readers.file import PDFReader
# 基础PDF加载
pdf_reader = PDFReader()
documents = pdf_reader.load_data("./data/technical_doc.pdf")  # 以位置参数传入文件路径
# 使用SimpleDirectoryReader加载PDF(自动识别)
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader(
input_files=["./data/paper.pdf"]
).load_data()
# 使用pymupdf加速PDF解析
# pip install llama-index-readers-file pymupdf
from llama_index.readers.file import PyMuPDFReader
reader = PyMuPDFReader()
documents = reader.load_data(file_path="./data/paper.pdf")
for doc in documents:
print(f"页码: {doc.metadata.get('page_label', 'N/A')}")
print(f"内容预览: {doc.text[:200]}")
print("---")
16.3.3 HTML加载器¶
# pip install llama-index-readers-web
from llama_index.readers.web import SimpleWebPageReader, BeautifulSoupWebReader
# 简单网页加载
web_reader = SimpleWebPageReader()
documents = web_reader.load_data(
urls=["https://docs.llamaindex.ai/en/stable/"]
)
# 使用BeautifulSoup解析
bs_reader = BeautifulSoupWebReader()
documents = bs_reader.load_data(
urls=["https://example.com/article1", "https://example.com/article2"]
)
for doc in documents:
print(f"URL: {doc.metadata.get('url', 'N/A')}")
print(f"内容长度: {len(doc.text)}")
16.3.4 JSON加载器¶
# pip install llama-index-readers-json
from llama_index.readers.json import JSONReader
# 加载JSON数据
json_reader = JSONReader(
levels_back=0,
collapse_length=None,
ensure_ascii=False
)
documents = json_reader.load_data(input_file="./data/api_data.json")
# 从JSON行文件加载
import json
from llama_index.core import Document
def load_jsonl(file_path: str) -> list:
"""加载JSONL文件"""
documents = []
with open(file_path, 'r', encoding='utf-8') as f: # with自动管理文件关闭
for line in f:
data = json.loads(line.strip()) # json.loads将JSON字符串→Python对象
doc = Document(
text=data.get("content", ""),
metadata={
"title": data.get("title", ""),
"source": data.get("source", ""),
"date": data.get("date", "")
}
)
documents.append(doc)
return documents
documents = load_jsonl("./data/articles.jsonl")
16.3.5 数据库加载器¶
# pip install llama-index-readers-database
from llama_index.readers.database import DatabaseReader
# MySQL加载器
db_reader = DatabaseReader(
uri="mysql+pymysql://user:password@localhost:3306/mydb"
)
# 使用SQL查询加载数据
documents = db_reader.load_data(
query="SELECT id, title, content FROM articles WHERE status='published'"
)
# PostgreSQL加载器
pg_reader = DatabaseReader(
uri="postgresql://user:password@localhost:5432/mydb"
)
documents = pg_reader.load_data(
query="SELECT * FROM knowledge_base"
)
# 自定义数据库加载函数
from sqlalchemy import create_engine, text
from llama_index.core import Document
def load_from_database(connection_string: str, query: str) -> list:
"""从数据库加载数据为Document列表"""
engine = create_engine(connection_string)
documents = []
with engine.connect() as conn:
result = conn.execute(text(query))
columns = result.keys()
for row in result:
row_dict = dict(zip(columns, row))
content = row_dict.pop("content", str(row_dict))
doc = Document(
text=content,
metadata=row_dict
)
documents.append(doc)
return documents
16.3.6 其他常用加载器¶
# Notion加载器
# pip install llama-index-readers-notion
from llama_index.readers.notion import NotionPageReader
notion_reader = NotionPageReader(integration_token="your-notion-token")
documents = notion_reader.load_data(page_ids=["page_id_1", "page_id_2"])
# Slack加载器
# pip install llama-index-readers-slack
from llama_index.readers.slack import SlackReader
slack_reader = SlackReader(slack_token="your-slack-token")
documents = slack_reader.load_data(channel_ids=["C01234ABCDE"])
# GitHub加载器
# pip install llama-index-readers-github
from llama_index.readers.github import GithubRepositoryReader
github_reader = GithubRepositoryReader(
github_token="your-github-token",
owner="owner",
repo="repo",
filter_file_extensions=[".py", ".md", ".txt"]
)
documents = github_reader.load_data(branch="main")
# CSV加载器
# pip install llama-index-readers-file
from llama_index.readers.file import PandasCSVReader
csv_reader = PandasCSVReader()
documents = csv_reader.load_data("./data/dataset.csv")  # 以位置参数传入文件路径
16.3.7 自定义数据加载器¶
from llama_index.core.readers.base import BaseReader
from llama_index.core import Document
class CustomAPIReader(BaseReader):
"""自定义API数据加载器"""
def __init__(self, api_base_url: str, api_key: str):
self.api_base_url = api_base_url
self.api_key = api_key
def load_data(
self,
endpoint: str = "/articles",
params: dict | None = None
) -> list[Document]:
"""从API加载数据"""
import requests
headers = {"Authorization": f"Bearer {self.api_key}"}
response = requests.get(
f"{self.api_base_url}{endpoint}",
headers=headers,
params=params or {}
)
response.raise_for_status()
documents = []
for item in response.json().get("data", []):
doc = Document(
text=item.get("content", ""),
metadata={
"title": item.get("title", ""),
"id": item.get("id", ""),
"created_at": item.get("created_at", ""),
"source": f"{self.api_base_url}{endpoint}"
}
)
documents.append(doc)
return documents
# 使用自定义加载器
reader = CustomAPIReader(
api_base_url="https://api.example.com",
api_key="your-api-key"
)
documents = reader.load_data(endpoint="/articles", params={"limit": 100})
16.4 节点解析与文本分块¶
16.4.1 SentenceSplitter(句子分块器)¶
from llama_index.core.node_parser import SentenceSplitter
# 基础句子分块
splitter = SentenceSplitter(
chunk_size=1024, # 每块最大token数
chunk_overlap=200, # 块间重叠token数
separator=" ", # 分隔符
paragraph_separator="\n\n\n" # 段落分隔符
)
nodes = splitter.get_nodes_from_documents(documents)
print(f"生成了 {len(nodes)} 个节点")
for i, node in enumerate(nodes[:3]):
print(f"\nNode {i}:")
print(f" 文本长度: {len(node.text)}")
print(f" 预览: {node.text[:100]}...")
16.4.2 SentenceWindowNodeParser(句子窗口解析器)¶
from llama_index.core.node_parser import SentenceWindowNodeParser
# 句子窗口解析器 - 用于高级RAG
window_parser = SentenceWindowNodeParser.from_defaults(
window_size=3, # 窗口大小(前后各3句)
window_metadata_key="window",
original_text_metadata_key="original_text"
)
nodes = window_parser.get_nodes_from_documents(documents)
# 查看节点和窗口
for node in nodes[:2]:
print(f"原始句子: {node.text[:80]}...")
print(f"窗口文本: {node.metadata.get('window', '')[:200]}...")
print("---")
16.4.3 HierarchicalNodeParser(层级节点解析器)¶
from llama_index.core.node_parser import HierarchicalNodeParser, get_leaf_nodes
# 层级解析器 - 用于自动合并检索
hierarchical_parser = HierarchicalNodeParser.from_defaults(
chunk_sizes=[2048, 512, 128] # 三层层级:大块→中块→小块
)
nodes = hierarchical_parser.get_nodes_from_documents(documents)
leaf_nodes = get_leaf_nodes(nodes)
print(f"总节点数: {len(nodes)}")
print(f"叶子节点数: {len(leaf_nodes)}")
# 查看层级关系(relationships的键是NodeRelationship枚举,而非字符串)
from llama_index.core.schema import NodeRelationship
for node in nodes[:5]:
    parent = node.relationships.get(NodeRelationship.PARENT)
    children = node.relationships.get(NodeRelationship.CHILD, [])
    print(f"Node: {node.node_id[:8]}...")
    print(f"  层级: 父={parent is not None}, 子={len(children)}")
    print(f"  文本长度: {len(node.text)}")
16.4.4 MarkdownNodeParser¶
from llama_index.core.node_parser import MarkdownNodeParser
# Markdown感知解析器
md_parser = MarkdownNodeParser()
nodes = md_parser.get_nodes_from_documents(documents)
for node in nodes[:5]:
print(f"标题: {node.metadata.get('header_path', 'N/A')}")
print(f"内容: {node.text[:100]}...")
print("---")
16.5 索引类型详解¶
16.5.1 VectorStoreIndex(向量索引)¶
最常用的索引类型,基于向量相似度实现高效语义检索:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# 加载文档并构建向量索引
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# 查询(默认返回top-2最相似的节点)
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("什么是深度学习?")
print(response)
# 查看检索到的源节点
for node in response.source_nodes:
print(f"相关度分数: {node.score:.4f}")
print(f"内容: {node.text[:200]}")
print("---")
高级配置:
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.core.node_parser import SentenceSplitter
# 自定义分块
parser = SentenceSplitter(chunk_size=512, chunk_overlap=100)
nodes = parser.get_nodes_from_documents(documents)
# 从节点创建索引(更多控制)
index = VectorStoreIndex(
nodes=nodes,
show_progress=True # 显示进度
)
# 增量添加文档
new_docs = SimpleDirectoryReader("./new_data").load_data()
for doc in new_docs:
index.insert(doc)
# 删除文档
index.delete_ref_doc("doc_id_to_delete")
16.5.2 SummaryIndex(摘要索引)¶
将所有节点存储为一个列表,查询时遍历所有节点进行摘要:
from llama_index.core import SummaryIndex
# 创建摘要索引
summary_index = SummaryIndex.from_documents(documents)
# 查询(默认使用所有节点)
query_engine = summary_index.as_query_engine(
response_mode="tree_summarize"
)
response = query_engine.query("请对所有文档进行总结")
print(response)
# 适用场景:
# - 需要全文摘要时
# - 文档量较小时
# - 不需要精确定位具体段落时
16.5.3 TreeIndex(树形索引)¶
构建自底向上的摘要树,从叶子节点(原始文本)到根节点(高级摘要):
from llama_index.core import TreeIndex
# 创建树形索引
tree_index = TreeIndex.from_documents(
documents,
num_children=10, # 每个节点的子节点数
show_progress=True
)
# 查询
query_engine = tree_index.as_query_engine(
child_branch_factor=2 # 每次遍历的分支数
)
response = query_engine.query("这些文档的主要主题是什么?")
print(response)
# 适用场景:
# - 层级化的文档结构
# - 需要从概览到细节的多层次查询
# - 大规模文档集合的摘要
16.5.4 KeywordTableIndex(关键词表索引)¶
基于关键词的索引,使用关键词匹配进行检索:
from llama_index.core import KeywordTableIndex
# 创建关键词索引
keyword_index = KeywordTableIndex.from_documents(documents)
# 查询
query_engine = keyword_index.as_query_engine()
response = query_engine.query("Transformer架构")
print(response)
# 查看提取的关键词
print("\n关键词表:")
for keyword, node_ids in keyword_index.index_struct.table.items():
print(f" {keyword}: {len(node_ids)} 个节点")
# 适用场景:
# - 基于精确关键词的检索
# - 技术文档搜索
# - 与向量检索互补使用
16.5.5 KnowledgeGraphIndex(知识图谱索引)¶
将文档转换为知识图谱三元组(主体-关系-客体)进行索引:
from llama_index.core import KnowledgeGraphIndex
from llama_index.core import StorageContext
# pip install llama-index-graph-stores-nebula(仅在使用NebulaGraph存储时需要)
# from llama_index.graph_stores.nebula import NebulaGraphStore
# 基础知识图谱索引(内存存储)
kg_index = KnowledgeGraphIndex.from_documents(
documents,
max_triplets_per_chunk=10,
include_embeddings=True,
show_progress=True
)
# 查询
query_engine = kg_index.as_query_engine(
include_text=True,
response_mode="tree_summarize"
)
response = query_engine.query("LlamaIndex和LangChain有什么关系?")
print(response)
# 使用NebulaGraph存储
# graph_store = NebulaGraphStore(
# space_name="knowledge_graph",
# edge_types=["关系"],
# rel_prop_names=["name"],
# tags=["entity"]
# )
# storage_context = StorageContext.from_defaults(graph_store=graph_store)
# kg_index = KnowledgeGraphIndex.from_documents(
# documents,
# storage_context=storage_context,
# )
# 适用场景:
# - 实体关系问答
# - 多跳推理
# - 结构化知识检索
16.5.6 索引类型对比¶
| 索引类型 | 检索方式 | 适用场景 | 时间复杂度 | 优点 | 缺点 |
|---|---|---|---|---|---|
| VectorStoreIndex | 语义相似度 | 通用问答 | O(log n) | 语义理解强 | 需要嵌入计算 |
| SummaryIndex | 全量遍历 | 文档摘要 | O(n) | 完整覆盖 | 速度慢 |
| TreeIndex | 树形遍历 | 层级查询 | O(log n) | 层级结构 | 构建耗时 |
| KeywordTableIndex | 关键词匹配 | 精确搜索 | O(1) | 速度快 | 语义理解弱 |
| KnowledgeGraphIndex | 图遍历 | 关系推理 | O(k) | 关系推理 | 构建复杂 |
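实践中这些索引并不互斥:同一批Nodes可以同时构建多种索引、各取所长(例如向量索引负责语义问答、关键词索引负责精确匹配,再用16.8.3的混合检索融合)。一个简单示意(假设documents已加载;构建关键词表等索引会产生额外LLM调用):
from llama_index.core import VectorStoreIndex, KeywordTableIndex, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter
nodes = SentenceSplitter(chunk_size=512, chunk_overlap=100).get_nodes_from_documents(documents)
# 同一批节点,构建多种索引
vector_index = VectorStoreIndex(nodes)    # 语义相似度检索
keyword_index = KeywordTableIndex(nodes)  # 关键词精确匹配
summary_index = SummaryIndex(nodes)       # 全量遍历摘要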
16.6 存储与持久化¶
16.6.1 本地存储¶
from llama_index.core import (
VectorStoreIndex,
SimpleDirectoryReader,
StorageContext,
load_index_from_storage
)
# 构建索引
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
# 持久化到本地
index.storage_context.persist(persist_dir="./storage")
print("索引已保存到 ./storage")
# 从本地加载
storage_context = StorageContext.from_defaults(persist_dir="./storage")
loaded_index = load_index_from_storage(storage_context)
print("索引已从 ./storage 加载")
# 使用加载的索引进行查询
query_engine = loaded_index.as_query_engine()
response = query_engine.query("请回答问题")
print(response)
16.6.2 Chroma向量数据库集成¶
# pip install llama-index-vector-stores-chroma chromadb
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
# 创建Chroma客户端
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
# 配置存储上下文
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# 构建索引(数据自动存储到Chroma)
index = VectorStoreIndex.from_documents(
documents,
storage_context=storage_context
)
# 后续加载(无需重新构建索引)
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
index = VectorStoreIndex.from_vector_store(vector_store)
query_engine = index.as_query_engine()
response = query_engine.query("你的问题")
16.6.3 Pinecone向量数据库集成¶
# pip install llama-index-vector-stores-pinecone pinecone
from pinecone import Pinecone, ServerlessSpec
from llama_index.vector_stores.pinecone import PineconeVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
# 初始化Pinecone
pc = Pinecone(api_key="your-pinecone-api-key")
# 创建索引(如果不存在)
index_name = "llama-index-demo"
if index_name not in pc.list_indexes().names():
pc.create_index(
name=index_name,
dimension=1536, # OpenAI text-embedding-3-small的维度
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
pinecone_index = pc.Index(index_name)
# 配置向量存储
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# 构建索引
index = VectorStoreIndex.from_documents(
documents,
storage_context=storage_context
)
# 查询
query_engine = index.as_query_engine()
response = query_engine.query("你的问题")
16.6.4 Weaviate向量数据库集成¶
# pip install llama-index-vector-stores-weaviate weaviate-client
import weaviate
from llama_index.vector_stores.weaviate import WeaviateVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
# 连接Weaviate
client = weaviate.connect_to_local() # 或 weaviate.connect_to_wcs(...)
# 配置向量存储
vector_store = WeaviateVectorStore(
weaviate_client=client,
index_name="LlamaIndexDemo"
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# 构建索引
index = VectorStoreIndex.from_documents(
documents,
storage_context=storage_context
)
# 查询
query_engine = index.as_query_engine()
response = query_engine.query("你的问题")
# 记得关闭连接
client.close()
16.6.5 Milvus向量数据库集成¶
# pip install llama-index-vector-stores-milvus pymilvus
from llama_index.vector_stores.milvus import MilvusVectorStore
from llama_index.core import VectorStoreIndex, StorageContext
# 连接Milvus
vector_store = MilvusVectorStore(
uri="http://localhost:19530", # Milvus服务地址
collection_name="llama_index_demo",
dim=1536, # 嵌入维度
overwrite=True # 覆盖已有集合
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# 构建索引
index = VectorStoreIndex.from_documents(
documents,
storage_context=storage_context
)
# 使用Milvus Lite(轻量级,无需服务器)
vector_store_lite = MilvusVectorStore(
uri="./milvus_demo.db", # 本地文件
collection_name="demo",
dim=1536
)
16.7 查询引擎¶
16.7.1 QueryEngine vs ChatEngine¶
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
# === QueryEngine: 一问一答模式 ===
query_engine = index.as_query_engine(
similarity_top_k=3,
response_mode="compact"
)
response = query_engine.query("什么是RAG?")
print("QueryEngine:", response)
# === ChatEngine: 多轮对话模式 ===
chat_engine = index.as_chat_engine(
chat_mode="condense_plus_context",
similarity_top_k=3,
verbose=True
)
# 多轮对话
response1 = chat_engine.chat("什么是RAG?")
print("回合1:", response1)
response2 = chat_engine.chat("它有什么优势?") # 自动携带上下文
print("回合2:", response2)
response3 = chat_engine.chat("如何实现?")
print("回合3:", response3)
# 重置对话
chat_engine.reset()
ChatEngine模式对比:
| 模式 | 说明 | 适用场景 |
|---|---|---|
| best | 自动选择最佳模式 | 通用 |
| condense_question | 将对话历史压缩为独立问题 | 标准对话 |
| condense_plus_context | 压缩问题+检索增强 | RAG对话 |
| context | 每次检索相关上下文 | 知识库问答 |
| simple | 直接LLM对话(不检索) | 普通聊天 |
16.7.2 流式输出¶
# QueryEngine流式输出
query_engine = index.as_query_engine(streaming=True)
streaming_response = query_engine.query("详细解释Transformer架构")
# 逐token输出
for text in streaming_response.response_gen:
print(text, end="", flush=True)
print() # 换行
# ChatEngine流式输出
chat_engine = index.as_chat_engine(
chat_mode="condense_plus_context",
streaming=True
)
streaming_response = chat_engine.stream_chat("介绍RAG的工作原理")
for token in streaming_response.response_gen:
print(token, end="", flush=True)
print()
16.7.3 子问题查询引擎(SubQuestionQueryEngine)¶
将复杂问题分解为子问题,分别检索后合成答案:
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool, ToolMetadata
# 假设有多个索引(不同数据源)
tech_index = VectorStoreIndex.from_documents(tech_docs)
business_index = VectorStoreIndex.from_documents(business_docs)
# 创建查询引擎工具
query_engine_tools = [
QueryEngineTool(
query_engine=tech_index.as_query_engine(),
metadata=ToolMetadata(
name="tech_knowledge",
description="包含技术文档的知识库,用于回答技术相关问题"
)
),
QueryEngineTool(
query_engine=business_index.as_query_engine(),
metadata=ToolMetadata(
name="business_knowledge",
description="包含商业文档的知识库,用于回答商业相关问题"
)
),
]
# 创建子问题查询引擎
sub_question_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=query_engine_tools,
verbose=True
)
# 复杂查询会被自动分解
response = sub_question_engine.query(
"从技术和商业角度分析AI在医疗领域的应用前景"
)
print(response)
16.7.4 Router查询引擎¶
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
# 创建路由查询引擎
router_engine = RouterQueryEngine(
selector=LLMSingleSelector.from_defaults(),
query_engine_tools=query_engine_tools,
verbose=True
)
# LLM自动选择最合适的引擎
response = router_engine.query("最新的AI技术趋势是什么?")
print(response)
16.8 检索策略¶
16.8.1 向量检索¶
from llama_index.core import VectorStoreIndex
vector_index = VectorStoreIndex.from_documents(documents)
# 基础向量检索(命名为vector_index,便于16.8.3与16.8.4复用)
retriever = vector_index.as_retriever(
    similarity_top_k=5
)
nodes = retriever.retrieve("什么是深度学习?")
for node in nodes:
print(f"分数: {node.score:.4f}")
print(f"内容: {node.text[:150]}...")
print("---")
16.8.2 关键词检索¶
from llama_index.core import KeywordTableIndex
keyword_index = KeywordTableIndex.from_documents(documents)
retriever = keyword_index.as_retriever(
retriever_mode="default" # 或 "simple"
)
nodes = retriever.retrieve("Transformer注意力机制")
for node in nodes:
print(f"内容: {node.text[:150]}...")
16.8.3 混合检索(Hybrid Retrieval)¶
from llama_index.core.retrievers import QueryFusionRetriever
# 创建多个检索器
vector_retriever = vector_index.as_retriever(similarity_top_k=5)
keyword_retriever = keyword_index.as_retriever()
# 查询融合检索器(混合检索 + 倒数排名融合)
fusion_retriever = QueryFusionRetriever(
retrievers=[vector_retriever, keyword_retriever],
similarity_top_k=5,
num_queries=4, # 生成查询变体数量
mode="reciprocal_rerank", # 倒数排名融合
use_async=True,
verbose=True
)
# 检索
nodes = fusion_retriever.retrieve("解释注意力机制的工作原理")
for node in nodes:
print(f"分数: {node.score:.4f}")
print(f"内容: {node.text[:150]}...")
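其中 mode="reciprocal_rerank" 对应倒数排名融合(Reciprocal Rank Fusion, RRF)。其通用形式为(k为平滑常数,文献中经典取值为60;LlamaIndex的具体实现细节以源码为准):
score(d) = Σᵢ 1 / (k + rankᵢ(d))
即文档d在第i路检索结果中的名次rankᵢ(d)越靠前,累积得分越高。RRF只依赖名次而不依赖各检索器的原始分数,因此无需跨检索器做分数归一化,融合结果较为稳健。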
16.8.4 自动检索(AutoRetriever)¶
根据查询自动生成元数据过滤条件:
from llama_index.core.retrievers import VectorIndexAutoRetriever
from llama_index.core.vector_stores.types import MetadataInfo, VectorStoreInfo
# 定义元数据信息
vector_store_info = VectorStoreInfo(
content_info="技术文档",
metadata_info=[
MetadataInfo(
name="category",
type="str",
description="文档类别,如:AI、Web、Cloud"
),
MetadataInfo(
name="date",
type="str",
description="文档发布日期,格式:YYYY-MM-DD"
),
MetadataInfo(
name="author",
type="str",
description="文档作者"
),
]
)
# 创建自动检索器
auto_retriever = VectorIndexAutoRetriever(
index=vector_index,
vector_store_info=vector_store_info,
similarity_top_k=5,
verbose=True
)
# 查询时自动添加过滤条件
nodes = auto_retriever.retrieve(
"2025年关于AI的最新文档"
)
# LLM会自动生成: category="AI", date>="2025-01-01" 的过滤条件
16.9 高级RAG技术¶
16.9.1 句子窗口检索(Sentence Window Retrieval)¶
核心思想:检索时使用精确的句子级嵌入匹配,但返回时扩展到包含上下文窗口的更大文本块。
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
from llama_index.core import VectorStoreIndex, Settings
# Step 1: 使用句子窗口解析器
node_parser = SentenceWindowNodeParser.from_defaults(
window_size=3, # 每侧包含3句
window_metadata_key="window",
original_text_metadata_key="original_text"
)
# Step 2: 构建索引(嵌入基于单句)
nodes = node_parser.get_nodes_from_documents(documents)
sentence_index = VectorStoreIndex(nodes)
# Step 3: 查询时用窗口文本替换原始句子
postprocessor = MetadataReplacementPostProcessor(
target_metadata_key="window"
)
# Step 4: 创建查询引擎
query_engine = sentence_index.as_query_engine(
similarity_top_k=3,
node_postprocessors=[postprocessor]
)
response = query_engine.query("什么是Attention机制?")
print(response)
# 优势:
# - 嵌入匹配更精确(基于单句)
# - 上下文更丰富(返回窗口内多句)
# - 减少信息截断问题
16.9.2 自动合并检索(Auto-merging Retrieval)¶
核心思想:如果多个子节点被检索到,自动合并为其父节点,获取更完整的上下文。
from llama_index.core.node_parser import (
HierarchicalNodeParser,
get_leaf_nodes,
get_root_nodes,
)
from llama_index.core.retrievers import AutoMergingRetriever
from llama_index.core import VectorStoreIndex, StorageContext
# Step 1: 构建层级节点
node_parser = HierarchicalNodeParser.from_defaults(
chunk_sizes=[2048, 512, 128] # 大→中→小 三层
)
nodes = node_parser.get_nodes_from_documents(documents)
leaf_nodes = get_leaf_nodes(nodes)
print(f"总节点: {len(nodes)}, 叶节点: {len(leaf_nodes)}")
# Step 2: 存储所有节点(包括父节点)
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(nodes)
# Step 3: 只索引叶子节点的嵌入
index = VectorStoreIndex(
leaf_nodes,
storage_context=storage_context
)
# Step 4: 使用自动合并检索器
base_retriever = index.as_retriever(similarity_top_k=12)
auto_merging_retriever = AutoMergingRetriever(
base_retriever,
storage_context=storage_context,
simple_ratio_thresh=0.5 # 超过50%的子节点被检索到就合并
)
# 查询
nodes = auto_merging_retriever.retrieve("深度学习的基本原理是什么?")
for node in nodes:
print(f"文本长度: {len(node.text)}")
print(f"内容: {node.text[:200]}...")
print("---")
16.9.3 递归检索(Recursive Retrieval)¶
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import VectorStoreIndex
# 递归检索:先检索摘要,再深入检索相关细节
# 适用于多文档、多层级的复杂知识库
# Step 1: 创建详细向量索引(用于回答细节问题)
vector_index = VectorStoreIndex.from_documents(documents)
# Step 2: 为每个文档创建摘要式索引节点,作为递归检索的顶层入口
# (此处用文件名占位;生产中可用LLM为每个文档生成真实摘要文本)
from llama_index.core.schema import IndexNode
index_nodes = []
for doc in documents:
# 创建指向详细索引的索引节点
index_node = IndexNode(
text=f"关于 {doc.metadata.get('file_name', 'unknown')} 的摘要",
index_id=doc.doc_id
)
index_nodes.append(index_node)
# 构建顶层索引
top_index = VectorStoreIndex(index_nodes)
# 创建递归检索器
recursive_retriever = RecursiveRetriever(
"vector",
retriever_dict={
"vector": top_index.as_retriever(similarity_top_k=3)
},
query_engine_dict={
doc.doc_id: vector_index.as_query_engine()
for doc in documents
},
verbose=True
)
# 创建查询引擎
query_engine = RetrieverQueryEngine.from_args(recursive_retriever)
response = query_engine.query("请详细介绍相关技术")
16.9.4 父文档检索(Parent Document Retrieval)¶
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import VectorStoreIndex, StorageContext, Document
from llama_index.core.schema import IndexNode
# 小块用于精确匹配,大块用于上下文
small_parser = SentenceSplitter(chunk_size=128, chunk_overlap=20)
large_parser = SentenceSplitter(chunk_size=1024, chunk_overlap=200)
# 创建大块节点
large_nodes = large_parser.get_nodes_from_documents(documents)
# 创建小块节点,并链接到大块节点
all_nodes = []
for large_node in large_nodes:
small_nodes = small_parser.get_nodes_from_documents(
[Document(text=large_node.text)]
)
for small_node in small_nodes:
# 创建引用大块的索引节点
index_node = IndexNode(
text=small_node.text,
index_id=large_node.node_id
)
all_nodes.append(index_node)
# 存储大块节点和索引节点
storage_context = StorageContext.from_defaults()
storage_context.docstore.add_documents(large_nodes)
# 用小块建索引;检索时用RecursiveRetriever把命中的IndexNode解析回对应大块
index = VectorStoreIndex(all_nodes, storage_context=storage_context)
from llama_index.core.retrievers import RecursiveRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
retriever = RecursiveRetriever(
    "vector",
    retriever_dict={"vector": index.as_retriever(similarity_top_k=3)},
    node_dict={n.node_id: n for n in large_nodes},  # index_id → 大块节点
)
query_engine = RetrieverQueryEngine.from_args(retriever)
response = query_engine.query("你的问题")
16.10 重排序(Reranking)¶
16.10.1 Cohere Reranker¶
# pip install llama-index-postprocessor-cohere-rerank
from llama_index.postprocessor.cohere_rerank import CohereRerank
# 创建Cohere重排序器
cohere_reranker = CohereRerank(
api_key="your-cohere-api-key",
model="rerank-multilingual-v3.0",
top_n=3 # 重排后保留top-3
)
# 在查询引擎中使用
query_engine = index.as_query_engine(
similarity_top_k=10, # 先检索10个
node_postprocessors=[cohere_reranker] # 重排后保留3个
)
response = query_engine.query("深度学习与机器学习的区别")
print(response)
16.10.2 CrossEncoder Reranker¶
from llama_index.core.postprocessor import SentenceTransformerRerank
# 使用CrossEncoder模型重排
reranker = SentenceTransformerRerank(
model="cross-encoder/ms-marco-MiniLM-L-6-v2",
top_n=3
)
# 在查询引擎中使用
query_engine = index.as_query_engine(
similarity_top_k=10,
node_postprocessors=[reranker]
)
response = query_engine.query("你的问题")
16.10.3 LLM Reranker¶
from llama_index.core.postprocessor import LLMRerank
# 使用LLM进行重排序
llm_reranker = LLMRerank(
top_n=3,
choice_batch_size=5 # 每批处理5个节点
)
query_engine = index.as_query_engine(
similarity_top_k=10,
node_postprocessors=[llm_reranker]
)
response = query_engine.query("你的问题")
16.10.4 重排序策略对比¶
| 方法 | 速度 | 质量 | 成本 | 适用场景 |
|---|---|---|---|---|
| Cohere Reranker | 快 | 高 | API付费 | 生产环境 |
| CrossEncoder | 中 | 高 | 本地免费 | 本地部署 |
| LLM Reranker | 慢 | 最高 | LLM调用费 | 高质量要求 |
| 无重排序 | 最快 | 一般 | 无额外成本 | 简单场景 |
16.11 响应合成器(Response Synthesizer)¶
16.11.1 响应合成模式¶
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine
# === Refine模式 ===
# 逐个节点精化答案(默认模式)
refine_synthesizer = get_response_synthesizer(response_mode="refine")
# === Compact模式 ===
# 将所有节点压缩到一个prompt中(减少LLM调用)
compact_synthesizer = get_response_synthesizer(response_mode="compact")
# === Tree Summarize模式 ===
# 自底向上树形摘要(适合大量节点)
tree_synthesizer = get_response_synthesizer(response_mode="tree_summarize")
# === Simple Summarize模式 ===
# 简单截断拼接后一次性摘要
simple_synthesizer = get_response_synthesizer(response_mode="simple_summarize")
# === Accumulate模式 ===
# 对每个节点单独生成答案,然后拼接
accumulate_synthesizer = get_response_synthesizer(response_mode="accumulate")
# === No Text模式 ===
# 只返回检索到的节点,不生成答案
no_text_synthesizer = get_response_synthesizer(response_mode="no_text")
# 在查询引擎中使用
retriever = index.as_retriever(similarity_top_k=5)
query_engine = RetrieverQueryEngine(
retriever=retriever,
response_synthesizer=compact_synthesizer
)
response = query_engine.query("综合分析深度学习的发展历程")
print(response)
16.11.2 响应合成模式对比¶
| 模式 | LLM调用次数 | 适用场景 | 优点 | 缺点 |
|---|---|---|---|---|
| refine | N次(N=节点数) | 需要精确答案 | 答案精确 | 慢、成本高 |
| compact | 1-2次 | 通用场景 | 速度快 | 可能截断 |
| tree_summarize | log(N)次 | 大量节点 | 高效摘要 | 可能丢失细节 |
| simple_summarize | 1次 | 少量节点 | 最快 | 可能截断 |
| accumulate | N次 | 多角度答案 | 覆盖全面 | 答案冗长 |
16.11.3 自定义响应合成¶
from llama_index.core import PromptTemplate, get_response_synthesizer
# 自定义QA Prompt
qa_prompt_tmpl = PromptTemplate(
"上下文信息如下:\n"
"---------------------\n"
"{context_str}\n"
"---------------------\n"
"请基于以上上下文信息(而非先验知识)回答以下问题。\n"
"如果上下文信息不足以回答问题,请明确说明'根据提供的信息无法回答此问题'。\n"
"问题: {query_str}\n"
"答案: "
)
# 自定义Refine Prompt
refine_prompt_tmpl = PromptTemplate(
"原始问题: {query_str}\n"
"已有答案: {existing_answer}\n"
"新的上下文信息:\n"
"---------------------\n"
"{context_msg}\n"
"---------------------\n"
"请根据新的上下文信息优化已有答案。如果新信息无用,返回原始答案。\n"
"优化后的答案: "
)
# 创建使用自定义prompt的合成器
synthesizer = get_response_synthesizer(
response_mode="compact",
text_qa_template=qa_prompt_tmpl,
refine_template=refine_prompt_tmpl
)
query_engine = index.as_query_engine(
response_synthesizer=synthesizer,
similarity_top_k=5
)
response = query_engine.query("你的问题")
16.12 多文档Agent(Multi-Document Agent)¶
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.agent import ReActAgent
from llama_index.core import VectorStoreIndex, SummaryIndex
# 为每个文档创建独立的索引和工具
doc_tools = []
for i, doc in enumerate(documents):
# 向量索引(细粒度检索)
vector_index = VectorStoreIndex.from_documents([doc])
vector_query_engine = vector_index.as_query_engine(similarity_top_k=3)
# 摘要索引(全文摘要)
summary_index = SummaryIndex.from_documents([doc])
summary_query_engine = summary_index.as_query_engine(
response_mode="tree_summarize"
)
doc_name = doc.metadata.get("file_name", f"doc_{i}")
# 创建向量搜索工具
vector_tool = QueryEngineTool(
query_engine=vector_query_engine,
metadata=ToolMetadata(
name=f"vector_search_{i}",
description=f"用于在 {doc_name} 中搜索具体信息"
)
)
# 创建摘要工具
summary_tool = QueryEngineTool(
query_engine=summary_query_engine,
metadata=ToolMetadata(
name=f"summary_{i}",
description=f"用于获取 {doc_name} 的整体摘要"
)
)
doc_tools.extend([vector_tool, summary_tool])
# 创建多文档Agent
agent = ReActAgent.from_tools(
tools=doc_tools,
verbose=True,
max_iterations=10
)
# 跨文档查询
response = agent.chat("比较不同文档中提到的技术方案,给出推荐")
print(response)
16.13 Evaluation评估框架¶
16.13.1 忠实度评估(Faithfulness)¶
from llama_index.core.evaluation import FaithfulnessEvaluator
# 评估答案是否忠实于检索到的上下文
faithfulness_evaluator = FaithfulnessEvaluator()
# 获取查询结果
query_engine = index.as_query_engine()
response = query_engine.query("什么是Transformer?")
# 评估
eval_result = faithfulness_evaluator.evaluate_response(response=response)
print(f"忠实度评估 - 通过: {eval_result.passing}")
print(f"分数: {eval_result.score}")
print(f"反馈: {eval_result.feedback}")
16.13.2 相关性评估(Relevancy)¶
from llama_index.core.evaluation import RelevancyEvaluator
# 评估答案与查询的相关性
relevancy_evaluator = RelevancyEvaluator()
query = "深度学习有哪些应用?"
response = query_engine.query(query)
eval_result = relevancy_evaluator.evaluate_response(
query=query,
response=response
)
print(f"相关性评估 - 通过: {eval_result.passing}")
print(f"分数: {eval_result.score}")
print(f"反馈: {eval_result.feedback}")
16.13.3 正确性评估(Correctness)¶
from llama_index.core.evaluation import CorrectnessEvaluator
# 评估答案的正确性(需要参考答案)
correctness_evaluator = CorrectnessEvaluator()
query = "PyTorch的创建者是谁?"
response = query_engine.query(query)
reference = "PyTorch由Meta AI(前Facebook AI Research)团队创建"
eval_result = correctness_evaluator.evaluate(
query=query,
response=str(response),
reference=reference
)
print(f"正确性评估 - 分数: {eval_result.score}")
print(f"反馈: {eval_result.feedback}")
16.13.4 批量评估¶
from llama_index.core.evaluation import (
    BatchEvalRunner,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
)
# 定义评估问题集
eval_questions = [
"什么是Transformer?",
"注意力机制如何工作?",
"BERT和GPT的区别是什么?",
"什么是预训练?",
"迁移学习的优势是什么?"
]
# 参考答案(可选)
reference_answers = [
"Transformer是一种基于注意力机制的神经网络架构...",
"注意力机制通过计算Query与Key的相似度...",
"BERT是双向编码器,GPT是自回归解码器...",
"预训练是在大规模数据上进行无监督学习...",
"迁移学习可以利用已有知识加速新任务学习..."
]
# 创建批量评估器
batch_runner = BatchEvalRunner(
evaluators={
"faithfulness": FaithfulnessEvaluator(),
"relevancy": RelevancyEvaluator(),
},
workers=4 # 并行评估
)
# 运行批量评估
eval_results = await batch_runner.aevaluate_queries(  # 异步接口:需在async函数或Jupyter等支持await的环境中运行
query_engine=query_engine,
queries=eval_questions
)
# 汇总结果
for metric, results in eval_results.items():
scores = [r.score for r in results if r.score is not None]
avg_score = sum(scores) / len(scores) if scores else 0
print(f"{metric}: 平均分={avg_score:.2f}, 通过率={sum(1 for r in results if r.passing)/len(results):.0%}") # sum(1 for...if)统计通过数,:.0%格式化为百分比
16.13.5 RAG评估最佳实践¶
import pandas as pd
from llama_index.core.evaluation import (
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    CorrectnessEvaluator,
)
def evaluate_rag_pipeline(
    query_engine,
    test_questions: list,
    reference_answers: list | None = None
) -> pd.DataFrame:
"""完整的RAG管道评估"""
faithfulness_eval = FaithfulnessEvaluator()
relevancy_eval = RelevancyEvaluator()
results = []
for i, question in enumerate(test_questions):
# 获取响应
response = query_engine.query(question)
# 忠实度评估
faith_result = faithfulness_eval.evaluate_response(response=response)
# 相关性评估
rel_result = relevancy_eval.evaluate_response(
query=question, response=response
)
result = {
"question": question,
"response": str(response)[:200],
"source_count": len(response.source_nodes),
"faithfulness": faith_result.score,
"relevancy": rel_result.score,
}
# 正确性评估(如果有参考答案)
if reference_answers and i < len(reference_answers):
correctness_eval = CorrectnessEvaluator()
corr_result = correctness_eval.evaluate(
query=question,
response=str(response),
reference=reference_answers[i]
)
result["correctness"] = corr_result.score
results.append(result)
df = pd.DataFrame(results)
# 打印汇总
print("\n=== RAG评估报告 ===")
print(f"评估问题数: {len(df)}")
print(f"忠实度均分: {df['faithfulness'].mean():.2f}")
print(f"相关性均分: {df['relevancy'].mean():.2f}")
if "correctness" in df.columns:
print(f"正确性均分: {df['correctness'].mean():.2f}")
return df
# 使用
eval_df = evaluate_rag_pipeline(
query_engine=query_engine,
test_questions=eval_questions,
reference_answers=reference_answers
)
16.14 LlamaIndex vs LangChain对比分析¶
16.14.1 设计哲学对比¶
| 维度 | LlamaIndex | LangChain |
|---|---|---|
| 核心定位 | 数据框架(RAG为核心) | 通用LLM应用开发框架 |
| 设计重心 | 数据连接与索引 | 链式组合与Agent |
| 抽象层级 | 高层抽象(开箱即用) | 低层抽象(灵活组合) |
| 学习曲线 | 相对平缓 | 相对陡峭 |
| RAG能力 | 深度优化,内置多种高级策略 | 基础支持,需手动组合 |
| Agent能力 | 基础支持 | 深度支持(LangGraph) |
| 评估工具 | 内置完善评估框架 | 需配合LangSmith |
| 社区规模 | 快速增长 | 更大更成熟 |
16.14.2 功能对比¶
# === LlamaIndex方式构建RAG ===
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# 3行代码完成RAG
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
response = index.as_query_engine().query("问题")
# === LangChain方式构建RAG ===
from langchain_community.document_loaders import DirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
# 需要更多步骤
loader = DirectoryLoader("./data")
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = splitter.split_documents(docs)
vectorstore = Chroma.from_documents(splits, OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
template = "根据上下文回答:\n{context}\n\n问题:{question}"
prompt = ChatPromptTemplate.from_template(template)
llm = ChatOpenAI()
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| llm
| StrOutputParser()
)
response = chain.invoke("问题")
16.14.3 何时选择LlamaIndex vs LangChain¶
选择LlamaIndex的场景:
- RAG是核心需求
- 需要高级检索策略(句子窗口、自动合并等)
- 需要内置评估框架
- 希望快速原型开发
- 数据索引和管理是重点
选择LangChain的场景:
- 需要复杂的Agent工作流
- 需要高度自定义的链组合
- 使用LangGraph构建有状态Agent
- 需要LangSmith进行监控调试
- 项目需要更细粒度的控制
最佳实践:两者结合使用:
# 使用LlamaIndex构建查询引擎
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=3)
# 将LlamaIndex查询引擎包装为LangChain工具
from langchain_core.tools import Tool
llama_tool = Tool(
name="knowledge_base",
description="企业知识库搜索工具",
func=lambda q: str(query_engine.query(q)) # lambda匿名函数
)
# 在LangChain Agent中使用
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain_core.prompts import PromptTemplate
# 使用LangChain的Agent逻辑 + LlamaIndex的检索能力
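上面末尾的Agent编排可以按如下思路补全(仅为示意:假设已安装langchain与langchainhub,hwchase17/react是LangChain Hub上的通用ReAct提示模板):
from langchain import hub
prompt = hub.pull("hwchase17/react")  # 拉取通用ReAct提示模板
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
agent = create_react_agent(llm, [llama_tool], prompt)
agent_executor = AgentExecutor(agent=agent, tools=[llama_tool], verbose=True)
result = agent_executor.invoke({"input": "结合知识库说明:什么是句子窗口检索?"})
print(result["output"])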
16.15 与OpenAI/本地模型集成配置¶
16.15.1 OpenAI集成¶
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"
# GPT-4o配置
Settings.llm = OpenAI(
model="gpt-4o",
temperature=0.1,
max_tokens=4096,
api_key="your-api-key", # 或从环境变量读取
    # 如需Azure OpenAI,请使用16.15.2的AzureOpenAI类;
    # 接入OpenAI兼容网关时可设置 api_base="https://your-gateway.example.com/v1",
)
# 最新嵌入模型
Settings.embed_model = OpenAIEmbedding(
model="text-embedding-3-small",
dimensions=512 # 可指定维度
)
16.15.2 Azure OpenAI集成¶
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import Settings
Settings.llm = AzureOpenAI(
engine="gpt-4o", # 部署名
model="gpt-4o",
api_key="your-azure-api-key",
azure_endpoint="https://your-endpoint.openai.azure.com/",
api_version="2024-06-01"
)
Settings.embed_model = AzureOpenAIEmbedding(
deployment_name="text-embedding-3-small",
api_key="your-azure-api-key",
azure_endpoint="https://your-endpoint.openai.azure.com/",
api_version="2024-06-01"
)
16.15.3 Anthropic Claude集成¶
# pip install llama-index-llms-anthropic
from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings
Settings.llm = Anthropic(
model="claude-3-5-sonnet-20241022",
api_key="your-anthropic-api-key",
max_tokens=4096,
temperature=0.1
)
16.15.4 本地模型集成(Ollama)¶
# pip install llama-index-llms-ollama llama-index-embeddings-ollama
from llama_index.llms.ollama import Ollama
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.core import Settings
# 确保Ollama已安装并运行
# ollama pull qwen2.5:7b
# ollama pull nomic-embed-text
Settings.llm = Ollama(
model="qwen2.5:7b",
base_url="http://localhost:11434",
request_timeout=120.0,
temperature=0.1
)
Settings.embed_model = OllamaEmbedding(
model_name="nomic-embed-text",
base_url="http://localhost:11434"
)
16.15.5 HuggingFace本地模型集成¶
# pip install llama-index-llms-huggingface llama-index-embeddings-huggingface
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
# 中文Embedding模型
Settings.embed_model = HuggingFaceEmbedding(
model_name="BAAI/bge-small-zh-v1.5",
cache_folder="./model_cache"
)
# 本地LLM
Settings.llm = HuggingFaceLLM(
model_name="Qwen/Qwen2.5-7B-Instruct",
tokenizer_name="Qwen/Qwen2.5-7B-Instruct",
context_window=4096,
max_new_tokens=1024,
device_map="auto",
model_kwargs={
"torch_dtype": "float16",
"load_in_4bit": True # 4bit量化
},
generate_kwargs={
"temperature": 0.1,
"top_p": 0.9,
"do_sample": True
}
)
16.16 实战项目:构建企业知识库问答系统¶
16.16.1 项目概述¶
目标:构建一个生产级的企业知识库问答系统,支持多种文档格式、高级检索策略、流式输出和评估。
技术栈:
- LlamaIndex v0.11+
- Chroma向量数据库
- OpenAI GPT-4o / 本地Qwen模型
- FastAPI Web接口
- Sentence Window + Reranking
16.16.2 项目结构¶
enterprise_qa/
├── config.py # 配置文件
├── data_loader.py # 数据加载模块
├── index_builder.py # 索引构建模块
├── query_engine.py # 查询引擎模块
├── evaluator.py # 评估模块
├── api.py # FastAPI接口
├── main.py # 主入口
├── requirements.txt # 依赖
├── data/ # 文档数据
└── storage/ # 索引存储
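其中 requirements.txt 可参考如下内容(仅为示意,版本请按实际环境锁定):
llama-index>=0.11
llama-index-vector-stores-chroma
chromadb
sentence-transformers
fastapi
uvicorn
pandas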
16.16.3 完整代码实现¶
config.py - 配置文件:
"""企业知识库配置"""
import os
from dataclasses import dataclass, field
@dataclass # @dataclass自动生成__init__等方法
class AppConfig:
"""应用配置"""
# LLM配置
llm_provider: str = "openai" # openai, ollama, huggingface
llm_model: str = "gpt-4o"
llm_temperature: float = 0.1
llm_max_tokens: int = 4096
# Embedding配置
embed_provider: str = "openai"
embed_model: str = "text-embedding-3-small"
embed_dimensions: int = 1536
# 向量数据库配置
vector_store: str = "chroma" # chroma, pinecone, milvus
chroma_persist_dir: str = "./storage/chroma"
# 索引配置
chunk_size: int = 512
chunk_overlap: int = 100
similarity_top_k: int = 5
rerank_top_n: int = 3
# 数据路径
data_dir: str = "./data"
storage_dir: str = "./storage"
# API配置
api_host: str = "0.0.0.0"
api_port: int = 8000
# API密钥
openai_api_key: str | None = field(
default_factory=lambda: os.getenv("OPENAI_API_KEY") # 延迟求值:在创建实例时才读取环境变量,而非类定义时
)
def __post_init__(self):
os.makedirs(self.data_dir, exist_ok=True)
os.makedirs(self.storage_dir, exist_ok=True)
os.makedirs(self.chroma_persist_dir, exist_ok=True)
config = AppConfig()
data_loader.py - 数据加载模块:
"""数据加载模块"""
import os
from llama_index.core import SimpleDirectoryReader, Document
from llama_index.core.node_parser import SentenceWindowNodeParser
from config import config
class DataLoader:
"""企业文档加载器"""
def __init__(self):
self.node_parser = SentenceWindowNodeParser.from_defaults(
window_size=3,
window_metadata_key="window",
original_text_metadata_key="original_text"
)
def load_documents(self, data_dir: str | None = None) -> list[Document]:
"""加载所有文档"""
data_dir = data_dir or config.data_dir
if not os.path.exists(data_dir):
raise FileNotFoundError(f"数据目录不存在: {data_dir}")
reader = SimpleDirectoryReader(
input_dir=data_dir,
required_exts=[".pdf", ".txt", ".md", ".docx", ".html"],
recursive=True,
filename_as_id=True
)
documents = reader.load_data()
print(f"已加载 {len(documents)} 个文档")
# 添加自定义元数据
for doc in documents:
doc.metadata["source_type"] = self._get_source_type(
doc.metadata.get("file_name", "")
)
return documents
def parse_nodes(self, documents: list[Document]):
"""解析文档为节点"""
nodes = self.node_parser.get_nodes_from_documents(documents)
print(f"生成了 {len(nodes)} 个节点")
return nodes
def _get_source_type(self, filename: str) -> str:
"""获取文档类型"""
ext = os.path.splitext(filename)[1].lower()
type_map = {
".pdf": "PDF文档",
".txt": "文本文件",
".md": "Markdown文档",
".docx": "Word文档",
".html": "网页",
}
return type_map.get(ext, "未知类型")
index_builder.py - 索引构建模块:
"""索引构建模块"""
import os
import chromadb
from llama_index.core import (
    VectorStoreIndex,
    StorageContext,
    Settings,
    SimpleDirectoryReader,
    load_index_from_storage
)
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from config import config
from data_loader import DataLoader
class IndexBuilder:
"""索引构建器"""
def __init__(self):
self._configure_settings()
self.data_loader = DataLoader()
self.index = None
def _configure_settings(self):
"""配置全局设置"""
if config.llm_provider == "openai":
Settings.llm = OpenAI(
model=config.llm_model,
temperature=config.llm_temperature,
max_tokens=config.llm_max_tokens,
api_key=config.openai_api_key
)
Settings.embed_model = OpenAIEmbedding(
model=config.embed_model,
api_key=config.openai_api_key
)
Settings.chunk_size = config.chunk_size
Settings.chunk_overlap = config.chunk_overlap
def build_index(self, force_rebuild: bool = False) -> VectorStoreIndex:
"""构建或加载索引"""
if not force_rebuild and self._index_exists():
print("从存储加载已有索引...")
return self._load_index()
print("构建新索引...")
return self._create_index()
def _create_index(self) -> VectorStoreIndex:
"""创建新索引"""
# 加载文档
documents = self.data_loader.load_documents()
nodes = self.data_loader.parse_nodes(documents)
# 创建Chroma向量存储
chroma_client = chromadb.PersistentClient(
path=config.chroma_persist_dir
)
chroma_collection = chroma_client.get_or_create_collection(
"enterprise_kb"
)
vector_store = ChromaVectorStore(
chroma_collection=chroma_collection
)
storage_context = StorageContext.from_defaults(
vector_store=vector_store
)
# 构建索引
self.index = VectorStoreIndex(
nodes=nodes,
storage_context=storage_context,
show_progress=True
)
# 持久化
self.index.storage_context.persist(
persist_dir=config.storage_dir
)
print("索引构建完成并已持久化")
return self.index
def _load_index(self) -> VectorStoreIndex:
"""加载已有索引"""
chroma_client = chromadb.PersistentClient(
path=config.chroma_persist_dir
)
chroma_collection = chroma_client.get_collection("enterprise_kb")
vector_store = ChromaVectorStore(
chroma_collection=chroma_collection
)
self.index = VectorStoreIndex.from_vector_store(vector_store)
return self.index
def _index_exists(self) -> bool:
"""检查索引是否已存在"""
return os.path.exists(
os.path.join(config.chroma_persist_dir, "chroma.sqlite3")
)
def add_documents(self, file_paths: list):
"""增量添加文档"""
if self.index is None:
raise RuntimeError("请先构建索引")
reader = SimpleDirectoryReader(input_files=file_paths)
new_documents = reader.load_data()
for doc in new_documents:
self.index.insert(doc)
self.index.storage_context.persist(persist_dir=config.storage_dir)
print(f"已添加 {len(new_documents)} 个新文档")
query_engine.py - 查询引擎模块:
"""查询引擎模块"""
from llama_index.core import VectorStoreIndex, PromptTemplate
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.postprocessor import (
    MetadataReplacementPostProcessor,
    SentenceTransformerRerank
)
from llama_index.core import get_response_synthesizer
from config import config
# 自定义QA Prompt
QA_PROMPT = PromptTemplate(
"你是企业知识库智能助手。请基于以下上下文信息回答用户的问题。\n"
"要求:\n"
"1. 答案必须基于上下文,不要编造信息\n"
"2. 如果上下文信息不足,请说明并建议用户查看相关文档\n"
"3. 使用清晰、结构化的格式回答\n"
"4. 在答案末尾标注信息来源\n\n"
"上下文信息:\n"
"---------------------\n"
"{context_str}\n"
"---------------------\n\n"
"用户问题:{query_str}\n\n"
"回答:"
)
class EnterpriseQueryEngine:
"""企业知识库查询引擎"""
def __init__(self, index: VectorStoreIndex):
self.index = index
self.query_engine = self._build_query_engine()
self.chat_engine = self._build_chat_engine()
def _build_query_engine(self) -> RetrieverQueryEngine:
"""构建高级查询引擎"""
# 使用句子窗口替换后处理器
window_postprocessor = MetadataReplacementPostProcessor(
target_metadata_key="window"
)
# 使用重排序后处理器
reranker = SentenceTransformerRerank(
model="cross-encoder/ms-marco-MiniLM-L-6-v2",
top_n=config.rerank_top_n
)
# 自定义响应合成器
response_synthesizer = get_response_synthesizer(
response_mode="compact",
text_qa_template=QA_PROMPT
)
# 构建查询引擎
query_engine = self.index.as_query_engine(
similarity_top_k=config.similarity_top_k,
node_postprocessors=[window_postprocessor, reranker],
response_synthesizer=response_synthesizer
)
return query_engine
def _build_chat_engine(self):
"""构建对话引擎"""
return self.index.as_chat_engine(
chat_mode="condense_plus_context",
similarity_top_k=config.similarity_top_k,
verbose=True
)
def query(self, question: str) -> dict:
"""单次查询"""
response = self.query_engine.query(question)
return {
"answer": str(response),
"sources": [
{
"text": node.text[:300],
"score": node.score,
"metadata": node.metadata
}
for node in response.source_nodes
]
}
def chat(self, message: str) -> dict:
"""对话查询"""
response = self.chat_engine.chat(message)
return {
"answer": str(response),
"sources": [
{
"text": node.text[:300],
"score": node.score if hasattr(node, 'score') else None, # hasattr检查属性是否存在,不存在则返回None防止报错
"metadata": node.metadata
}
for node in response.source_nodes
]
}
def stream_query(self, question: str):
"""流式查询"""
streaming_engine = self.index.as_query_engine(
similarity_top_k=config.similarity_top_k,
streaming=True
)
streaming_response = streaming_engine.query(question)
for text in streaming_response.response_gen:
yield text # yield使函数变为生成器,每次产出一个文本片段实现流式输出
def reset_chat(self):
"""重置对话"""
self.chat_engine.reset()
api.py - FastAPI接口:
"""FastAPI Web接口"""
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from index_builder import IndexBuilder
from query_engine import EnterpriseQueryEngine
from config import config
app = FastAPI(title="企业知识库问答系统", version="1.0.0")
# CORS配置
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# 全局变量
builder = None
engine = None
class QueryRequest(BaseModel): # Pydantic BaseModel:自动数据验证和序列化
question: str
mode: str = "query" # query 或 chat
class QueryResponse(BaseModel):
answer: str
sources: list[dict] = []
@app.on_event("startup")  # 较新的FastAPI版本推荐lifespan写法,此处为简洁沿用on_event
async def startup(): # async def定义协程函数
"""启动时构建索引"""
global builder, engine
builder = IndexBuilder()
index = builder.build_index()
engine = EnterpriseQueryEngine(index)
print("知识库系统启动完成!")
@app.post("/query", response_model=QueryResponse)
async def query(request: QueryRequest):
"""查询接口"""
if engine is None:
raise HTTPException(status_code=503, detail="系统未就绪")
try: # try/except捕获异常,防止程序崩溃
if request.mode == "chat":
result = engine.chat(request.question)
else:
result = engine.query(request.question)
return QueryResponse(
answer=result["answer"],
sources=result["sources"]
)
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@app.get("/stream")
async def stream_query(question: str):
"""流式查询接口"""
if engine is None:
raise HTTPException(status_code=503, detail="系统未就绪")
return StreamingResponse(
engine.stream_query(question),
media_type="text/plain"
)
@app.post("/reset")
async def reset_chat():
"""重置对话"""
if engine:
engine.reset_chat()
return {"status": "对话已重置"}
@app.get("/health")
async def health():
"""健康检查"""
return {"status": "healthy", "version": "1.0.0"}
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host=config.api_host, port=config.api_port)
main.py - 主入口:
"""企业知识库问答系统 - 主入口"""
import argparse
from index_builder import IndexBuilder
from query_engine import EnterpriseQueryEngine
from config import config
def interactive_mode(rebuild: bool = False):
"""交互式问答模式"""
print("=" * 60)
print("企业知识库问答系统 v1.0")
print("=" * 60)
# 构建索引
builder = IndexBuilder()
index = builder.build_index(force_rebuild=rebuild)
engine = EnterpriseQueryEngine(index)
print("\n系统就绪!输入问题开始对话(输入 'quit' 退出,'reset' 重置对话)\n")
while True:
question = input("📝 你: ").strip()
if question.lower() == "quit":
print("再见!")
break
elif question.lower() == "reset":
engine.reset_chat()
print("对话已重置\n")
continue
elif not question:
continue
result = engine.chat(question)
print(f"\n🤖 助手: {result['answer']}")
if result['sources']:
print(f"\n📎 参考来源:")
for i, source in enumerate(result['sources'][:3]):
print(f" [{i+1}] {source['metadata'].get('file_name', 'N/A')}")
print()
def api_mode():
"""API服务模式"""
import uvicorn
from api import app
uvicorn.run(app, host=config.api_host, port=config.api_port)
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="企业知识库问答系统")
parser.add_argument(
"--mode",
choices=["interactive", "api"],
default="interactive",
help="运行模式"
)
parser.add_argument("--rebuild", action="store_true", help="强制重建索引")
args = parser.parse_args()
if args.mode == "api":
    api_mode()
else:
    interactive_mode(rebuild=args.rebuild)  # 让--rebuild参数真正生效
16.17 练习题 ✏️¶
练习题1:基础索引构建¶
题目:使用LlamaIndex加载本地Markdown文件目录,构建VectorStoreIndex,并实现一个支持流式输出的查询引擎。
# 参考答案
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.openai import OpenAI
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.1)
documents = SimpleDirectoryReader(
input_dir="./docs",
required_exts=[".md"],
recursive=True
).load_data()
index = VectorStoreIndex.from_documents(documents, show_progress=True)
# 流式查询
streaming_engine = index.as_query_engine(
streaming=True,
similarity_top_k=3
)
response = streaming_engine.query("总结所有文档的要点")
for text in response.response_gen:
print(text, end="", flush=True)
练习题2:混合检索实现¶
题目:实现向量检索+关键词检索的混合检索,使用倒数排名融合(RRF)合并结果。
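参考思路(示意,组合16.5与16.8.3中出现过的API):
from llama_index.core import VectorStoreIndex, KeywordTableIndex, SimpleDirectoryReader
from llama_index.core.retrievers import QueryFusionRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
documents = SimpleDirectoryReader("./docs").load_data()
vector_index = VectorStoreIndex.from_documents(documents)
keyword_index = KeywordTableIndex.from_documents(documents)
fusion_retriever = QueryFusionRetriever(
    retrievers=[
        vector_index.as_retriever(similarity_top_k=5),
        keyword_index.as_retriever(),
    ],
    similarity_top_k=5,
    num_queries=1,             # 不生成查询变体,只做两路融合
    mode="reciprocal_rerank",  # RRF倒数排名融合
)
query_engine = RetrieverQueryEngine.from_args(fusion_retriever)
print(query_engine.query("注意力机制的核心思想是什么?"))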
练习题3:句子窗口RAG¶
题目:使用SentenceWindowNodeParser和MetadataReplacementPostProcessor构建句子窗口RAG管道。
练习题4:知识库评估¶
题目:为你的RAG系统编写评估脚本,使用Faithfulness和Relevancy评估器,生成评估报告。
练习题5:多文档Agent¶
题目:为3种不同类型的文档(技术文档、用户手册、FAQ)分别创建索引,然后构建一个Multi-Document Agent实现跨文档问答。
练习题6:生产级持久化¶
题目:将LlamaIndex索引持久化到Chroma,实现增量更新(添加新文档不重建整个索引)。
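参考思路(示意,基于16.6.2的Chroma集成与index.insert增量接口):
import chromadb
from llama_index.core import VectorStoreIndex, StorageContext, SimpleDirectoryReader
from llama_index.vector_stores.chroma import ChromaVectorStore
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("kb")
vector_store = ChromaVectorStore(chroma_collection=collection)
if collection.count() == 0:
    # 首次运行:全量构建并写入Chroma
    docs = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(
        docs,
        storage_context=StorageContext.from_defaults(vector_store=vector_store),
    )
else:
    # 后续运行:直接从已有向量库加载,不重建
    index = VectorStoreIndex.from_vector_store(vector_store)
# 增量添加新文档:逐篇insert,嵌入自动写入Chroma
for doc in SimpleDirectoryReader("./new_data").load_data():
    index.insert(doc)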
练习题7:自定义响应合成¶
题目:创建自定义的QA模板和Refine模板,使答案更加结构化和专业。
16.18 面试要点 📋¶
面试题1:LlamaIndex的核心架构是什么?¶
答案: LlamaIndex遵循 Documents → Nodes → Index → Query Engine 的四阶段管道:
1. Documents:原始数据的抽象,包含文本和元数据
2. Nodes:Documents经过解析和分块后的细粒度单元
3. Index:将Nodes组织成可高效检索的数据结构(如向量索引、关键词索引等)
4. Query Engine:接收自然语言查询,通过检索和LLM生成答案
面试题2:VectorStoreIndex和SummaryIndex的区别?¶
答案:
- VectorStoreIndex:基于向量相似度检索,只返回最相关的top-k节点,适合精确定位信息
- SummaryIndex:遍历所有节点进行摘要,适合需要全局概览的场景
- 时间复杂度不同:向量索引O(log n),摘要索引O(n)
- 适用场景不同:向量索引适合QA,摘要索引适合文档总结
面试题3:什么是句子窗口检索(Sentence Window Retrieval)?¶
答案: 句子窗口检索是一种高级RAG策略,核心思想是"精确匹配+宽泛上下文":
- 索引阶段:将文档解析为单句,每句附带其前后N句的窗口上下文
- 嵌入阶段:只对单句进行嵌入(更精确的语义匹配)
- 检索阶段:匹配单句后,用窗口上下文替换,提供更丰富的信息
- 优势:兼顾检索精度和上下文完整性
面试题4:自动合并检索如何工作?¶
答案: 自动合并检索(Auto-merging Retrieval)使用层级节点结构:
1. 构建时创建多层级节点(如2048→512→128 tokens)
2. 只索引叶子节点(最小粒度)的嵌入
3. 检索时如果超过阈值比例的子节点被检索到,自动合并为父节点
4. 好处:精确匹配小块,但返回更完整的大块上下文
面试题5:LlamaIndex中有哪些重排序方法?¶
答案:
1. Cohere Reranker:使用Cohere API的专业重排序模型,速度快、质量高
2. CrossEncoder Reranker:使用CrossEncoder模型(如ms-marco),本地运行,无API成本
3. LLM Reranker:使用LLM对检索结果进行重排序,质量最高但最慢
4. 重排序在"先召回后精排"的两阶段策略中至关重要
面试题6:QueryEngine和ChatEngine的区别?¶
答案:
- QueryEngine:一问一答模式,无对话历史,每次查询独立
- ChatEngine:保持对话历史,支持多轮对话,自动处理指代消解
- ChatEngine有多种模式:condense_question(压缩历史为问题)、condense_plus_context(压缩+检索)、context(每次检索)、simple(无检索纯对话)
面试题7:如何评估RAG系统的质量?¶
答案: LlamaIndex提供三个核心评估维度:
1. Faithfulness(忠实度):答案是否忠实于检索到的上下文,不编造信息
2. Relevancy(相关性):答案是否与用户查询相关
3. Correctness(正确性):答案是否与参考答案一致
可使用BatchEvalRunner进行批量评估,日常实践中建议构建评估数据集持续监控质量
面试题8:LlamaIndex和LangChain的主要区别?¶
答案:
- 定位:LlamaIndex专注RAG和数据连接,LangChain是通用LLM应用框架
- RAG:LlamaIndex内置句子窗口、自动合并等高级策略,LangChain需手动组合
- Agent:LangChain(LangGraph)在Agent方面更强大
- 评估:LlamaIndex内置评估框架,LangChain依赖LangSmith
- 最佳实践:两者可结合使用,LlamaIndex做检索,LangChain做Agent编排
面试题9:如何优化LlamaIndex的检索质量?¶
答案:
1. 分块策略优化:选择合适的chunk_size和overlap
2. 使用高级检索:句子窗口、自动合并、混合检索
3. 添加重排序:检索后使用Reranker精排
4. 元数据过滤:利用AutoRetriever自动生成过滤条件
5. 优化Embedding模型:选择适合领域的嵌入模型
6. 查询转换:使用子问题分解或查询融合
面试题10:如何实现混合检索?¶
答案: 混合检索(Hybrid Retrieval)结合向量检索和关键词检索的优势:
1. 创建VectorStoreIndex的向量检索器
2. 创建KeywordTableIndex的关键词检索器
3. 使用QueryFusionRetriever融合结果
4. 采用倒数排名融合(Reciprocal Rank Fusion, RRF)合并排序
5. 好处:语义理解+精确匹配互补,召回率更高
面试题11:LlamaIndex如何实现增量索引更新?¶
答案:
# 增量插入新文档
index.insert(new_document)
# 删除文档
index.delete_ref_doc("doc_id")
# 更新文档(先删后插)
index.delete_ref_doc("doc_id")
index.insert(updated_document)
# 配合向量数据库(如Chroma)的持久化,实现真正的增量更新
面试题12:响应合成器的不同模式适用哪些场景?¶
答案:
- refine:逐节点精化,适合需要高质量答案的场景
- compact:压缩到一个prompt,适合大多数通用场景(推荐默认)
- tree_summarize:树形摘要,适合大量检索结果的汇总
- simple_summarize:简单拼接,适合少量节点
- accumulate:独立生成+拼接,适合需要多角度回答
面试题13:如何处理多文档问答?¶
答案: 使用Multi-Document Agent模式:
1. 为每个文档创建独立的VectorIndex和SummaryIndex
2. 将每个索引封装为QueryEngineTool
3. 创建ReActAgent,让Agent自动选择合适的工具
4. Agent可以跨文档检索、对比、汇总
5. 适合企业多部门文档库的统一问答
面试题14:生产环境部署LlamaIndex需要注意什么?¶
答案:
1. 存储选择:使用专业向量数据库(Chroma/Pinecone/Milvus)而非内存
2. 索引更新:实现增量更新机制,避免全量重建
3. 缓存策略:缓存常见查询结果,减少LLM调用
4. 并发处理:使用异步接口(aquery/achat)处理并发请求
5. 监控与评估:建立评估流水线,持续监控答案质量
6. 安全性:API密钥管理、输入验证、输出过滤
7. 成本控制:选择合适的模型、优化chunk_size、使用缓存
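针对第4点"并发处理"的一个最小示意(aquery是query的异步版本,需在异步环境中运行):
import asyncio
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
async def main():
    documents = SimpleDirectoryReader("./data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    query_engine = index.as_query_engine()
    # 并发执行多个查询
    responses = await asyncio.gather(
        query_engine.aquery("什么是句子窗口检索?"),
        query_engine.aquery("如何做增量索引更新?"),
    )
    for r in responses:
        print(r)
asyncio.run(main())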
面试题15:LlamaIndex中的回调机制如何用于调试?¶
答案:
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler
# 创建调试处理器
debug_handler = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([debug_handler])
Settings.callback_manager = callback_manager
# 执行查询后查看事件追踪
index = VectorStoreIndex.from_documents(documents)
response = index.as_query_engine().query("问题")
# 查看检索和LLM调用的详细日志
print(debug_handler.get_event_pairs())
16.19 延伸阅读 📚¶
- 官方文档:LlamaIndex Documentation
- GitHub仓库:run-llama/llama_index
- LlamaIndex Blog:官方博客,介绍最新功能和最佳实践
- LlamaHub:llamahub.ai - 社区数据加载器和工具集合
- 论文:《Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks》
- 论文:《Lost in the Middle: How Language Models Use Long Contexts》
- Advanced RAG Techniques:LlamaIndex官方高级RAG教程
- Building Production RAG:Jerry Liu的生产级RAG系列教程
- RAG评估:RAGAS框架文档
- 向量数据库选型:Chroma / Pinecone / Weaviate / Milvus官方文档
📝 小结¶
本章系统介绍了LlamaIndex框架的核心内容:
- ✅ 核心架构:Documents → Nodes → Index → Query Engine 四阶段管道
- ✅ 数据加载:SimpleDirectoryReader及多种专用加载器
- ✅ 索引类型:VectorStoreIndex、SummaryIndex、TreeIndex、KeywordTableIndex、KnowledgeGraphIndex
- ✅ 存储持久化:本地存储与Chroma/Pinecone/Weaviate/Milvus集成
- ✅ 查询引擎:QueryEngine vs ChatEngine、流式输出、子问题查询
- ✅ 检索策略:向量检索、关键词检索、混合检索、自动检索
- ✅ 高级RAG:句子窗口检索、自动合并检索、递归检索
- ✅ 重排序:Cohere Reranker、CrossEncoder、LLM Reranker
- ✅ 响应合成:Refine、Compact、Tree Summarize等多种模式
- ✅ 评估框架:Faithfulness、Relevancy、Correctness评估
- ✅ 实战项目:企业知识库问答系统完整代码
通过本章学习,你应该能够:
- 使用LlamaIndex构建生产级RAG系统
- 选择合适的索引类型和检索策略
- 运用高级RAG技术提升答案质量
- 使用评估框架持续优化系统表现
- 应对LlamaIndex相关面试问题
🔗 下一步¶
下一章我们将深入学习多Agent框架与协作,掌握LangGraph、CrewAI、AutoGen等多Agent系统的构建。
继续学习: 17-多Agent框架.md
最后更新日期:2026-02-12 适用版本:LLM应用指南 v2026