📖 NLP到Agent工具调用的演进
⚠️ 时效性说明:本章涉及前沿模型/协议/工具链等内容,请以论文原文、官方文档和最新实践为准。
学习时间:8-10小时 | 难度:⭐⭐⭐⭐ | 前置:10-预训练语言模型、11-大模型时代的NLP、13-对话系统与Agent化NLP
💡 从规则NLP到Agent工具调用,AI正在从"理解语言"演进到"使用工具做事"。
📖 1. 传统NLP到LLM的演进
1.1 NLP发展四阶段
┌─────────────────────────────────────────────────────────────────────┐
│ NLP发展时间线 │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 规则NLP 统计NLP 深度学习NLP 大语言模型
│ (1960s-1990s) (1990s-2012) (2013-2022) (2022-至今)
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ │ ELIZA │ │ HMM/CRF │ │ BERT │ │ GPT-4 │
│ │ 模板匹配│ │ 机器学习 │ │ Transformer│ │ + Agent │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘
│ │
│ 特点: 特点: 特点: 特点:
│ - 人工规则 - 统计模型 - 预训练+微调 - 通用能力
│ - 可解释 - 需要特征工程 - 端到端学习 - 上下文学习
│ - 覆盖有限 - 数据驱动 - 大规模数据 - 工具使用
│ │
└─────────────────────────────────────────────────────────────────────┘
1.2 各阶段核心技术对比
| 阶段 | 代表技术 | 优点 | 缺点 | 典型任务 |
|---|---|---|---|---|
| 规则NLP | 正则表达式、有限状态机、对话树 | 可控、可解释 | 规则难以覆盖复杂情况 | 简单问答、菜单导航 |
| 统计NLP | HMM、CRF、朴素贝叶斯、SVM | 基于数据、能处理噪声 | 特征工程繁琐、难以捕获长距离依赖 | 词性标注、实体识别 |
| 深度学习NLP | RNN、LSTM、Seq2Seq、Attention | 端到端、自动特征提取 | 需要大量标注数据、训练成本高 | 机器翻译、文本生成 |
| 大语言模型 | Transformer、预训练语言模型、RLHF | 通用能力强、few-shot | 推理成本高、可能出现幻觉 | 开放域对话、复杂推理 |
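表中"统计NLP"阶段的数据驱动思想,可以用一个极简的朴素贝叶斯意图分类器来示意(纯Python实现,训练数据为虚构的玩具样例,仅用于说明统计模型如何从数据中学习,而非人工编写规则):

```python
# 统计NLP示意:极简朴素贝叶斯意图分类(纯Python,数据为虚构的玩具样例)
import math
from collections import Counter, defaultdict

train = [
    ("今天 天气 怎么样", "weather"),
    ("明天 天气 如何", "weather"),
    ("帮 我 订 机票", "booking"),
    ("播放 一首 歌", "music"),
]

class_counts = Counter(label for _, label in train)
word_counts = defaultdict(Counter)  # label -> word -> count
vocab = set()
for text, label in train:
    for w in text.split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text: str) -> str:
    """对每个类别计算 log先验 + 各词的log似然(加一平滑),取最大者"""
    best, best_lp = None, -math.inf
    for label in class_counts:
        lp = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for w in text.split():
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

print(predict("天气 怎么样"))  # weather
```

几行计数加平滑即可工作,但也暴露了表中所列的缺点:需要分词等特征工程,且词袋模型无法捕获长距离依赖。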
1.3 任务型对话 vs LLM对话
┌────────────────────────────────────────────────────────────────────┐
│ 任务型对话系统 vs LLM对话 │
├────────────────────────────────────────────────────────────────────┤
│ │
│ 任务型对话系统 LLM对话 │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ NLU → DST → │ │ 端到端 │ │
│ │ Policy → NLG │ │ 直接生成回复 │ │
│ └──────────────────┘ └──────────────────┘ │
│ │
│ 优势: 优势: │
│ ✓ 精确控制意图识别 ✓ 泛化能力强 │
│ ✓ 槽位填充准确 ✓ 零样本/少样本能力 │
│ ✓ 可预测的执行路径             ✓ 开放域能力                    │
│ ✓ 便于调试和测试 ✓ 知识面广 │
│ │
│ 劣势: 劣势: │
│ ✗ 难以覆盖长尾意图 ✗ 执行路径不透明 │
│ ✗ 需要大量标注数据 ✗ 可能产生幻觉 │
│ ✗ 迁移到新领域成本高 ✗ 推理成本较高 │
│ │
│ 适用场景: 适用场景: │
│ - 单一领域任务完成 - 开放域问答 │
│ - 高可靠性要求 - 创意写作 │
│ - 可解释性要求 - 复杂推理 │
│ │
└────────────────────────────────────────────────────────────────────┘
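上图左侧的 NLU → DST → Policy → NLG 流水线可以用几个函数串起来示意(各模块均为虚构的最小规则实现,真实系统中每个模块通常是独立训练的模型;天气数据为硬编码的示例):

```python
# 任务型对话流水线示意:NLU → DST → Policy → NLG(规则与数据均为虚构的最小实现)
def nlu(text):
    """NLU:从用户输入中识别意图与槽位"""
    intent = "query_weather" if "天气" in text else "unknown"
    slots = {"city": "北京"} if "北京" in text else {}
    return intent, slots

def dst(state, intent, slots):
    """DST:把本轮识别结果合并进对话状态"""
    state.update(slots)
    state["intent"] = intent
    return state

def policy(state):
    """Policy:根据当前状态选择系统动作"""
    if state.get("intent") == "query_weather" and "city" in state:
        return ("inform_weather", state["city"])
    return ("request_city", None)

def nlg(action):
    """NLG:把系统动作转成自然语言回复"""
    name, arg = action
    return f"{arg}今天晴,25°C" if name == "inform_weather" else "请问查询哪个城市?"

state = {}
intent, slots = nlu("北京今天天气怎么样")
state = dst(state, intent, slots)
print(nlg(policy(state)))  # 北京今天晴,25°C
```

右侧的 LLM 对话则把这四步折叠进一次生成调用,优缺点正如图中对比所示:省去了模块设计,但也失去了逐模块调试的抓手。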
1.4 模板匹配 → 意图识别 → 端到端对话
# 阶段1:模板匹配 (Rule-Based)
import re
class RuleBasedDialog:
"""基于规则的对话系统"""
def __init__(self):
# 定义意图模板
self.templates = {
r'.*天气.*': self.handle_weather,
r'.*订.*票.*': self.handle_booking,
r'.*播放.*': self.handle_music,
}
def handle_weather(self, text):
return "请问您想查询哪个城市的天气?"
def handle_booking(self, text):
return "好的,请问您想预订什么?"
def handle_music(self, text):
return "好的,请问您想听什么歌曲?"
def respond(self, text):
for pattern, handler in self.templates.items():
if re.search(pattern, text):
return handler(text)
return "抱歉,我不太理解您的问题。"
# 测试
dialog = RuleBasedDialog()
print(dialog.respond("今天天气怎么样")) # "请问您想查询哪个城市的天气?"
print(dialog.respond("帮我订一张机票")) # "好的,请问您想预订什么?"
# 阶段2:意图识别 + 槽填充 (ML-Based)
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer
class IntentClassifier(nn.Module):
"""基于BERT的意图分类模型"""
def __init__(self, num_intents):
super().__init__()
self.bert = BertModel.from_pretrained('bert-base-chinese')
self.classifier = nn.Linear(768, num_intents)
def forward(self, input_ids, attention_mask):
outputs = self.bert(input_ids, attention_mask=attention_mask)
# 使用[CLS] token进行分类
cls_output = outputs.last_hidden_state[:, 0, :]
return self.classifier(cls_output)
# 意图类别
INTENT_LABELS = [
'weather', 'booking', 'music', 'navigation',
'search', 'reminder', 'other'
]
# 使用示例
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')
model = IntentClassifier(num_intents=len(INTENT_LABELS))
text = "今天天气怎么样"
inputs = tokenizer(text, return_tensors='pt', padding=True)
logits = model(inputs['input_ids'], inputs['attention_mask'])
predicted_intent = INTENT_LABELS[logits.argmax(-1).item()]
print(f"识别意图: {predicted_intent}")  # 注意:分类头未经微调,此处输出是随机的;在标注数据上微调后才会稳定输出 "weather"
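阶段2的标题还提到了槽填充(slot filling)。真实系统通常同样用 BERT 做 token 级序列标注,这里用一个基于词典的 BIO 标注器做最小示意(城市词典为虚构样例):

```python
# 槽填充示意:基于词典的BIO标注(真实系统通常用BERT+token分类做序列标注)
CITY_DICT = {"北京", "上海", "深圳"}

def slot_fill(tokens):
    """为每个token打上BIO标签;此示例只识别 city 一种槽位"""
    return ["B-city" if tok in CITY_DICT else "O" for tok in tokens]

tokens = ["北京", "今天", "天气", "怎么样"]
print(list(zip(tokens, slot_fill(tokens))))
# [('北京', 'B-city'), ('今天', 'O'), ('天气', 'O'), ('怎么样', 'O')]
```

意图分类给出"做什么"(weather),槽填充给出"参数是什么"(city=北京),两者合起来才构成一次完整的 NLU 结果。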
# 阶段3:端到端对话 (LLM-Based)
LLM_DIALOG_PROMPT = """你是一个友好的AI助手。请根据对话历史回答用户问题。
对话历史:
{history}
用户:{user_input}
助手:"""
def llm_dialog(history, user_input):
"""LLM端到端对话"""
prompt = LLM_DIALOG_PROMPT.format(
history="\n".join([f"{r}: {c}" for r, c in history]),
user_input=user_input
)
# 实际调用LLM API
# response = openai.ChatCompletion.create(
# model="gpt-4",
# messages=[{"role": "user", "content": prompt}]
# )
# return response.choices[0].message.content
return "好的,让我来帮您..."
# 测试
history = [
("用户", "我想了解今天的天气"),
("助手", "请问您想查询哪个城市的天气呢?"),
("用户", "北京")
]
response = llm_dialog(history, "北京的天气怎么样")
print(f"回复: {response}")
📖 2. Tool Use / Function Calling
2.1 为什么需要Tool Use
┌─────────────────────────────────────────────────────────────────────┐
│ LLM的局限性 vs Tool Use │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ LLM局限性 解决方案 │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ ❌ 数学计算 │ │ ✓ 计算器工具 │ │
│ │ 23*47=1079? │ ───────► │ =1081 │ │
│ └─────────────────┘ └─────────────────┘ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ ❌ 实时信息 │ │ ✓ 搜索API │ │
│ │ 不知道今天天气 │ ───────► │ 北京: 晴, 25°C │ │
│ └─────────────────┘ └─────────────────┘ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ ❌ 专业知识 │ │ ✓ 私有数据库查询 │ │
│ │ 不知道公司财报 │ ───────► │ 营收: 100亿 │ │
│ └─────────────────┘ └─────────────────┘ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ ❌ 执行操作 │ │ ✓ API调用 │ │
│ │ 不能真发邮件 │ ───────► │ 邮件已发送 │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
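上图中"数学计算"一栏可以直接验证:LLM 逐 token 生成时可能给出 1079 这类近似错误,而交给计算器工具(哪怕只是一行 Python)总能得到精确结果:

```python
# 图中的算式:LLM可能算错(如1079),计算器工具总是精确
print(23 * 47)  # 1081
```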
2.2 OpenAI Function Calling机制
import json
from typing import List, Dict, Any, Optional
# 定义工具函数
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "获取指定城市的天气信息",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "城市名称,如北京、上海"
},
"date": {
"type": "string",
"description": "日期,格式为YYYY-MM-DD,默认为今天"
}
},
"required": ["city"]
}
}
},
{
"type": "function",
"function": {
"name": "calculator",
"description": "执行数学计算",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "数学表达式,如 23*47+100"
}
},
"required": ["expression"]
}
}
},
{
"type": "function",
"function": {
"name": "search_knowledge_base",
"description": "搜索知识库获取相关信息",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "搜索关键词"
},
"top_k": {
"type": "integer",
"description": "返回结果数量",
"default": 5
}
},
"required": ["query"]
}
}
}
]
# 工具执行器
class ToolExecutor:
"""工具执行器"""
def __init__(self):
self.tools = {
"get_weather": self._get_weather,
"calculator": self._calculator,
"search_knowledge_base": self._search_knowledge_base,
}
def execute(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
"""执行工具"""
if tool_name not in self.tools:
return {"error": f"未知工具: {tool_name}"}
try:
result = self.tools[tool_name](**arguments)
return {"success": True, "result": result}
except Exception as e:
return {"success": False, "error": str(e)}
def _get_weather(self, city: str, date: Optional[str] = None) -> Dict[str, Any]:
"""模拟天气查询"""
weather_data = {
"北京": {"weather": "晴", "temp": "25°C", "humidity": "45%"},
"上海": {"weather": "多云", "temp": "23°C", "humidity": "65%"},
"深圳": {"weather": "雷阵雨", "temp": "28°C", "humidity": "80%"},
}
return weather_data.get(city, {"weather": "未知", "temp": "N/A", "humidity": "N/A"})
def _calculator(self, expression: str) -> Any:
"""受限的数学计算(字符白名单 + 屏蔽内建的eval)"""
# 生产环境建议改用 ast 解析表达式,而非依赖 eval
allowed_operators = {'+', '-', '*', '/', '(', ')', '.', ' '}
if not all(c.isdigit() or c in allowed_operators for c in expression):
raise ValueError("不支持的运算符")
return eval(expression, {"__builtins__": {}}, {})
def _search_knowledge_base(self, query: str, top_k: int = 5) -> List[Dict[str, str]]:
"""模拟知识库搜索"""
# 实际应用应连接向量数据库
return [
{"title": f"相关文档{i}", "content": f"关于{query}的相关内容{i}..."}
for i in range(1, top_k + 1)
]
# 模拟OpenAI API调用
def call_openai_with_functions(messages: List[Dict], tools: List[Dict]) -> Dict:
"""
模拟OpenAI Function Calling调用
实际应用中:
response = openai.ChatCompletion.create(
model="gpt-4-turbo",
messages=messages,
tools=tools
)
return response
"""
# 模拟LLM决定调用get_weather
last_message = messages[-1]["content"]
if "天气" in last_message:
# 提取城市名(简化版)
cities = ["北京", "上海", "深圳", "广州", "杭州"]
city = next((c for c in cities if c in last_message), "北京")
return {
"choices": [{
"message": {
"role": "assistant",
"tool_calls": [{
"id": "call_001",
"type": "function",
"function": {
"name": "get_weather",
"arguments": json.dumps({"city": city})
}
}]
}
}]
}
return {"choices": [{"message": {"role": "assistant", "content": "直接回答..."}}]}
# 完整的Function Calling流程
def function_calling_demo(user_query: str):
"""完整的Function Calling演示"""
print(f"\n用户问题: {user_query}")
print("-" * 50)
# Step 1: 初始化消息
messages = [{"role": "user", "content": user_query}]
# Step 2: 调用LLM(带工具定义)
response = call_openai_with_functions(messages, tools)
assistant_msg = response["choices"][0]["message"]
print(f"Step 1 - LLM响应: {assistant_msg}")
# Step 3: 检查是否有工具调用
if "tool_calls" in assistant_msg:
# Step 4: 执行工具
executor = ToolExecutor()
for tool_call in assistant_msg["tool_calls"]:
tool_name = tool_call["function"]["name"]
arguments = json.loads(tool_call["function"]["arguments"])
print(f"\nStep 2 - 执行工具: {tool_name}")
print(f" 参数: {arguments}")
tool_result = executor.execute(tool_name, arguments)
print(f" 结果: {tool_result}")
# Step 5: 将工具结果加入消息
messages.append({
"role": "tool",
"tool_call_id": tool_call["id"],
"content": json.dumps(tool_result)
})
# Step 6: 再次调用LLM生成最终回答
# final_response = openai.ChatCompletion.create(
# model="gpt-4-turbo",
# messages=messages
# )
# final_answer = final_response["choices"][0]["message"]["content"]
final_answer = f"根据查询结果,{arguments['city']}的天气信息是:{tool_result}"
print(f"\nStep 3 - 最终回答: {final_answer}")
return final_answer
return assistant_msg.get("content", "无法处理")
# 测试
function_calling_demo("北京今天天气怎么样?")
function_calling_demo("帮我计算 123 * 456 + 789")
2.3 Claude/Gemini工具调用
# Claude Tool Use (Anthropic API)
CLAUDE_TOOLS = [
{
"name": "web_search",
"description": "搜索互联网获取最新信息",
"input_schema": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "搜索关键词"},
"max_results": {"type": "integer", "description": "最大结果数", "default": 5}
},
"required": ["query"]
}
},
{
"name": "code_execution",
"description": "在沙箱环境中执行Python代码",
"input_schema": {
"type": "object",
"properties": {
"code": {"type": "string", "description": "要执行的Python代码"},
"language": {"type": "string", "description": "编程语言", "default": "python"}
},
"required": ["code"]
}
}
]
# Claude API调用示例
def claude_tool_use(user_message: str):
"""Claude Function Calling"""
# 实际调用
# response = anthropic.messages.create(
# model="claude-3-opus-20240229",
# max_tokens=1024,
# tools=CLAUDE_TOOLS,
# messages=[{"role": "user", "content": user_message}]
# )
# 检查是否有tool_use
# if response.stop_reason == "tool_use":
# tool_use = response.content[-1]
# # 执行工具...
print(f"Claude工具调用示例: {user_message}")
return "Claude返回结果"
# Gemini Tool Use (Google AI)
GEMINI_TOOLS = [
{
"function_declarations": [
{
"name": "generate_image",
"description": "根据文本描述生成图片",
"parameters": {
"type": "object",
"properties": {
"prompt": {"type": "string", "description": "图片描述"},
"size": {
"type": "string",
"enum": ["1024x1024", "512x512", "256x256"],
"default": "1024x1024"
}
},
"required": ["prompt"]
}
}
]
}
]
# Gemini API调用示例
def gemini_tool_use(prompt: str):
"""Gemini Function Calling"""
# 实际调用
# model = genai.GenerativeModel('gemini-pro', tools=GEMINI_TOOLS)
# response = model.generate_content(prompt)
# 检查function_call
# if response.candidates[0].content.parts[0].function_call:
# fc = response.candidates[0].content.parts[0].function_call
# # 执行工具...
print(f"Gemini工具调用示例: {prompt}")
return "Gemini返回结果"
2.4 工具描述的Schema设计
# 良好的工具Schema设计原则
# ❌ 不好的工具定义
BAD_TOOL = {
"type": "function",
"function": {
"name": "do_something",
"description": "做某件事", # 描述太模糊
"parameters": {
"type": "object",
"properties": {
"data": {"type": "string"} # 参数名和类型不清晰
}
}
}
}
# ✅ 好的工具定义
GOOD_TOOL = {
"type": "function",
"function": {
"name": "query_company_financials",
"description": """查询公司财务数据,包括营收、利润、现金流等指标。
适用于用户询问公司财务状况、财务表现等问题。""",
"parameters": {
"type": "object",
"properties": {
"company_name": {
"type": "string",
"description": "公司全称或股票代码,如'阿里巴巴'或'9988.HK'"
},
"fiscal_year": {
"type": "string",
"description": "财年,格式为'YYYY',如'2023'。默认为最新财年"
},
"metrics": {
"type": "array",
"items": {"type": "string"},
"description": """要查询的财务指标列表,可选值:
- revenue: 营收
- net_profit: 净利润
- gross_margin: 毛利率
- operating_cash_flow: 经营现金流
如不指定则返回所有指标""",
"default": ["revenue", "net_profit"]
}
},
"required": ["company_name"]
}
}
}
# 复杂工具定义的完整示例
COMPLEX_TOOL = {
"type": "function",
"function": {
"name": "book_flight",
"description": """预订国内或国际航班机票。
此工具连接航空公司和旅行代理API,可以:
- 查询航班可用性和价格
- 预订单程或往返机票
- 选择座位等级(经济/商务/头等舱)
注意:此操作会产生实际费用,请在执行前确认用户需求。""",
"parameters": {
"type": "object",
"properties": {
"trip_type": {
"type": "string",
"enum": ["one_way", "round_trip"],
"description": "旅行类型:单程或往返"
},
"passengers": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string", "description": "乘客姓名"},
"id_type": {
"type": "string",
"enum": ["id_card", "passport"],
"description": "证件类型"
},
"id_number": {"type": "string", "description": "证件号码"}
},
"required": ["name", "id_type", "id_number"]
},
"description": "乘客信息列表,至少需要1人"
},
"segments": {
"type": "array",
"items": {
"type": "object",
"properties": {
"departure_city": {"type": "string", "description": "出发城市"},
"arrival_city": {"type": "string", "description": "到达城市"},
"departure_date": {
"type": "string",
"description": "出发日期,格式YYYY-MM-DD"
},
"flight_class": {
"type": "string",
"enum": ["economy", "business", "first"],
"description": "舱位等级",
"default": "economy"
}
},
"required": ["departure_city", "arrival_city", "departure_date"]
},
"description": "航段信息,去程和返程(往返需要2个航段)"
}
},
"required": ["trip_type", "passengers", "segments"]
}
}
}
def generate_tool_schema(tool_name: str, description: str, params: dict) -> dict:
"""工具Schema生成器"""
return {
"type": "function",
"function": {
"name": tool_name,
"description": description,
"parameters": {
"type": "object",
"properties": params,
"required": [k for k, v in params.items() if v.get("required", False)]
}
}
}
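下面是 generate_tool_schema 的一个用法示例(为便于独立运行,示例中重复了上面的定义)。注意该生成器直接从属性内的自定义 required 字段提取必填列表,但会把该字段留在属性定义里,严格的 JSON Schema 校验器可能要求在生成后将其移除:

```python
# generate_tool_schema 用法示例(为独立运行重复了上文的定义)
def generate_tool_schema(tool_name: str, description: str, params: dict) -> dict:
    """根据参数字典生成OpenAI风格的工具Schema"""
    return {
        "type": "function",
        "function": {
            "name": tool_name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": params,
                "required": [k for k, v in params.items() if v.get("required", False)],
            },
        },
    }

schema = generate_tool_schema(
    "get_weather",
    "获取指定城市的天气信息",
    {
        "city": {"type": "string", "description": "城市名称", "required": True},
        "date": {"type": "string", "description": "日期,格式YYYY-MM-DD"},
    },
)
print(schema["function"]["parameters"]["required"])  # ['city']
```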
📖 3. Agent架构
3.1 ReAct (Reasoning + Acting) 范式
┌─────────────────────────────────────────────────────────────────────┐
│ ReAct工作流程 │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ 用户问题 │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ 循环迭代 │ │
│ │ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Thought │───▶│ Action │───▶│Observa- │ │ │
│ │ │ (思考) │ │ (行动) │ │ tion │ │ │
│ │ └──────────┘ └──────────┘ │ (观察) │ │ │
│ │ │ └─────┬────┘ │ │
│ │ │ │ │ │
│ │ └─────────────────────────────────┘ │ │
│ │ (反馈循环) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Answer (最终答案) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
# ReAct Agent实现
from typing import List, Dict, Any, Optional
import json
import re
class ReActAgent:
"""ReAct (Reasoning + Acting) Agent"""
def __init__(self, tools: Dict[str, callable], max_iterations: int = 10):
self.tools = tools
self.max_iterations = max_iterations
self.prompt_template = """你是一个能使用工具的AI助手。请按以下格式推理:
Thought: 分析当前情况,思考下一步应该做什么
Action: tool_name(arg1="value1", arg2="value2") 或 FINAL_ANSWER: 最终答案
Observation: 工具返回的结果(由系统填入)
可用工具:
{tool_descriptions}
问题: {question}
请开始推理:
"""
def _format_tools(self) -> str:
"""格式化工具描述"""
desc = []
for name, func in self.tools.items():
desc.append(f"- {name}: {func.__doc__ or '无描述'}")
return "\n".join(desc)
def _parse_action(self, action_str: str) -> Optional[tuple]:
"""解析Action字符串"""
# 匹配 tool_name(args) 格式
pattern = r'(\w+)\((.*)\)'
match = re.match(pattern, action_str.strip())
if not match:
return None
tool_name = match.group(1)
args_str = match.group(2)
# 简单的参数解析
args = {}
if args_str.strip():
# 解析 key="value" 格式
kv_pattern = r'(\w+)="([^"]*)"'
for m in re.finditer(kv_pattern, args_str):
args[m.group(1)] = m.group(2)
return tool_name, args
def run(self, question: str, llm_func: callable) -> str:
"""运行ReAct Agent"""
# 初始化
context = []
prompt = self.prompt_template.format(
tool_descriptions=self._format_tools(),
question=question
)
for iteration in range(self.max_iterations):
# 调用LLM获取下一步行动
messages = [{"role": "user", "content": prompt + "\n".join(context)}]
response = llm_func(messages) # 实际调用LLM
# 解析响应,提取Thought、Action、Observation
# 简化版:模拟LLM返回
lines = response.split('\n')
thought = ""
action = ""
observation = ""
for line in lines:
if line.startswith("Thought:"):
thought = line.replace("Thought:", "").strip()
elif line.startswith("Action:"):
action = line.replace("Action:", "").strip()
elif line.startswith("Observation:"):
observation = line.replace("Observation:", "").strip()
# 记录推理过程
context.append(f"Thought: {thought}")
# 检查是否完成
if action.startswith("FINAL_ANSWER"):
answer = action.replace("FINAL_ANSWER:", "").strip()
return answer
# 执行工具
if action:
tool_info = self._parse_action(action)
if tool_info:
tool_name, args = tool_info
if tool_name in self.tools:
try:
result = self.tools[tool_name](**args)
observation = str(result)
except Exception as e:
observation = f"错误: {str(e)}"
context.append(f"Action: {action}")
context.append(f"Observation: {observation}")
return "达到最大迭代次数"
def run_simulation(self, question: str) -> str:
"""模拟ReAct执行过程(用于演示)"""
print(f"\n{'='*60}")
print(f"问题: {question}")
print(f"{'='*60}\n")
# 模拟推理步骤
steps = [
("Thought", "用户想知道今天的日期和天气,我需要先获取日期,然后查询天气。"),
("Action", "get_date()"),
("Observation", "2024-01-15"),
("Thought", "现在我知道日期是2024-01-15,接下来查询北京天气。"),
("Action", "get_weather(city=\"北京\", date=\"2024-01-15\")"),
("Observation", "{\"weather\": \"晴\", \"temp\": \"5°C\", \"wind\": \"北风3-4级\"}"),
("Thought", "我已经获取到天气信息,可以给出最终答案了。"),
("Action", "FINAL_ANSWER: 今天是2024年1月15日,北京天气晴,气温5°C,体感较冷,建议注意保暖。")
]
for step_type, content in steps:
print(f"{step_type}: {content}")
if step_type == "Action" and "FINAL_ANSWER" in content:
break
return "2024年1月15日,北京天气晴,气温5°C"
# 定义工具
def get_current_date() -> str:
"""获取当前日期"""
from datetime import datetime
return datetime.now().strftime("%Y-%m-%d")
def get_weather(city: str, date: Optional[str] = None) -> dict:
"""获取城市天气"""
return {"weather": "晴", "temp": "25°C", "humidity": "45%"}
def search_web(query: str) -> list:
"""搜索网页"""
return [f"结果{i}: 关于{query}的信息" for i in range(1, 4)]
def calculator(expression: str) -> Any:
"""计算器(演示用;生产环境请使用安全的表达式解析,而非eval)"""
return eval(expression, {"__builtins__": {}}, {})
# 创建Agent并运行
tools = {
"get_date": get_current_date,
"get_weather": get_weather,
"search_web": search_web,
"calculator": calculator
}
agent = ReActAgent(tools)
agent.run_simulation("今天几号?北京天气怎么样?")
3.2 Plan-and-Execute vs Step-by-Step
┌─────────────────────────────────────────────────────────────────────┐
│ Plan-and-Execute vs Step-by-Step │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Plan-and-Execute (计划后执行) Step-by-Step (逐步执行) │
│ ┌─────────────────────────┐ ┌─────────────────────────┐ │
│ │ 1. 理解任务 │ │ 1. 理解任务 │ │
│ │ 2. 生成计划 │ │ 2. 执行第一步 │ │
│ │ (分解为子任务) │ │ 3. 评估结果 │ │
│ │ 3. 顺序执行计划 │ │ 4. 决定下一步 │ │
│ │ 4. 整合结果 │ │ 5. 循环直到完成 │ │
│ └─────────────────────────┘ └─────────────────────────┘ │
│ │
│ 优点: 优点: │
│ ✓ 全局视角,计划清晰 ✓ 灵活适应中途变化 │
│ ✓ 便于优化执行顺序 ✓ 每步可调整策略 │
│ ✓ 适合复杂多步骤任务 ✓ 实时反馈 │
│ │
│ 缺点: 缺点: │
│ ✗ 计划可能出错 ✗ 缺乏全局规划 │
│ ✗ 中途变化需要重新规划 ✗ 可能陷入局部最优 │
│ ✗ 计划生成有额外开销 ✗ 难以回溯 │
│ │
│ 适用场景: 适用场景: │
│ - 复杂项目规划 - 简单对话问答 │
│ - 多步骤流程 - 实时交互任务 │
│ - 需要全局优化 - 不确定性的任务 │
│ │
└─────────────────────────────────────────────────────────────────────┘
# Plan-and-Execute Agent实现
class PlanExecuteAgent:
"""计划后执行Agent"""
def __init__(self, tools: Dict[str, callable]):
self.tools = tools
def create_plan(self, task: str, llm_func: callable) -> List[Dict]:
"""创建执行计划"""
# 实际调用LLM生成计划
plan_prompt = f"""将以下任务分解为具体步骤:
任务: {task}
请以JSON数组格式返回步骤,每个步骤包含:
- step_id: 步骤编号
- description: 步骤描述
- tool: 需要使用的工具(如果没有则为空)
- dependencies: 依赖的前置步骤
返回JSON格式:"""
# 模拟返回
return [
{"step_id": 1, "description": "查询股票代码", "tool": "search_stock", "dependencies": []},
{"step_id": 2, "description": "获取股票历史价格", "tool": "get_stock_price", "dependencies": [1]},
{"step_id": 3, "description": "计算收益率", "tool": "calculate_return", "dependencies": [2]},
{"step_id": 4, "description": "生成分析报告", "tool": None, "dependencies": [3]}
]
def execute_plan(self, plan: List[Dict]) -> Dict:
"""执行计划"""
results = {}
for step in plan:
# 检查依赖是否满足
deps = step.get("dependencies", [])
if deps and not all(d in results for d in deps):
return {"error": f"步骤{step['step_id']}的依赖未满足"}
# 执行步骤
tool_name = step.get("tool")
if tool_name and tool_name in self.tools:
# 实际执行工具
# result = self.tools[tool_name]()
result = f"执行{tool_name}的结果"
else:
result = f"完成步骤{step['step_id']}: {step['description']}"
results[step["step_id"]] = result
print(f"步骤{step['step_id']}: {step['description']} -> {result}")
return results
# Step-by-Step Agent实现
class StepByStepAgent:
"""逐步执行Agent"""
def __init__(self, tools: Dict[str, callable]):
self.tools = tools
self.context = []
def run(self, task: str, max_steps: int = 10) -> str:
"""逐步执行"""
self.context.append(f"任务: {task}")
for step in range(max_steps):
# 决定下一步
# 实际调用LLM决定
next_action = self._decide_next_action(task)
if not next_action:
break
# 执行
result = self._execute_action(next_action)
self.context.append(f"结果: {result}")
# 检查是否完成
if self._is_complete(task, result):
return self._generate_final_answer(result)
return "任务未完成"
def _decide_next_action(self, task: str) -> Optional[Dict]:
"""决定下一步行动"""
# 简化模拟
return {"action": "search", "query": "相关信息"}
def _execute_action(self, action: Dict) -> Any:
"""执行行动"""
return "执行结果"
def _is_complete(self, task: str, result: Any) -> bool:
"""检查是否完成"""
return False
def _generate_final_answer(self, result: Any) -> str:
"""生成最终答案"""
return "最终答案"
# 混合策略
class HybridAgent:
"""混合Agent:结合Plan-and-Execute与Step-by-Step"""
def __init__(self, tools: Dict[str, callable]):
self.tools = tools
self.plan = []
self.current_step = 0
def run(self, task: str) -> str:
"""混合策略执行"""
print(f"任务: {task}\n")
# Phase 1: 快速规划
print("=== Phase 1: 规划 ===")
self._quick_plan(task)
# Phase 2: 逐步执行 + 动态调整
print("\n=== Phase 2: 执行 ===")
return self._execute_with_adaptation()
def _quick_plan(self, task: str):
"""快速生成计划"""
# 简化版
self.plan = [
{"id": 1, "action": "search", "desc": "搜索信息"},
{"id": 2, "action": "analyze", "desc": "分析数据"},
{"id": 3, "action": "report", "desc": "生成报告"}
]
for step in self.plan:
print(f" 步骤{step['id']}: {step['desc']}")
def _execute_with_adaptation(self) -> str:
"""带适应性调整的执行"""
results = []
for step in self.plan:
self.current_step = step["id"]
print(f"\n执行步骤{step['id']}: {step['desc']}")
# 模拟执行
result = f"步骤{step['id']}完成"
results.append(result)
# 动态检查 - 可以在这里调整后续计划
# if self._need_replan(results):
# self._adjust_plan()
return f"任务完成,结果: {results}"
3.3 Reflexion自我反思机制
┌─────────────────────────────────────────────────────────────────────┐
│ Reflexion工作流程 │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ 执行任务 │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ 评估结果 │──── 成功 ──→ 输出结果 │
│ └──────┬───────┘ │
│ │ 失败 │
│ ▼ │
│ ┌──────────────┐ │
│ │ 自我反思 │ │
│ │ - 哪一步错? │ │
│ │ - 原因分析 │ │
│ │ - 改进策略 │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────┐ │
│ │ 重新执行 │ │
│ │ (使用新策略) │ │
│ └──────────────┘ │
│ │ │
│ └───────────────────────┐ │
│ │ │
│ ◄────────────────────────┘ │
│ (循环直到成功或达到上限) │
│ │
└─────────────────────────────────────────────────────────────────────┘
# Reflexion Agent实现
from typing import List, Tuple, Optional
import json
class ReflexionAgent:
"""带自我反思的Agent"""
def __init__(self, tools: Dict[str, callable], max_retries: int = 3):
self.tools = tools
self.max_retries = max_retries
self.memory = []
def run(self, task: str) -> str:
"""运行带反思的Agent"""
print(f"\n{'='*60}")
print(f"任务: {task}")
print(f"{'='*60}")
for attempt in range(self.max_retries):
print(f"\n--- 尝试 {attempt + 1}/{self.max_retries} ---")
# 1. 思考和规划
plan = self._think(task)
print(f"\n思考: {plan['thought']}")
# 2. 执行
result = self._execute(plan)
print(f"执行结果: {result}")
# 3. 评估
evaluation = self._evaluate(task, result)
print(f"评估: {evaluation['verdict']}")
# 4. 记录到记忆
self._add_to_memory(task, plan, result, evaluation)
# 5. 成功则返回
if evaluation['verdict'] == "成功":
print(f"\n✓ 任务完成!")
return result
# 6. 失败则反思
if attempt < self.max_retries - 1:
reflection = self._reflect(task, plan, result, evaluation)
print(f"\n反思: {reflection['analysis']}")
print(f"改进策略: {reflection['improvement']}")
return "任务失败,已达到最大重试次数"
def _think(self, task: str) -> dict:
"""思考阶段:分析任务并制定计划"""
# 实际应用中调用LLM
# 简化模拟
return {
"thought": f"分析任务: {task}",
"action": "search_and_answer",
"params": {"query": task}
}
def _execute(self, plan: dict) -> str:
"""执行计划"""
action = plan.get("action")
if action in self.tools:
# 实际执行
return str(self.tools[action](**plan.get("params", {})))
# 模拟执行
return "执行完成"
def _evaluate(self, task: str, result: str) -> dict:
"""评估执行结果"""
# 实际应用中可能调用LLM judge或使用规则
# 简化模拟:根据结果判断
success_indicators = ["完成", "成功", "结果", "答案"]
if any(indicator in result for indicator in success_indicators):
return {"verdict": "成功", "score": 0.9}
return {"verdict": "失败", "reason": "结果不完整"}
def _reflect(self, task: str, plan: dict, result: str, evaluation: dict) -> dict:
"""反思阶段:分析失败原因并制定改进策略"""
# 实际应用中调用LLM进行深度反思
return {
"analysis": f"失败原因: {evaluation.get('reason', '未知')}",
"improvement": "调整搜索策略,增加更多相关信息源"
}
def _add_to_memory(self, task: str, plan: dict, result: str, evaluation: dict):
"""添加到记忆"""
self.memory.append({
"task": task,
"plan": plan,
"result": result,
"evaluation": evaluation
})
def get_memory(self) -> List[dict]:
"""获取记忆"""
return self.memory
# 完整示例:带反思的代码调试Agent
class DebuggingReflexionAgent:
"""带自我反思的代码调试Agent"""
def __init__(self):
self.attempts = []
def debug(self, code: str, error: str) -> str:
"""调试代码"""
print(f"待调试代码:\n{code}\n")
print(f"错误信息: {error}\n")
# 尝试不同的修复策略
strategies = [
self._fix_syntax_error,
self._fix_logic_error,
self._fix_runtime_error,
self._ask_for_help
]
for i, strategy in enumerate(strategies):
print(f"\n尝试策略 {i+1}: {strategy.__name__}")
fixed_code = strategy(code, error)
success, output = self._test_fix(fixed_code)
self.attempts.append({
"strategy": strategy.__name__,
"fixed_code": fixed_code,
"success": success,
"output": output
})
if success:
print(f"✓ 修复成功!")
return fixed_code
return "无法自动修复,建议手动检查"
def _fix_syntax_error(self, code: str, error: str) -> str:
"""修复语法错误"""
# 简化示例:处理缺少括号
if "unexpected EOF" in error:
# 尝试添加缺失的括号
return code + "\n)"
return code
def _fix_logic_error(self, code: str, error: str) -> str:
"""修复逻辑错误"""
# 简化示例:修复变量名错误
return code.replace("varialbe", "variable")
def _fix_runtime_error(self, code: str, error: str) -> str:
"""修复运行时错误"""
# 简化示例:添加try-except
return "try:\n " + code.replace("\n", "\n ") + "\nexcept Exception as e:\n print(e)"
def _ask_for_help(self, code: str, error: str) -> str:
"""请求帮助"""
# 可以调用LLM解释错误并给出修复建议
return "# 请手动检查代码错误"
def _test_fix(self, code: str) -> Tuple[bool, str]:
"""测试修复"""
# 实际应用中执行代码
# 这里简单模拟
return True, "测试通过"
# 运行演示
print("\n" + "="*60)
print("Reflexion Agent 演示")
print("="*60)
# 创建Agent
reflexion_agent = ReflexionAgent(
tools={"search_and_answer": lambda q: f"关于{q}的答案"}
)
# 运行
result = reflexion_agent.run("解释什么是机器学习")
print(f"\n最终结果: {result}")
# 打印记忆
print("\n--- Agent记忆 ---")
for mem in reflexion_agent.get_memory():
print(f"任务: {mem['task']}, 评估: {mem['evaluation']['verdict']}")
📖 4. 工具生态系统
4.1 MCP (Model Context Protocol)
┌─────────────────────────────────────────────────────────────────────┐
│ MCP生态系统架构 │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ LLM / AI应用 │ │
│ │ (Claude, GPT等) │ │
│ └──────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ │ MCP协议 │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ MCP Client (客户端) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ 资源管理 │ │ 工具调用 │ │ 提示模板 │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └──────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ │ MCP协议 │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ MCP Server (服务端) │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ │
│ │ │ 文件系统 │ │ 数据库 │ │ API服务 │ │ │
│ │ └─────────────┘ └─────────────┘ └─────────────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ 工具生态 │ │
│ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ │ │
│ │ │搜索 │ │代码 │ │数据库│ │文件 │ │Slack │ │ │
│ │ │工具 │ │解释器│ │工具 │ │系统 │ │工具 │ │ │
│ │ └──────┘ └──────┘ └──────┘ └──────┘ └──────┘ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
# MCP协议核心概念
# 1. MCP工具定义
MCP_TOOL_SCHEMA = {
"name": "mcp-tool-example",
"description": "MCP工具描述示例",
"inputSchema": {
"type": "object",
"properties": {
"param1": {
"type": "string",
"description": "参数描述"
}
},
"required": ["param1"]
}
}
# 2. MCP资源定义
MCP_RESOURCE = {
"uri": "file:///example/data.json",
"name": "example-data",
"description": "示例数据文件",
"mimeType": "application/json"
}
# 3. MCP提示模板
MCP_PROMPT = {
"name": "summarize-document",
"description": "文档摘要提示",
"arguments": [
{
"name": "document_path",
"description": "文档路径",
"required": True
}
]
}
# MCP Client实现示例
import asyncio
class MCPClient:
"""MCP客户端"""
def __init__(self, server_url: str):
self.server_url = server_url
self.tools = {}
self.resources = {}
self.prompts = {}
async def initialize(self):
"""初始化连接"""
# 发送initialize请求
# 获取服务器能力
pass
async def list_tools(self) -> List[dict]:
"""列出可用工具"""
# 发送tools/list请求
return []
async def call_tool(self, tool_name: str, arguments: dict) -> Any:
"""调用工具"""
# 发送tools/call请求
pass
async def read_resource(self, uri: str) -> Any:
"""读取资源"""
# 发送resources/read请求
pass
async def get_prompt(self, prompt_name: str, arguments: dict) -> str:
"""获取提示"""
# 发送prompts/get请求
pass
# MCP Server实现示例
class MCPServer:
"""MCP服务端"""
def __init__(self):
self.tools = {}
self.resources = {}
self.prompts = {}
def register_tool(self, name: str, handler: callable, schema: dict):
"""注册工具"""
self.tools[name] = {
"handler": handler,
"schema": schema
}
def register_resource(self, uri: str, data: Any, description: str = ""):
"""注册资源"""
self.resources[uri] = {
"data": data,
"description": description
}
async def handle_request(self, method: str, params: dict) -> Any:
"""处理MCP请求"""
if method == "tools/list":
return self._list_tools()
elif method == "tools/call":
return await self._call_tool(params["name"], params.get("arguments", {}))
elif method == "resources/list":
return self._list_resources()
elif method == "resources/read":
return self._read_resource(params["uri"])
elif method == "prompts/list":
return self._list_prompts()
elif method == "prompts/get":
return self._get_prompt(params["name"], params.get("arguments", {}))
raise ValueError(f"未知方法: {method}")
def _list_tools(self):
return {"tools": [
{"name": name, **info["schema"]}
for name, info in self.tools.items()
]}
async def _call_tool(self, name: str, arguments: dict) -> dict:
if name not in self.tools:
raise ValueError(f"未知工具: {name}")
handler = self.tools[name]["handler"]
# 兼容同步与异步handler:先调用,若返回协程再await(main()中注册的均为同步函数)
result = handler(**arguments)
if asyncio.iscoroutine(result):
result = await result
return {"content": [{"type": "text", "text": str(result)}]}
def _list_resources(self):
return {"resources": [
{"uri": uri, "description": info["description"]}
for uri, info in self.resources.items()
]}
def _read_resource(self, uri: str) -> dict:
if uri not in self.resources:
raise ValueError(f"未知资源: {uri}")
return {"contents": [{"uri": uri, "text": str(self.resources[uri]["data"])}]}
def _list_prompts(self):
return {"prompts": list(self.prompts.keys())}
def _get_prompt(self, name: str, arguments: dict) -> dict:
if name not in self.prompts:
raise ValueError(f"未知提示: {name}")
template = self.prompts[name]
# 格式化模板
return {"messages": [{"role": "user", "content": {"type": "text", "text": template}}]}
# 实际使用示例
async def main():
# 创建MCP服务器
server = MCPServer()
# 注册工具
def search_web(query: str):
return f"搜索结果: {query}"
    def calculator(expression: str):
        # 仅为演示:生产环境不要直接eval用户输入,应改用受限解析(如ast.literal_eval)
        return eval(expression)
server.register_tool("search", search_web, {
"description": "搜索网页",
"inputSchema": {
"type": "object",
"properties": {"query": {"type": "string"}},
"required": ["query"]
}
})
server.register_tool("calculate", calculator, {
"description": "计算器",
"inputSchema": {
"type": "object",
"properties": {"expression": {"type": "string"}},
"required": ["expression"]
}
})
# 注册资源
server.register_resource("file:///config.json", '{"debug": true}', "配置文件")
# 处理请求
# 列出工具
result = await server.handle_request("tools/list", {})
print("可用工具:", result)
# 调用工具
result = await server.handle_request("tools/call", {
"name": "search",
"arguments": {"query": "Python"}
})
print("工具调用结果:", result)
# 运行
asyncio.run(main())
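MCP在传输层以JSON-RPC 2.0封装请求与响应。下面是把一条JSON-RPC消息桥接到上面`handle_request`的最小草图(`handle_jsonrpc`为示意用的假设函数,省略了initialize握手、通知与批量请求):

```python
import asyncio
import json

async def handle_jsonrpc(server, raw: str) -> str:
    """将一条JSON-RPC 2.0消息转发给MCPServer并序列化响应(简化示意)"""
    req = json.loads(raw)
    try:
        result = await server.handle_request(req["method"], req.get("params", {}))
        resp = {"jsonrpc": "2.0", "id": req.get("id"), "result": result}
    except ValueError as e:
        # -32601: method not found(JSON-RPC标准错误码)
        resp = {"jsonrpc": "2.0", "id": req.get("id"),
                "error": {"code": -32601, "message": str(e)}}
    return json.dumps(resp, ensure_ascii=False)
```

真实的MCP服务器还需要在stdio或HTTP/SSE等传输上循环收发这类消息,这里只演示单条请求的封装与错误映射。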
4.2 主流工具API集成¶
# 主流工具API集成示例
# 1. 搜索工具
class SearchTools:
"""搜索工具集成"""
@staticmethod
def search_google(query: str, api_key: str = None):
"""Google搜索"""
# 实际调用
# response = requests.get(
# "https://www.googleapis.com/customsearch/v1",
# params={"key": api_key, "q": query}
# )
        return {"results": ["结果1", "结果2"]}
@staticmethod
def search_baidu(query: str):
"""百度搜索"""
return {"results": ["百度结果1", "百度结果2"]}
# 2. 数据库工具
class DatabaseTools:
"""数据库工具"""
def __init__(self, connection_string: str):
self.conn_str = connection_string
def execute_query(self, sql: str) -> List[Dict]:
"""执行SQL查询"""
# 实际使用
# import sqlite3
# conn = sqlite3.connect(self.conn_str)
# cursor = conn.cursor()
# cursor.execute(sql)
# results = cursor.fetchall()
return [{"id": 1, "name": "示例"}]
def query_vector_db(self, collection: str, vector: List[float], top_k: int = 5):
"""向量数据库查询"""
# 实际使用Pinecone/Weaviate/Milvus
return [{"id": "doc1", "score": 0.95, "text": "相关文档内容"}]
# 3. 文件操作工具
class FileTools:
"""文件操作工具"""
@staticmethod
def read_file(path: str, encoding: str = "utf-8") -> str:
"""读取文件"""
with open(path, 'r', encoding=encoding) as f:
return f.read()
@staticmethod
def write_file(path: str, content: str, encoding: str = "utf-8"):
"""写入文件"""
with open(path, 'w', encoding=encoding) as f:
f.write(content)
@staticmethod
def list_files(directory: str, pattern: str = "*") -> List[str]:
"""列出文件"""
import glob
return glob.glob(f"{directory}/{pattern}")
@staticmethod
def read_pdf(path: str) -> str:
"""读取PDF"""
# 使用PyPDF2
# import PyPDF2
# with open(path, 'rb') as f:
# reader = PyPDF2.PdfReader(f)
# return "".join(page.extract_text() for page in reader.pages)
return "PDF内容..."
# 4. 通信工具
class CommunicationTools:
"""通信工具"""
def send_email(self, to: str, subject: str, body: str):
"""发送邮件"""
# 实际使用smtplib
# import smtplib
# from email.mime.text import MIMEText
pass
def send_slack_message(self, channel: str, message: str, webhook_url: str):
"""发送Slack消息"""
# 实际使用
# import requests
# requests.post(webhook_url, json={"text": message})
pass
def send_dingtalk_message(self, webhook: str, message: str):
"""发送钉钉消息"""
pass
# 5. 日历/日程工具
class CalendarTools:
"""日历工具"""
def create_event(self, title: str, start_time: str, end_time: str, description: str = ""):
"""创建日程"""
# 实际使用Google Calendar API或CalDAV
return {"event_id": "abc123", "status": "created"}
def get_events(self, start_date: str, end_date: str) -> List[Dict]:
"""获取日程"""
return [
{"title": "会议", "start": "2024-01-15 10:00", "end": "2024-01-15 11:00"}
]
4.3 代码解释器(Code Interpreter)¶
# 代码解释器实现
class CodeInterpreter:
    """受限的Python代码解释器(注意:builtins白名单只能防误用,并非可靠的安全边界)"""
def __init__(self, timeout: int = 30, memory_limit: int = 100 * 1024 * 1024):
self.timeout = timeout
self.memory_limit = memory_limit
self.sandbox_globals = {
"__builtins__": {
# 只允许安全操作
"print": print,
"len": len,
"range": range,
"str": str,
"int": int,
"float": float,
"list": list,
"dict": dict,
"set": set,
"tuple": tuple,
"abs": abs,
"max": max,
"min": min,
"sum": sum,
"sorted": sorted,
"zip": zip,
"enumerate": enumerate,
"map": map,
"filter": filter,
"any": any,
"all": all,
        "round": round,
}
}
self.sandbox_locals = {}
def execute(self, code: str) -> dict:
"""执行代码"""
import sys
from io import StringIO
# 捕获输出
stdout = StringIO()
stderr = StringIO()
old_stdout = sys.stdout
old_stderr = sys.stderr
try:
sys.stdout = stdout
sys.stderr = stderr
# 执行代码(安全方式)
exec(code, self.sandbox_globals, self.sandbox_locals)
return {
"success": True,
"output": stdout.getvalue(),
"error": None,
"result": self.sandbox_locals.get("_result")
}
except Exception as e:
return {
"success": False,
"output": stdout.getvalue(),
"error": str(e),
"result": None
}
finally:
sys.stdout = old_stdout
sys.stderr = old_stderr
    def execute_with_context(self, code: str, context: dict) -> dict:
        """带上下文的代码执行"""
        # 注入上下文变量
        self.sandbox_locals.update(context)
        # 预置结果变量:代码中给 _result 赋值即可返回结果
        # (不能追加到代码末尾,否则会把已计算的结果覆盖为None)
        code_with_result = "_result = None\n" + code
        return self.execute(code_with_result)
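CodeInterpreter的timeout参数在纯exec方案里无法生效(exec不可中断);常见做法是把代码放进独立子进程,用进程级超时兜底(简化示意,`run_with_timeout`为假设的辅助函数,未做网络与资源隔离):

```python
import subprocess
import sys

def run_with_timeout(code: str, timeout: int = 5) -> dict:
    """在独立子进程中执行代码并强制超时(简化示意)"""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: 隔离模式,忽略环境变量与site包
            capture_output=True, text=True, timeout=timeout,
        )
        return {"success": proc.returncode == 0,
                "output": proc.stdout, "error": proc.stderr or None}
    except subprocess.TimeoutExpired:
        return {"success": False, "output": "", "error": f"超时({timeout}s)"}
```

超时触发时子进程会被杀掉,死循环不会拖垮主进程;真正面向不可信代码仍需配合下文的容器级沙箱。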
# 基于沙箱的代码执行
class SandboxedExecutor:
"""沙箱执行器(使用Docker)"""
def __init__(self, image: str = "python:3.11-sandbox"):
self.image = image
# 实际使用docker-py创建容器
# import docker
# self.client = docker.from_env()
    def execute(self, code: str, dependencies: List[str] = None) -> dict:
        """在沙箱中执行代码"""
        # 1. 构建Dockerfile(heredoc写文件为BuildKit语法)
        deps_line = f"RUN pip install {' '.join(dependencies)}" if dependencies else ""
        dockerfile = f"""
FROM {self.image}
{deps_line}
WORKDIR /app
COPY <<EOF main.py
{code}
EOF
CMD ["python", "main.py"]
"""
# 2. 构建并运行容器
# image = self.client.images.build(fileobj=dockerfile, tag="temp")
# container = self.client.containers.run(image, detach=True)
# 3. 获取结果
# logs = container.logs()
# container.remove()
# 模拟返回
return {
"success": True,
"output": "执行输出",
"execution_time": 1.5
}
# Jupyter kernel集成
class JupyterExecutor:
"""Jupyter Kernel执行器"""
def __init__(self, kernel_name: str = "python3"):
self.kernel_name = kernel_name
# 实际使用jupyter_client
# from jupyter_client import KernelClient
# self.client = KernelClient()
def execute_code(self, code: str) -> dict:
"""执行代码并获取结果"""
# 实际使用
# self.client.start_kernel()
# msg_id = self.client.execute(code)
# reply = self.client.get_shell_msg(msg_id)
# return self._parse_reply(reply)
return {
"status": "ok",
"execution_count": 1,
"outputs": []
}
def execute_magic(self, magic_command: str) -> dict:
"""执行Jupyter Magic命令"""
# %matplotlib inline
# %timeit ...
pass
# 代码解释器Agent应用
class CodeInterpreterAgent:
"""代码解释器Agent"""
def __init__(self):
self.interpreter = CodeInterpreter()
self.code_history = []
def run_task(self, task: str) -> str:
"""运行任务"""
print(f"任务: {task}\n")
# 1. 分析任务并生成代码
code = self._generate_code(task)
print(f"生成的代码:\n{code}\n")
# 2. 执行代码
result = self.interpreter.execute(code)
# 3. 处理结果
if result["success"]:
print(f"执行成功!")
print(f"输出: {result['output']}")
if result.get('result'):
print(f"结果: {result['result']}")
else:
print(f"执行失败!")
print(f"错误: {result['error']}")
# 尝试修复
fixed_code = self._fix_error(code, result["error"])
if fixed_code:
print(f"\n尝试修复:\n{fixed_code}")
result = self.interpreter.execute(fixed_code)
return str(result.get("output", result.get("error")))
    def _generate_code(self, task: str) -> str:
        """生成代码(实际应用调用LLM)"""
        # 简化示例:仅识别求和与乘法两类任务
        import re
        numbers = [int(n) for n in re.findall(r'\d+', task)]
        if "*" in task or "乘" in task:
            code = f"numbers = {numbers}\n"
            code += "result = 1\nfor n in numbers:\n    result = result * n\n"
            code += "print(f'乘积: {result}')"
            return code
        if "计算" in task or "总和" in task:
            return f"numbers = {numbers}\nresult = sum(numbers)\nprint(f'总和: {{result}}')"
        return "# 请输入Python代码"
def _fix_error(self, code: str, error: str) -> str:
"""修复代码错误"""
# 实际应用调用LLM修复
if "SyntaxError" in error:
return code + " # 修复语法错误"
return None
# 使用示例
print("\n" + "="*60)
print("代码解释器演示")
print("="*60)
agent = CodeInterpreterAgent()
# 测试任务
tasks = [
"计算 1+2+3+4+5 的总和",
"计算 10 * 20 的结果",
]
for task in tasks:
agent.run_task(task)
print()
📖 5. 实际应用案例¶
5.1 智能助手Agent架构¶
# 完整的智能助手Agent
class IntelligentAssistant:
"""智能助手Agent"""
def __init__(self):
# 工具注册
self.tools = {
"search": self._search,
"calculator": self._calculator,
"weather": self._weather,
"news": self._news,
"code_execute": self._code_execute,
"send_email": self._send_email,
}
# Agent组件
self.memory = []
self.planner = PlanExecuteAgent(self.tools)
self.reflector = ReflexionAgent(self.tools)
def chat(self, user_input: str) -> str:
"""处理用户对话"""
# 1. 意图识别
intent = self._recognize_intent(user_input)
# 2. 选择策略
if intent == "complex_task":
# 复杂任务:计划+执行
return self.planner.run(user_input)
elif intent == "reflection_needed":
# 需要反思的任务
return self.reflector.run(user_input)
        else:
            # 简单任务:直接执行
            return self._direct_execute(user_input)
    def _recognize_intent(self, text: str) -> str:
"""识别意图"""
complex_keywords = ["分析", "比较", "研究", "报告"]
reflection_keywords = ["检查", "调试", "修复", "验证"]
if any(k in text for k in complex_keywords):
return "complex_task"
elif any(k in text for k in reflection_keywords):
return "reflection_needed"
return "simple"
def _direct_execute(self, task: str) -> str:
"""直接执行"""
# 解析任务并调用工具
return "处理结果"
# 工具实现
def _search(self, query: str): pass
def _calculator(self, expr: str): pass
def _weather(self, city: str): pass
def _news(self, topic: str): pass
def _code_execute(self, code: str): pass
def _send_email(self, to: str, subject: str, body: str): pass
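IntelligentAssistant依赖前文的PlanExecuteAgent与ReflexionAgent,不便单独运行;可以用桩对象先验证路由逻辑本身(简化示意,`StubAgent`与`route`均为假设的演示代码,关键词表与上面一致):

```python
class StubAgent:
    """桩Agent:只回显被路由到的策略名,用于验证意图路由"""
    def __init__(self, name: str):
        self.name = name
    def run(self, task: str) -> str:
        return f"[{self.name}] {task}"

def route(user_input: str) -> str:
    """复刻IntelligentAssistant.chat的三路路由(简化示意)"""
    complex_keywords = ["分析", "比较", "研究", "报告"]
    reflection_keywords = ["检查", "调试", "修复", "验证"]
    if any(k in user_input for k in complex_keywords):
        return StubAgent("planner").run(user_input)
    if any(k in user_input for k in reflection_keywords):
        return StubAgent("reflector").run(user_input)
    return StubAgent("direct").run(user_input)
```

这种"先用桩验证控制流,再接入真实Agent"的做法,也便于为路由逻辑单独写测试。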
🎯 面试高频题¶
Q1: 传统NLP和LLM对话的主要区别?¶
A: 传统NLP(任务型对话)使用pipeline架构(NLU→DST→Policy→NLG),每个模块职责明确,可控性强但泛化能力弱。LLM对话是端到端的,使用预训练+RLHF,具有强大的零样本能力,但执行路径不透明。两者各有适用场景:任务型适合单一领域、高可靠性要求;LLM适合开放域、多样化需求。
Q2: Function Calling是如何工作的?¶
A: Function Calling通过让LLM学习在合适时机输出特定格式的工具调用指令实现。流程:1)向LLM提供工具的JSON Schema描述;2)LLM分析用户问题,决定是否需要调用工具;3)如需调用,输出工具名和参数;4)系统解析并执行工具;5)将结果拼接回上下文;6)LLM生成最终回答。关键是训练时让模型学会判断何时用什么工具。
Q3: ReAct模式和普通Tool Use有什么区别?¶
A: 普通Tool Use是one-shot的——决定用工具→调用→得到结果→回答。ReAct是多步推理循环:Thought→Action→Observation→Thought...,可以链式调用多个工具。ReAct的优势是推理过程透明,每步都有"思考-行动-观察"的反馈,便于处理复杂问题。
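Q3中的Thought→Action→Observation循环可以抽象成一个很短的控制骨架(简化示意,`llm_step`为假设的LLM接口,真实实现需解析模型文本输出):

```python
def react_loop(llm_step, tools: dict, question: str, max_steps: int = 5) -> str:
    """ReAct循环的最小骨架。
    llm_step返回(thought, action, payload);action为None时payload即最终答案"""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, payload = llm_step("\n".join(history))
        history.append(f"Thought: {thought}")
        if action is None:  # 模型决定直接作答
            return payload
        observation = tools[action](**payload)  # 执行工具,得到观察
        history.append(f"Action: {action}({payload})")
        history.append(f"Observation: {observation}")
    return "达到最大步数,未得出答案"
```

历史记录每轮都拼回提示词,这正是ReAct"推理过程透明、可链式调用多个工具"的来源,也是它比one-shot Tool Use消耗更多token的原因。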
Q4: Plan-and-Execute和Step-by-Step各适合什么场景?¶
A: Plan-and-Execute适合:复杂多步骤任务、需要全局优化的场景、任务可预先分解。Step-by-Step适合:简单交互任务、实时性要求高、任务过程中可能变化。混合策略是当前主流——先用Plan快速规划,执行中用Step-by-Step灵活调整。
Q5: MCP协议解决了什么问题?¶
A: MCP(Model Context Protocol)是Anthropic提出的AI工具交互标准,解决:1)每个AI应用需要单独对接各种工具的混乱局面;2)工具定义的标准化问题;3)AI与外部系统交互的统一接口。MCP让AI可以动态发现和使用各种工具,形成可扩展的工具生态。
Q6: 代码解释器如何保证安全性?¶
A: 代码解释器的安全措施:1)沙箱隔离(Docker/VM);2)超时限制防止无限循环;3)内存限制防止资源耗尽;4)限制可用的builtins(移除危险函数如os.system);5)网络隔离;6)执行结果审查。当前主流方案是使用专门的沙箱环境如E2B、Browserbase等。
✅ 学习检查清单¶
- 能说清楚NLP发展的四个阶段及特点
- 能对比任务型对话系统和LLM对话的优缺点
- 能实现完整的Function Calling流程
- 理解ReAct推理模式并能实现
- 能区分Plan-and-Execute和Step-by-Step的使用场景
- 能实现带自我反思的Agent
- 理解MCP协议的核心概念和优势
- 能集成主流工具API(搜索、数据库、文件等)
- 了解代码解释器的安全机制
📌 下一步:学习 LLM应用/07-Agent开发基础 深入Agent开发,或继续探索 AI Agent开发实战/ 目录了解实际项目。