Structured Output and Function Calling¶
⚠️ Timeliness note: this chapter covers cutting-edge models, pricing, leaderboards, and similar information that can change quickly across versions; always defer to the original papers, official release pages, and API documentation.
Study time: 6-8 hours Difficulty: ⭐⭐⭐ Intermediate Prerequisites: Prompt engineering (Chapter 2), Agent development basics (Chapter 7) Learning goals: master the principles and engineering practice of JSON Mode, structured output, Function Calling, and Tool Use
📖 Chapter Overview¶
Structured output and Function Calling are the key capabilities that turn a large language model from a "chat tool" into a "programmable engine". Mastering them lets an LLM reliably produce machine-parseable data and interact precisely with external systems.
1. Structured Output¶
1.1 Why Structured Output¶
Traditional LLMs emit free-form text, which causes several problems:
- Unreliable parsing: regex/string matching breaks easily
- Uncertain types: numbers may come back as strings
- Inconsistent format: the same prompt returns different formats across calls
Structured output uses constrained decoding to guarantee that the output strictly follows a specified schema.
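The brittleness of free-text parsing is easy to demonstrate. In the sketch below (both model replies are hypothetical), a regex silently breaks when the wording shifts, while a JSON reply survives rephrasing because only keys and types matter:

```python
import json
import re

# Two free-text replies to the same prompt -- same meaning, different wording
reply_a = "The score is 4.5 out of 5."
reply_b = "Rating: 4.5/5"

pattern = re.compile(r"score is ([\d.]+)")
print(pattern.search(reply_a).group(1))  # "4.5"
print(pattern.search(reply_b))           # None -- the regex silently fails

# A JSON reply is robust to rewording: parse once, then check types
json_reply = '{"score": 4.5, "max": 5}'
data = json.loads(json_reply)
assert isinstance(data["score"], float) and data["score"] == 4.5
```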
1.2 JSON Mode vs Structured Output¶
| Feature | JSON Mode | Structured Output |
|---|---|---|
| Guarantees valid JSON | ✅ | ✅ |
| Guarantees schema conformance | ❌ | ✅ |
| Guarantees field names | ❌ | ✅ |
| Guarantees types | ❌ | ✅ |
| Supports nested objects | Partial | ✅ |
| Main providers | OpenAI/Anthropic/Gemini | OpenAI (2024.08+) |
1.3 OpenAI Structured Output in Practice¶
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

# Define the output schema with Pydantic
class ResearchPaper(BaseModel):  # Pydantic BaseModel: automatic validation and serialization
    title: str
    authors: list[str]
    year: int
    abstract_summary: str
    key_contributions: list[str]
    methodology: str
    dataset: str | None = None
    metrics: dict[str, float]

# Use Structured Output
response = client.chat.completions.parse(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a paper-analysis expert; extract structured information from the paper."},
        {"role": "user", "content": """
Analyze this paper:
Attention Is All You Need (2017)
By Vaswani et al.
Introduced the Transformer architecture...
"""}
    ],
    response_format=ResearchPaper,  # pass the Pydantic model directly
)
paper = response.choices[0].message.parsed
print(f"Title: {paper.title}")
print(f"Year: {paper.year}")
print(f"Contributions: {paper.key_contributions}")
1.4 JSON Modes in Anthropic/Gemini¶
# Anthropic Claude - structured output via the tool_use trick
import anthropic

client = anthropic.Anthropic()

# Claude achieves structured output through a tool definition
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{
        "name": "extract_info",
        "description": "Extract structured information",
        "input_schema": {
            "type": "object",
            "properties": {
                "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
                "confidence": {"type": "number", "minimum": 0, "maximum": 1},
                "keywords": {"type": "array", "items": {"type": "string"}}
            },
            "required": ["sentiment", "confidence", "keywords"]
        }
    }],
    tool_choice={"type": "tool", "name": "extract_info"},  # force this tool so the output always matches the schema
    messages=[{"role": "user", "content": "Analyze the sentiment of this review: 'This product is amazing!'"}]
)
1.5 How Constrained Decoding Works¶
The mechanism underneath structured output:
Regular generation:    logits → softmax → sample (over all tokens)
Constrained decoding:  logits → mask illegal tokens → softmax → sample (legal tokens only)
Example - after generating the JSON prefix {"name":
Legal tokens:   [a-z, A-Z, 0-9, ", ', space, ...]
Illegal tokens: [}, ], newline, EOF, ...] ← masked to -inf
Framework implementations:
- Outlines: regex-based constrained sampling
- Guidance: Microsoft's constrained-generation framework
- Instructor: Pydantic-based structured-output library for LLMs
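The masking step above can be sketched in a few lines of plain Python (the toy vocabulary and logit values are made up for illustration):

```python
import math

# Toy vocabulary and logits (values are made up)
vocab = ['"', '}', 'a', '\n']
logits = [1.2, 2.0, 0.5, 0.7]

# After the prefix {"name": the grammar only permits tokens that can
# start a JSON string value -- here, the opening quote
legal = {'"'}

# Constrained decoding: mask illegal tokens to -inf, then softmax
masked = [x if tok in legal else float("-inf") for tok, x in zip(vocab, logits)]
exps = [math.exp(x) for x in masked]           # exp(-inf) == 0.0
probs = [e / sum(exps) for e in exps]

# Illegal tokens now carry exactly zero probability mass
print(dict(zip(vocab, probs)))
```

Note that `}` had the highest raw logit: without the mask it would likely be sampled, producing invalid JSON.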
2. Function Calling / Tool Use¶
2.1 Core Concepts¶
Function Calling lets an LLM:
1. Recognize intent: decide whether the user's request needs an external function call
2. Generate arguments: produce correct call arguments according to the schema
3. Make multi-turn calls: support serial and parallel calls to multiple functions
User: "What's the weather in Beijing today?"
    ↓ the LLM decides a weather API call is needed
Model: function_call: get_weather(city="Beijing", date="2025-07-07")
    ↓ the application layer executes the function
Result: {"temp": 32, "condition": "sunny"}
    ↓ the result is returned to the LLM
Model: "Beijing is 32°C and sunny today."
2.2 OpenAI Function Calling in Practice¶
import json
from openai import OpenAI

client = OpenAI()

# Define the tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_papers",
            "description": "Search academic papers",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search keywords"
                    },
                    "year_from": {
                        "type": "integer",
                        "description": "Start year"
                    },
                    "max_results": {
                        "type": "integer",
                        "description": "Maximum number of results",
                        "default": 10
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate_metrics",
            "description": "Compute model evaluation metrics",
            "parameters": {
                "type": "object",
                "properties": {
                    "predictions": {
                        "type": "array",
                        "items": {"type": "number"},
                        "description": "Model predictions"
                    },
                    "ground_truth": {
                        "type": "array",
                        "items": {"type": "number"},
                        "description": "Ground-truth labels"
                    },
                    "metric": {
                        "type": "string",
                        "enum": ["accuracy", "f1", "precision", "recall"],
                        "description": "Metric type"
                    }
                },
                "required": ["predictions", "ground_truth", "metric"]
            }
        }
    }
]

# Conversation loop (with tool-call handling)
messages = [
    {"role": "system", "content": "You are an AI research assistant that can search papers and compute metrics."},
    {"role": "user", "content": "Find me papers on RLHF published since 2023"}
]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

# Handle tool calls
message = response.choices[0].message
if message.tool_calls:
    messages.append(message)  # append the assistant message once, before any tool results
    for tool_call in message.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)  # json.loads: JSON string → Python object
        # Actually execute the function
        if func_name == "search_papers":
            result = search_papers(**func_args)  # your implementation
        elif func_name == "calculate_metrics":
            result = calculate_metrics(**func_args)
        # Return the result to the LLM
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result, ensure_ascii=False)  # json.dumps: Python object → JSON string
        })

# Get the final reply
final_response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
print(final_response.choices[0].message.content)
2.3 Parallel Tool Calls¶
# GPT-4o can return multiple tool_calls in a single response
# User: "Check the weather in Beijing and Shanghai"
# The model returns both at once:
#   tool_call_1: get_weather(city="Beijing")
#   tool_call_2: get_weather(city="Shanghai")

# Handle parallel calls
import asyncio  # Python's standard async library
import json

if message.tool_calls:
    messages.append(message)

    # Execute all functions concurrently (asyncio speeds this up)
    async def execute_tools(tool_calls):  # async def defines a coroutine function
        tasks = []
        for tc in tool_calls:
            args = json.loads(tc.function.arguments)
            tasks.append(execute_function(tc.function.name, args))  # execute_function: your async dispatcher
        return await asyncio.gather(*tasks)  # run multiple coroutines concurrently

    results = asyncio.run(execute_tools(message.tool_calls))  # create an event loop and run the top-level coroutine
    for tool_call, result in zip(message.tool_calls, results):  # zip pairs the iterables by position
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result, ensure_ascii=False)
        })
2.4 Anthropic Tool Use¶
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[
        {
            "name": "get_stock_price",
            "description": "Get a real-time stock quote",
            "input_schema": {
                "type": "object",
                "properties": {
                    "symbol": {"type": "string", "description": "Ticker symbol"},
                    "market": {"type": "string", "enum": ["US", "CN", "HK"]}
                },
                "required": ["symbol"]
            }
        }
    ],
    messages=[{"role": "user", "content": "Look up Apple's stock price"}]
)

# Claude returns tool_use as content blocks
for block in response.content:
    if block.type == "tool_use":
        print(f"Call: {block.name}({block.input})")
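The example stops at printing the tool call. To complete the loop, the application executes the function and sends the result back as a tool_result content block inside a user message, following Anthropic's tool-use message shape; the stock-price result below is made up:

```python
import json

def build_tool_result_message(tool_use_id: str, result) -> dict:
    """Build the follow-up user message that returns a tool result to Claude."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_id,  # must match the id of the tool_use block
            "content": json.dumps(result, ensure_ascii=False),
        }],
    }

# Hypothetical result for the get_stock_price call above
msg = build_tool_result_message("toolu_01ABC", {"symbol": "AAPL", "price": 210.3})
print(msg["content"][0]["type"])  # tool_result
```

Append the assistant's tool_use turn plus this message to the conversation, then call client.messages.create(...) again to get the final natural-language reply.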
3. OpenAI Agents SDK (2025)¶
3.1 SDK Overview¶
The Agents SDK, released by OpenAI in March 2025 (the production successor to Swarm), provides:
- Agent definition: declarative agents + tools + instructions
- Handoff mechanism: delegating tasks between agents
- Guardrails: input/output safety checks
- Tracing: built-in observability
3.2 Defining a Basic Agent¶
import json
from agents import Agent, Runner, function_tool

# Define tools
@function_tool
def search_database(query: str, limit: int = 5) -> str:
    """Search the knowledge base"""
    # Your implementation
    results = db.search(query, limit=limit)
    return json.dumps(results)

@function_tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email"""
    # Your implementation
    return f"Email sent to {to}"

# Define the agent
research_agent = Agent(
    name="Research Assistant",
    instructions="""You are an AI research assistant.
- When the user asks an academic question, search the knowledge base first
- Format the results so they are easy to read
- If the user asks for a summary to be sent, use the email tool""",
    tools=[search_database, send_email],
    model="gpt-4o"
)

# Run the agent (Runner.run_sync takes the agent and the user input)
result = Runner.run_sync(
    research_agent,
    "Search for the latest GRPO papers"
)
print(result.final_output)
3.3 Multi-Agent Handoff¶
from agents import Agent, Runner

# Note: handoffs must be Agent object references, not strings,
# so the target agents must be defined before the triage agent.

# Coding expert
code_agent = Agent(
    name="Coding Expert",
    instructions="You are a programming expert who helps users write and debug code.",
    tools=[run_code, search_docs]
)

# Paper expert
paper_agent = Agent(
    name="Paper Expert",
    instructions="You are a paper-reading expert who helps users understand and summarize papers.",
    tools=[search_papers, extract_info]
)

# Deployment expert
deploy_agent = Agent(
    name="Deployment Expert",
    instructions="You are an MLOps expert who helps users deploy and optimize model serving.",
    tools=[check_gpu, deploy_model]
)

# Triage agent (must be created after the target agents)
triage_agent = Agent(
    name="Triage Assistant",
    instructions="Route the user's question to the matching expert agent based on its type.",
    handoffs=[code_agent, paper_agent, deploy_agent]  # Agent objects, not strings
)

# Run the multi-agent system
result = Runner.run_sync(
    triage_agent,
    "Help me optimize the inference speed of this CUDA code"
)
# triage_agent → classifies it as a coding question → hands off to code_agent → returns the result
3.4 Guardrails¶
from agents import Agent, Runner, GuardrailFunctionOutput, input_guardrail, output_guardrail

# Input guardrail: check that the request is legitimate
@input_guardrail
async def check_safety(ctx, agent, input_data):
    # Use another LLM for the safety check
    result = await Runner.run(  # await suspends until the async call completes
        safety_checker_agent,
        input_data
    )
    is_safe = "safe" in result.final_output.lower()
    return GuardrailFunctionOutput(
        output_info={"safe": is_safe},
        tripwire_triggered=not is_safe
    )

# Output guardrail: make sure no sensitive information leaks
@output_guardrail
async def check_pii(ctx, agent, output):
    has_pii = detect_pii(output)  # your custom PII detector
    return GuardrailFunctionOutput(
        output_info={"has_pii": has_pii},
        tripwire_triggered=has_pii
    )

secure_agent = Agent(
    name="Secure Agent",
    instructions="...",
    input_guardrails=[check_safety],
    output_guardrails=[check_pii]
)
4. Hands-On: Building a Structured Data Extraction Pipeline¶
4.1 Scenario: Resume Information Extraction¶
from pydantic import BaseModel, Field
from openai import OpenAI

class Education(BaseModel):
    school: str = Field(description="School name")
    degree: str = Field(description="Degree: Bachelor's/Master's/PhD")
    major: str = Field(description="Major")
    gpa: float | None = Field(None, description="GPA")
    year: str = Field(description="Graduation year")

class WorkExperience(BaseModel):
    company: str
    title: str
    duration: str
    highlights: list[str] = Field(description="Work highlights, 3-5 items")
    tech_stack: list[str] = Field(description="Tech stack used")

class ResumeInfo(BaseModel):
    name: str
    email: str | None = None
    phone: str | None = None
    education: list[Education]
    experience: list[WorkExperience]
    skills: list[str]
    publications: list[str] | None = None
    target_position: str | None = Field(None, description="Target position")

def extract_resume(text: str) -> ResumeInfo:
    client = OpenAI()
    response = client.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a resume-parsing expert; extract all structured information from the resume accurately."},
            {"role": "user", "content": f"Extract the information from this resume:\n\n{text}"}
        ],
        response_format=ResumeInfo,
    )
    return response.choices[0].message.parsed

# Usage
resume = extract_resume(resume_text)
print(f"Candidate: {resume.name}")
print(f"Skills: {resume.skills}")
for exp in resume.experience:
    print(f"  {exp.company} - {exp.title}: {exp.highlights}")
5. Frequently Asked Interview Questions¶
Q1: What is the difference between JSON Mode and Structured Output?¶
A: JSON Mode only guarantees that the output is valid JSON; it does not guarantee conformance to a particular schema. Structured Output uses constrained decoding to mask illegal tokens at generation time, guaranteeing 100% conformance to the specified JSON Schema, including field names, types, and nested structure.
Q2: How does Function Calling work?¶
A: The model is fine-tuned during training to understand function-definition schemas. At inference time, it decides from the user's intent whether to call a function, which function to call, and what argument JSON to generate. The model never executes functions itself; it returns tool_calls to the application layer, which executes them and returns the results as tool-role messages, and the model then produces the final reply based on those results.
Q3: How do you handle Function Calling errors?¶
A:
1. Argument validation: validate arguments with Pydantic/JSON Schema before executing
2. Error feedback: return error information to the LLM via a tool message so it can correct itself
3. Retry mechanism: cap the number of retries to avoid infinite loops
4. Fallback strategy: fall back to a plain-text answer when a tool is unavailable
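The first three points can be combined into a small wrapper; `safe_tool_result` and `demo_tool` below are illustrative names, not part of any SDK:

```python
import json

def safe_tool_result(func, raw_args: str) -> str:
    """Validate arguments, run the tool, and always return a string the model
    can read -- including errors, so it gets a chance to self-correct."""
    try:
        args = json.loads(raw_args)               # 1. argument validation
    except json.JSONDecodeError as e:
        return json.dumps({"error": f"invalid JSON arguments: {e}"})
    try:
        return json.dumps(func(**args), ensure_ascii=False)
    except TypeError as e:                        # missing/unexpected parameters
        return json.dumps({"error": str(e)})      # 2. fed back via a tool message
    except Exception as e:                        # 3. tool failure -> report, don't crash
        return json.dumps({"error": f"tool failed: {e}"})

def demo_tool(city: str) -> dict:
    return {"city": city, "temp": 32}

print(safe_tool_result(demo_tool, '{"city": "Beijing"}'))  # {"city": "Beijing", "temp": 32}
print(safe_tool_result(demo_tool, 'not json'))             # {"error": "invalid JSON ..."}
```

The caller can then wrap `safe_tool_result` in a loop with a retry cap, breaking out once the model produces a response without errors.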
Q4: OpenAI Agents SDK vs LangChain/LangGraph?¶
A: The Agents SDK is lighter weight, with Handoff, Guardrails, and Tracing built in; LangChain has a richer ecosystem supporting more LLMs, tools, and vector DBs. The Agents SDK suits rapid development inside the OpenAI ecosystem; LangChain/LangGraph suit complex multi-provider orchestration.
📚 References¶
- OpenAI Structured Outputs documentation
- OpenAI Function Calling documentation
- OpenAI Agents SDK on GitHub
- Anthropic Tool Use documentation
- Instructor - Pydantic-driven structured output
- Outlines - constrained decoding framework
📎 Cross-references:
- Prompt engineering basics → LLM Applications/Prompt Engineering
- Agent development frameworks → Hands-On AI Agent Development
- MCP tool ecosystem → Hands-On AI Agent Development/MCP and the Tool Ecosystem
Last updated: 2026-02-12 Applies to: LLM Application Guide v2026