第八章 Context Engineering(上下文工程)¶
⚠️ 时效性说明:本章涉及前沿模型/价格/榜单等信息,可能随版本快速变化;请以论文原文、官方发布页和 API 文档为准。
📌 定位说明:本章主题是2025年Agent领域最热的话题之一。Context Engineering(上下文工程)由Andrej Karpathy、Shopify CEO Tobi Lütke等推动,被认为是"Prompt Engineering的进化版"——当Agent需要长时间自主运行、系统性完成复杂任务时,如何构建、管理和优化上下文窗口中的信息,成为了Agent能力的核心瓶颈。
📖 本章概览¶
| 主题 | 内容 | 预计学时 |
|---|---|---|
| 8.1 从Prompt到Context | 为什么Prompt Engineering不够用了 | 1小时 |
| 8.2 上下文工程核心原理 | 上下文窗口管理、信息密度优化 | 2小时 |
| 8.3 上下文组装策略 | System Prompt / 知识检索 / 工具结果 / 对话历史编排 | 2小时 |
| 8.4 长程Agent的上下文管理 | 滑动窗口 / 摘要压缩 / 分层记忆 | 2小时 |
| 8.5 实战:代码库维护Agent | 构建一个需要长程上下文管理的实际系统 | 3小时 |
8.1 从Prompt Engineering到Context Engineering¶
8.1.1 为什么需要上下文工程?¶
Prompt Engineering聚焦于单次交互——如何写好一条指令让LLM返回期望结果。但当我们构建Agent时,面临的是完全不同的挑战:
Prompt Engineering(单次交互):
用户 → [精心设计的Prompt] → LLM → 输出
Context Engineering(多轮自主Agent):
用户 → Agent → [工具调用1] → 观察 → [思考] → [工具调用2] → 观察 → ...
↑
上下文窗口在不断膨胀!
→ 如何管理这些信息?
→ 哪些该保留?哪些该丢弃?
→ 如何保证关键信息不被"挤出"窗口?
关键区别:
| 维度 | Prompt Engineering | Context Engineering |
|---|---|---|
| 作用范围 | 单次LLM调用 | 整个Agent运行周期 |
| 核心目标 | 让LLM理解意图 | 让Agent在多步骤中保持连贯 |
| 信息来源 | 用户输入 | 用户输入 + 工具结果 + 历史对话 + 检索知识 + 系统指令 |
| 管理复杂度 | 低(手工编写) | 高(动态组装、压缩、淘汰) |
| 失败模式 | 回答不准确 | Agent迷失方向、重复犯错、遗忘关键信息 |
8.1.2 Andrej Karpathy 的定义¶
"I like the term 'context engineering' — the art and science of filling the context window with just the right information for the next step."
— Andrej Karpathy, 2025
Karpathy指出,现代AI工程的核心不再是"如何写Prompt",而是如何在有限的上下文窗口中,精确地组装出Agent下一步决策所需的全部信息。这包括:
- 系统指令(System Prompt)—— Agent的身份、能力、约束
- 工具定义与结果(Tool Definitions & Results)—— 可用工具及其返回值
- 对话历史(Conversation History)—— 用户的需求演变
- 检索到的知识(Retrieved Knowledge)—— 来自RAG/知识库的上下文信息
- Agent的思考过程(Scratchpad/CoT)—— 中间推理步骤
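这些组成部分在每次LLM调用前都要被重新拼装。下面是一个最小示意(消息结构参考OpenAI Chat Completions的messages格式,具体字段以所用SDK文档为准;工具定义通常经由独立的tools参数传入,这里不展开):
# 示意:把五类上下文组成部分拼装成一次LLM调用的messages列表
def assemble_messages(system_prompt: str,
                      retrieved_docs: list[str],
                      history: list[dict],
                      scratchpad: str,
                      user_query: str) -> list[dict]:
    messages = [{"role": "system", "content": system_prompt}]
    if retrieved_docs:
        # 检索知识单独放一条system消息,便于与对话历史区分、按需替换
        messages.append({
            "role": "system",
            "content": "## Retrieved Knowledge\n" + "\n---\n".join(retrieved_docs),
        })
    messages.extend(history)  # 对话历史(含此前的工具调用与结果)
    if scratchpad:
        messages.append({"role": "system", "content": f"## Scratchpad\n{scratchpad}"})
    messages.append({"role": "user", "content": user_query})
    return messages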
8.1.3 上下文工程的核心挑战¶
# 一个典型的Agent上下文膨胀场景
class ContextExplosionExample:
"""
假设一个代码审查Agent,上下文窗口为128K tokens。
"""
def estimate_context_usage(self):
# System Prompt: 角色定义 + 审查规范
system_prompt = 2000 # tokens
# 工具定义: 10个工具的schema
tool_definitions = 3000 # tokens
# 代码文件: 用户提交的代码
code_files = 15000 # tokens (中型PR)
# 前几轮交互历史
conversation_history = 8000 # tokens
# 之前的工具调用结果(文件读取、lint结果等)
tool_results = 40000 # tokens (4个文件的完整内容 + lint输出)
# Agent的思考过程
reasoning_trace = 12000 # tokens
# RAG检索到的编码规范
retrieved_docs = 10000 # tokens
total = (system_prompt + tool_definitions + code_files +
conversation_history + tool_results + reasoning_trace +
retrieved_docs)
print(f"总计: {total:,} tokens")
print(f"占128K窗口: {total/128000*100:.1f}%")
print(f"剩余给Agent生成回复: {128000-total:,} tokens")
# 总计: 90,000 tokens → 已占70%!
# 而这只是5步交互,如果Agent要运行20步呢?
核心挑战总结:
- 上下文窗口有限:即使200K token窗口,在长程Agent中也会耗尽
- 信息密度不均:工具返回的大量原始数据中,可能只有5%是关键信息
- 注意力稀释:窗口越大,LLM对关键信息的"注意力"越分散(Lost in the Middle 问题)
- 成本随上下文增长:每次调用的输入费用与上下文长度成正比;若每步都重发全部历史,多步运行的累计输入token会随步数近似平方增长(见下方估算示例)
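一个粗略的数量级估算可以说明这一点(单价为假设值,仅用于演示趋势,实际请以官方定价为准):
# 粗略估算:若每步净增 delta tokens,且每次调用都重发全部上下文,
# 则第k步的输入约为 base + k*delta,N步累计输入随N近似平方增长
def estimate_cumulative_input_tokens(base: int = 10_000,
                                     delta: int = 6_000,
                                     steps: int = 20) -> int:
    return sum(base + k * delta for k in range(steps))

steps = 20
total = estimate_cumulative_input_tokens(steps=steps)
price_per_1k_input = 0.005  # 假设性单价(美元/1K输入token)
print(f"{steps}步累计输入约 {total:,} tokens,约 ${total / 1000 * price_per_1k_input:.2f}")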
8.2 上下文工程核心原理¶
8.2.1 上下文窗口的四层架构¶
一个设计良好的Agent上下文应该像分层蛋糕一样组织:
┌─────────────────────────────────────────────────────┐
│ Layer 1: 系统层 │
│ ┌─────────────────────────────────────────────────┐ │
│ │ System Prompt + Identity + Constraints │ │
│ │ (固定不变,每次调用都包含) │ │
│ └─────────────────────────────────────────────────┘ │
│ Layer 2: 知识层 │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Retrieved Docs + Domain Knowledge + RAG结果 │ │
│ │ (按需检索,动态更新) │ │
│ └─────────────────────────────────────────────────┘ │
│ Layer 3: 对话层 │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Conversation History + Tool Results │ │
│ │ (随交互增长,需要压缩和淘汰) │ │
│ └─────────────────────────────────────────────────┘ │
│ Layer 4: 当前任务层 │
│ ┌─────────────────────────────────────────────────┐ │
│ │ Current Query + Immediate Context │ │
│ │ (当前这一步的输入) │ │
│ └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
8.2.2 信息密度优化原则¶
核心原则:上下文中的每个token都应该为Agent的下一步决策提供价值。
from dataclasses import dataclass
from typing import Optional
import tiktoken
@dataclass
class ContextItem:
"""上下文中的一个信息单元"""
content: str
    source: str  # "system" | "tool_schema" | "tool_result" | "rag" | "user" | "assistant" | "reasoning"(与下文 _get_layer 的映射一致)
relevance: float # 0.0 - 1.0, 与当前任务的相关度
freshness: float # 0.0 - 1.0, 信息新鲜度
token_count: int # token数量
@property
def density_score(self) -> float:
"""信息密度得分 = 相关度 × 新鲜度 / token数量"""
return (self.relevance * self.freshness) / max(self.token_count, 1)
class ContextBudgetManager:
"""上下文预算管理器"""
def __init__(self, max_tokens: int = 128000, reserved_for_output: int = 4096):
self.max_tokens = max_tokens
self.reserved = reserved_for_output
self.available = max_tokens - reserved_for_output
self.encoder = tiktoken.get_encoding("o200k_base") # GPT-4o 使用 o200k_base 编码
# 各层的预算分配
self.budget = {
"system": int(self.available * 0.05), # 5% 给系统指令
"tools_schema": int(self.available * 0.05),# 5% 给工具定义
"knowledge": int(self.available * 0.20), # 20% 给检索知识
"history": int(self.available * 0.40), # 40% 给对话历史
"current": int(self.available * 0.30), # 30% 给当前任务
}
def count_tokens(self, text: str) -> int:
return len(self.encoder.encode(text))
def allocate(self, items: list[ContextItem]) -> list[ContextItem]:
"""
根据信息密度得分,在预算内选择最有价值的上下文项。
"""
# 按层分组
grouped = {}
for item in items:
layer = self._get_layer(item.source)
grouped.setdefault(layer, []).append(item) # setdefault:键不存在则创建空列表,然后追加元素(避免KeyError)
selected = []
for layer, budget in self.budget.items():
layer_items = grouped.get(layer, [])
# 按密度得分降序排列
layer_items.sort(key=lambda x: x.density_score, reverse=True)
used = 0
for item in layer_items:
if used + item.token_count <= budget:
selected.append(item)
used += item.token_count
return selected
def _get_layer(self, source: str) -> str:
mapping = {
"system": "system",
"tool_schema": "tools_schema",
"rag": "knowledge",
"tool_result": "history",
"user": "history",
"assistant": "history",
"reasoning": "current",
}
return mapping.get(source, "current")
# --- 使用示例 ---
# manager = ContextBudgetManager(max_tokens=128000)
# items = [ContextItem(content="...", source="rag", relevance=0.9, freshness=1.0, token_count=500)]
# selected = manager.allocate(items)
8.2.3 Lost in the Middle 问题与缓解¶
2023年的经典论文《Lost in the Middle》揭示了一个关键问题:LLM对上下文中间位置的信息注意力最弱。
注意力分布(简化示意):
高 ▓▓▓▓▓ ▓▓▓▓▓▓
↑ ▓▓▓▓▓ ▓▓▓▓▓▓
│ ▓▓▓▓▓▓ ▓▓▓▓▓▓▓
│ ▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓
│ ▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓
│ ▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓
│ ▓▓▓▓▓▓▓▓▓▓▓▓▓ ▓▓▓▓▓▓▓▓▓▓▓▓▓
低 ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
|←—— 开头 |←———— 中间 ————→| 结尾 ——→|
(高注意力) (低注意力区) (高注意力)
缓解策略:
class PositionAwareContextBuilder:
"""考虑位置效应的上下文构建器"""
def build_context(self, items: list[ContextItem]) -> list[ContextItem]:
"""
策略:将最重要的信息放在开头和结尾,
次要信息放在中间。
"""
sorted_items = sorted(items, key=lambda x: x.relevance, reverse=True)
n = len(sorted_items)
result = [None] * n
# 最重要的放开头
head_idx = 0
# 次重要的放结尾
tail_idx = n - 1
for i, item in enumerate(sorted_items):
if i % 2 == 0: # 偶数索引(第1、3、5…重要的项) → 放开头
result[head_idx] = item
head_idx += 1
else: # 奇数索引(第2、4、6…重要的项) → 放结尾
result[tail_idx] = item
tail_idx -= 1
return [r for r in result if r is not None] # 过滤None,返回重排后的列表
def add_anchor_points(self, context: str, key_info: list[str]) -> str:
"""
在上下文中间位置添加"锚点"标记,
提醒LLM注意关键信息。
"""
anchored = context
for info in key_info:
# 用XML标签包裹关键信息
anchored = anchored.replace(
info,
f"<critical_info>\n{info}\n</critical_info>"
)
return anchored
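一个最小的使用示意(ContextItem沿用8.2.2的定义,内容与数值仅为演示):
builder = PositionAwareContextBuilder()
items = [
    ContextItem(content="编码规范: 所有公共函数必须有类型注解", source="rag",
                relevance=0.95, freshness=1.0, token_count=18),
    ContextItem(content="lint输出: 3个未使用的import", source="tool_result",
                relevance=0.60, freshness=0.9, token_count=12),
    ContextItem(content="此前决定: 测试框架统一用pytest", source="assistant",
                relevance=0.80, freshness=0.7, token_count=14),
]
reordered = builder.build_context(items)
# 相关度最高的项排在开头,次高的排在结尾,最低的落在中间位置
print([item.content for item in reordered])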
8.3 上下文组装策略¶
8.3.1 System Prompt工程¶
System Prompt是Agent的"操作系统"——它定义了Agent的身份、能力边界和行为规范。
# 好的System Prompt结构
AGENT_SYSTEM_PROMPT = """
# Role
You are a senior code reviewer specializing in Python and TypeScript.
# Capabilities
You have access to the following tools:
- read_file: Read source code files
- search_codebase: Search for patterns across the project
- run_tests: Execute the test suite
- suggest_fix: Propose code changes
# Constraints
- Never modify files directly; only suggest changes
- Always run tests before approving a PR
- Flag security issues with [SECURITY] prefix
- If unsure about a pattern, search the codebase for existing conventions
# Output Format
Structure your review as:
1. Summary (1-2 sentences)
2. Critical Issues (must fix)
3. Suggestions (nice to have)
4. Approval Status: APPROVE / REQUEST_CHANGES / COMMENT
# Current Context
Repository: {repo_name}
PR Title: {pr_title}
Files Changed: {files_changed}
"""
System Prompt设计原则:
| 原则 | 说明 | 示例 |
|---|---|---|
| 角色明确 | 清晰定义Agent的专业能力 | "You are a senior code reviewer" |
| 能力边界 | 告诉Agent能做什么、不能做什么 | "Never modify files directly" |
| 输出格式 | 规定Agent的回复结构 | "Structure as: Summary / Issues / Status" |
| 动态注入 | 将运行时信息插入模板 | {repo_name}, {files_changed} |
| 简洁高效 | 每个token都应有价值 | 避免冗长的背景描述 |
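表中的"动态注入"可以直接用 str.format 完成,下面是一个示意(仓库名、PR标题等取值均为虚构):
# 示意:向上文的AGENT_SYSTEM_PROMPT模板注入运行时信息(取值均为虚构示例)
rendered_system_prompt = AGENT_SYSTEM_PROMPT.format(
    repo_name="acme/web-app",
    pr_title="Add retry logic to HTTP client",
    files_changed="src/http/client.py, tests/test_client.py",
)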
8.3.2 工具结果压缩¶
工具返回的原始数据通常很冗长——一个文件读取可能返回500行代码,但Agent只需要其中的3个函数。
class ToolResultCompressor:
"""工具结果压缩器"""
def compress_file_content(self, content: str, query: str,
max_tokens: int = 2000) -> str:
"""
智能压缩文件内容,只保留与查询相关的部分。
"""
lines = content.split('\n')
# 1. 识别相关行(简单的关键词匹配,实际可用语义搜索)
relevant_ranges = []
query_terms = query.lower().split()
for i, line in enumerate(lines):
if any(term in line.lower() for term in query_terms): # 逐行检查是否包含任一查询词
# 保留上下文(前后5行)
start = max(0, i - 5)
end = min(len(lines), i + 6)
relevant_ranges.append((start, end))
# 2. 合并重叠范围
merged = self._merge_ranges(relevant_ranges)
# 3. 构建压缩结果
compressed_parts = []
last_end = 0
for start, end in merged:
if start > last_end:
compressed_parts.append(f"... (lines {last_end+1}-{start} omitted) ...")
compressed_parts.append(
'\n'.join(f"{i+1:4d} | {lines[i]}" for i in range(start, end))
)
last_end = end
if last_end < len(lines):
compressed_parts.append(f"... (lines {last_end+1}-{len(lines)} omitted) ...")
return '\n'.join(compressed_parts)
def compress_search_results(self, results: list[dict],
top_k: int = 5) -> str:
"""压缩搜索结果,只保留最相关的top_k条"""
# 按相关度排序
sorted_results = sorted(results, key=lambda x: x.get('score', 0), reverse=True)
top_results = sorted_results[:top_k]
compressed = []
for i, r in enumerate(top_results, 1):
compressed.append(
f"[{i}] {r['file']}:{r['line']} (score: {r['score']:.2f})\n"
f" {r['snippet'][:200]}"
)
return '\n'.join(compressed)
def _merge_ranges(self, ranges: list[tuple]) -> list[tuple]:
if not ranges:
return []
sorted_ranges = sorted(ranges)
merged = [sorted_ranges[0]]
for start, end in sorted_ranges[1:]:
if start <= merged[-1][1]:
merged[-1] = (merged[-1][0], max(merged[-1][1], end))
else:
merged.append((start, end))
return merged
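一个示意性的调用方式(文件路径与查询词均为举例):
compressor = ToolResultCompressor()
# 假设的文件路径,仅作演示
with open("src/payment/service.py", encoding="utf-8", errors="ignore") as fh:
    raw = fh.read()
# 只保留命中查询词的行及其前后5行上下文,其余用省略标记替代
compressed = compressor.compress_file_content(raw, query="refund retry")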
8.3.3 对话历史管理¶
对话历史是上下文中增长最快的部分。需要平衡"记住过去的信息"和"不浪费空间"。
from abc import ABC, abstractmethod
from datetime import datetime
class HistoryManager(ABC): # ABC抽象基类:定义接口规范,子类必须实现抽象方法
"""对话历史管理器基类"""
@abstractmethod
def add_message(self, role: str, content: str, metadata: dict = None):
pass
@abstractmethod
def get_context(self, max_tokens: int) -> list[dict]:
pass
class SlidingWindowHistory(HistoryManager):
"""
策略1: 滑动窗口 —— 只保留最近N轮对话
优点: 简单高效
缺点: 可能丢失早期关键信息
"""
def __init__(self, max_turns: int = 20):
self.messages = []
self.max_turns = max_turns
def add_message(self, role: str, content: str, metadata: dict = None):
self.messages.append({
"role": role,
"content": content,
"timestamp": datetime.now().isoformat()
})
def get_context(self, max_tokens: int = None) -> list[dict]:
# 保留最近max_turns轮
recent = self.messages[-self.max_turns * 2:] # ×2因为包含user+assistant
return [{"role": m["role"], "content": m["content"]} for m in recent]
class SummaryCompressHistory(HistoryManager):
"""
策略2: 摘要压缩 —— 将旧对话压缩为摘要
优点: 保留关键信息的同时减少token
缺点: 摘要可能丢失细节
"""
def __init__(self, summarizer, recent_turns: int = 5):
self.messages = []
self.summary = ""
self.recent_turns = recent_turns
self.summarizer = summarizer # LLM摘要函数
def add_message(self, role: str, content: str, metadata: dict = None):
self.messages.append({"role": role, "content": content})
# 当历史超过阈值时,压缩旧部分
if len(self.messages) > self.recent_turns * 2 + 10:
self._compress()
def _compress(self):
"""将旧对话压缩为摘要"""
old_messages = self.messages[:-self.recent_turns * 2]
old_text = "\n".join(
f"{m['role']}: {m['content']}" for m in old_messages
)
        if self.summary:
            # 将已有摘要并入待压缩文本,避免多次压缩时丢失更早的信息
            old_text = f"[Earlier summary] {self.summary}\n{old_text}"
        self.summary = self.summarizer(
f"Summarize this conversation history concisely, preserving key "
f"decisions, findings, and action items:\n\n{old_text}"
)
# 只保留最近的消息
self.messages = self.messages[-self.recent_turns * 2:]
def get_context(self, max_tokens: int = None) -> list[dict]:
context = []
if self.summary:
context.append({
"role": "system",
"content": f"[Previous conversation summary]: {self.summary}"
})
context.extend(
{"role": m["role"], "content": m["content"]} for m in self.messages
)
return context
class HierarchicalMemory(HistoryManager):
"""
策略3: 分层记忆 —— 工作记忆 + 短期记忆 + 长期记忆
这是最复杂但最强大的方案。
"""
def __init__(self, summarizer, embedding_fn):
# 工作记忆: 当前任务的即时信息
self.working_memory = []
# 短期记忆: 最近的交互历史
self.short_term = []
# 长期记忆: 持久化的关键信息
self.long_term = [] # (embedding, text, metadata)
self.summarizer = summarizer
self.embedding_fn = embedding_fn
def add_message(self, role: str, content: str, metadata: dict = None):
msg = {"role": role, "content": content, "metadata": metadata or {}}
self.working_memory.append(msg)
# 工作记忆溢出时,转移到短期/长期
if len(self.working_memory) > 10:
self._consolidate()
def _consolidate(self):
"""记忆整合: 工作记忆 → 短期记忆 → 长期记忆"""
# 将最旧的工作记忆条目移到短期记忆
old = self.working_memory[:5]
self.working_memory = self.working_memory[5:]
self.short_term.extend(old)
# 短期记忆过多时,提取关键信息到长期记忆
if len(self.short_term) > 20:
to_archive = self.short_term[:10]
self.short_term = self.short_term[10:]
# 提取关键信息并存储
key_text = "\n".join(m["content"] for m in to_archive)
summary = self.summarizer(
f"Extract key decisions, findings, and important facts:\n{key_text}"
)
embedding = self.embedding_fn(summary)
self.long_term.append({
"embedding": embedding,
"text": summary,
"timestamp": datetime.now().isoformat()
})
def get_context(self, max_tokens: int = None) -> list[dict]:
context = []
# 1. 从长期记忆中检索相关信息
if self.long_term and self.working_memory:
current_query = self.working_memory[-1]["content"]
query_embedding = self.embedding_fn(current_query)
relevant = self._retrieve_from_long_term(query_embedding, top_k=3)
if relevant:
context.append({
"role": "system",
"content": f"[Relevant past context]: {' | '.join(relevant)}"
})
# 2. 加入短期记忆摘要
if self.short_term:
short_text = " ".join(m["content"][:100] for m in self.short_term[-5:])
context.append({
"role": "system",
"content": f"[Recent context]: {short_text}"
})
# 3. 加入全部工作记忆
for m in self.working_memory:
context.append({"role": m["role"], "content": m["content"]})
return context
def _retrieve_from_long_term(self, query_embedding, top_k: int = 3) -> list[str]:
"""从长期记忆中语义检索"""
scored = []
for item in self.long_term:
score = self._cosine_similarity(query_embedding, item["embedding"])
scored.append((score, item["text"]))
scored.sort(reverse=True)
return [text for _, text in scored[:top_k]]
@staticmethod
def _cosine_similarity(a, b):
import numpy as np
return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
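三种策略的取舍取决于Agent运行时长与对历史细节的依赖程度。下面是示意性的实例化方式(llm_client.chat 与 embed 均为假设接口):
# 示意:按场景选择历史管理策略(llm_client.chat / embed 为假设接口)
short_history = SlidingWindowHistory(max_turns=20)                                     # 短对话、细节敏感
medium_history = SummaryCompressHistory(summarizer=llm_client.chat, recent_turns=5)    # 中长对话
long_history = HierarchicalMemory(summarizer=llm_client.chat, embedding_fn=embed)      # 长程Agent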
8.4 长程Agent的上下文管理实战¶
8.4.1 TODO驱动的研究范式¶
对于需要长时间运行的Agent(如Deep Research Agent),一种有效的模式是TODO驱动:Agent维护一个结构化的任务列表,每完成一步就更新状态,避免在长程运行中"迷失方向"。
import json
from enum import Enum
# Enum枚举类:将固定选项定义为命名常量,比字符串更安全(可防拼写错误、支持IDE补全)
class TaskStatus(Enum):
PENDING = "pending"
IN_PROGRESS = "in_progress"
COMPLETED = "completed"
BLOCKED = "blocked"
class TodoDrivenAgent:
"""
TODO驱动的长程Agent
核心思想: 通过结构化的任务列表管理Agent的执行流程,
避免在长程运行中丢失目标。
"""
def __init__(self, llm_client, tools: dict):
self.llm = llm_client
self.tools = tools
self.todo_list = []
self.completed_results = {}
self.execution_log = []
def plan(self, user_goal: str) -> list[dict]:
"""
阶段1: 规划 —— 将用户目标分解为结构化的TODO列表
"""
planning_prompt = f"""
Given the following goal, create a structured research plan as a JSON array.
Each item should have: title, intent, query, dependencies (list of prerequisite task indices).
Goal: {user_goal}
Requirements:
- Break down into 5-10 actionable research tasks
- Each task should be specific and independently verifiable
- Include dependencies where tasks must be done in order
- Queries should be precise enough for web search
Output format: JSON array of objects with fields:
title, intent, query, dependencies
"""
response = self.llm.chat(planning_prompt)
self.todo_list = json.loads(response)
for i, task in enumerate(self.todo_list):
task["id"] = i
task["status"] = TaskStatus.PENDING.value
task["result"] = None
return self.todo_list
def execute(self) -> dict:
"""
阶段2: 执行 —— 按依赖顺序执行TODO列表
"""
while self._has_pending_tasks():
# 找到下一个可执行的任务
task = self._get_next_executable_task()
if not task:
break
task["status"] = TaskStatus.IN_PROGRESS.value
# 构建执行上下文(关键:只包含相关信息)
context = self._build_task_context(task)
# 执行任务
result = self._execute_task(task, context)
# 更新状态
task["status"] = TaskStatus.COMPLETED.value
task["result"] = result
self.completed_results[task["id"]] = result
# 记录执行日志
self.execution_log.append({
"task_id": task["id"],
"title": task["title"],
"result_summary": result[:200] if result else ""
})
return self._synthesize_results()
def _build_task_context(self, task: dict) -> str:
"""
关键方法:为当前任务构建精准的上下文。
只包含:
1. 任务自身的信息
2. 已完成的依赖任务的结果
3. 整体计划的摘要(不是全部细节)
"""
context_parts = []
# 1. 整体计划概览(压缩版)
plan_summary = "\n".join(
f" [{t['status']}] {t['title']}" for t in self.todo_list
)
context_parts.append(f"## Overall Plan\n{plan_summary}")
# 2. 依赖任务的结果(关键信息)
for dep_id in task.get("dependencies", []):
dep_result = self.completed_results.get(dep_id)
if dep_result:
dep_task = self.todo_list[dep_id]
context_parts.append(
f"## Dependency Result: {dep_task['title']}\n"
f"{dep_result[:500]}" # 限制长度
)
# 3. 当前任务详情
context_parts.append(
f"## Current Task\n"
f"Title: {task['title']}\n"
f"Intent: {task['intent']}\n"
f"Query: {task['query']}"
)
return "\n\n".join(context_parts)
def _get_next_executable_task(self) -> dict | None:
"""找到依赖已满足的下一个待执行任务"""
for task in self.todo_list:
if task["status"] != TaskStatus.PENDING.value:
continue
deps = task.get("dependencies", [])
if all(
self.todo_list[d]["status"] == TaskStatus.COMPLETED.value
for d in deps
):
return task
return None
def _has_pending_tasks(self) -> bool:
return any(
t["status"] in (TaskStatus.PENDING.value, TaskStatus.IN_PROGRESS.value)
for t in self.todo_list
)
def _execute_task(self, task: dict, context: str) -> str:
"""执行单个任务"""
prompt = f"""
{context}
Execute the current task and provide a comprehensive result.
Use available tools if needed.
"""
return self.llm.chat(prompt)
def _synthesize_results(self) -> dict:
"""阶段3: 综合 —— 将所有结果整合为最终报告"""
all_results = "\n\n".join(
f"### {t['title']}\n{t.get('result', 'N/A')[:300]}"
for t in self.todo_list
)
synthesis_prompt = f"""
Based on the following research results, synthesize a comprehensive final report:
{all_results}
Structure:
1. Executive Summary
2. Key Findings
3. Detailed Analysis
4. Recommendations
"""
final_report = self.llm.chat(synthesis_prompt)
return {
"report": final_report,
"tasks": self.todo_list,
"execution_log": self.execution_log
}
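示意性的调用流程如下(llm_client 与 web_search 均为假设对象,研究目标仅作演示):
agent = TodoDrivenAgent(llm_client, tools={"web_search": web_search})
agent.plan("调研主流开源Agent框架的上下文管理方案,并给出选型建议")  # 阶段1:生成TODO列表
result = agent.execute()                                              # 阶段2+3:执行并综合
print(result["report"])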
8.4.2 Scratchpad模式¶
另一种管理长程上下文的方式是给Agent一个"草稿纸"(Scratchpad)——Agent可以在每一步中写下关键发现,而不需要在上下文中保留所有历史细节。
class ScratchpadAgent:
"""
Scratchpad模式的Agent
Agent在每一步中将关键信息写入scratchpad,
下一步只需要读取scratchpad而非全部历史。
"""
def __init__(self, llm_client):
self.llm = llm_client
self.scratchpad = [] # 结构化的笔记
def step(self, observation: str) -> str:
"""执行一步推理"""
# 构建上下文:只包含scratchpad + 当前观察
context = self._format_scratchpad()
prompt = f"""
## Your Notes (from previous steps)
{context}
## Current Observation
{observation}
## Instructions
1. Analyze the current observation
2. Update your notes with any new key findings (use <note> tags)
3. Decide on the next action
Format:
<note>Key finding or decision to remember</note>
<action>Next action to take</action>
"""
response = self.llm.chat(prompt)
# 提取并保存笔记
notes = self._extract_notes(response)
self.scratchpad.extend(notes)
# 定期压缩scratchpad
if len(self.scratchpad) > 20:
self._compress_scratchpad()
return self._extract_action(response)
def _format_scratchpad(self) -> str:
if not self.scratchpad:
return "(No notes yet)"
return "\n".join(f"- {note}" for note in self.scratchpad)
def _extract_notes(self, response: str) -> list[str]:
import re
return re.findall(r'<note>(.*?)</note>', response, re.DOTALL)
def _extract_action(self, response: str) -> str:
import re
match = re.search(r'<action>(.*?)</action>', response, re.DOTALL)
return match.group(1).strip() if match else response
def _compress_scratchpad(self):
"""压缩scratchpad中的旧笔记"""
old_notes = self.scratchpad[:15]
recent_notes = self.scratchpad[15:]
summary = self.llm.chat(
f"Compress these notes into 3-5 key points:\n" +
"\n".join(f"- {n}" for n in old_notes)
)
self.scratchpad = [f"[Summary of earlier notes]: {summary}"] + recent_notes
8.5 实战:代码库维护Agent¶
让我们构建一个完整的代码库维护Agent,它需要分析整个项目的代码结构、找到潜在问题、提出改进建议。这是一个典型的长程Agent场景,需要精心的上下文管理。
"""
代码库维护Agent - 完整实现
演示上下文工程在实际Agent中的应用
"""
import os
import json
from pathlib import Path
from dataclasses import dataclass, field
from typing import Optional
@dataclass
class FileInfo:
"""文件信息"""
path: str
language: str
lines: int
functions: list[str] = field(default_factory=list)
classes: list[str] = field(default_factory=list)
imports: list[str] = field(default_factory=list)
todos: list[str] = field(default_factory=list)
class CodebaseMaintenanceAgent:
"""
代码库维护Agent
展示上下文工程的最佳实践
"""
SYSTEM_PROMPT = """You are a senior software engineer conducting a codebase
maintenance review. You analyze code structure, identify issues, and suggest
improvements.
Available tools:
- list_files(directory): List files in a directory
- read_file(path): Read a file's content
- search_code(pattern): Search for code patterns
- count_lines(path): Count lines of code
Guidelines:
- Focus on architectural and structural issues, not style nitpicks
- Prioritize findings by impact: Critical > High > Medium > Low
- Always provide actionable recommendations
- Use your scratchpad to track findings across files
"""
def __init__(self, project_root: str, llm_client):
self.project_root = Path(project_root)
self.llm = llm_client
# 上下文管理组件
self.budget_manager = ContextBudgetManager(max_tokens=128000)
self.history_manager = SummaryCompressHistory(
summarizer=lambda text: llm_client.chat(
f"Summarize concisely: {text}"
)
)
self.scratchpad = []
# 项目知识
self.project_map = {} # 项目结构缓存
self.analyzed_files = set()
def run(self, query: str) -> str:
"""
处理一个维护查询
"""
# Step 1: 确保有项目地图
if not self.project_map:
self._build_project_map()
# Step 2: 构建上下文
context = self._build_context(query)
# Step 3: 多轮推理
max_steps = 10
for step in range(max_steps):
response = self.llm.chat(context)
# 检查是否有工具调用
tool_call = self._parse_tool_call(response)
if tool_call:
result = self._execute_tool(tool_call)
# 压缩工具结果后加入上下文
compressed = ToolResultCompressor().compress_file_content(
result, query
)
context += f"\n\nTool Result ({tool_call['name']}):\n{compressed}"
# 更新scratchpad
note = self._extract_finding(response)
if note:
self.scratchpad.append(note)
else:
# Agent给出了最终回答
return response
return "Analysis complete. " + self._format_scratchpad()
def _build_project_map(self):
"""
构建项目结构地图(轻量级,不读取文件内容)
这个地图会被缓存,不占用上下文预算
"""
for root, dirs, files in os.walk(self.project_root):
# 跳过常见的非代码目录
# dirs[:]就地修改列表:通知os.walk跳过指定目录。若直接dirs=[...]只会创建新变量不影响遍历
dirs[:] = [d for d in dirs
if d not in {'node_modules', '.git', '__pycache__',
'venv', '.venv', 'dist', 'build'}]
for f in files:
if f.endswith(('.py', '.js', '.ts', '.go', '.rs', '.java')):
full_path = Path(root) / f
rel_path = full_path.relative_to(self.project_root)
# with语句确保文件句柄在计数完成后正确关闭
with open(full_path, encoding='utf-8', errors='ignore') as fh:
line_count = sum(1 for _ in fh) # 生成器逐行计数,比readlines()更省内存
self.project_map[str(rel_path)] = FileInfo(
path=str(rel_path),
language=full_path.suffix,
lines=line_count
)
def _build_context(self, query: str) -> str:
"""
精心构建上下文(上下文工程的核心)
"""
parts = []
# Layer 1: System Prompt(固定,约2000 tokens)
parts.append(self.SYSTEM_PROMPT)
# Layer 2: 项目概览(压缩版,约500 tokens)
project_summary = self._get_project_summary()
parts.append(f"\n## Project Overview\n{project_summary}")
# Layer 3: Scratchpad(之前的发现,约1000 tokens)
if self.scratchpad:
notes = "\n".join(f"- {n}" for n in self.scratchpad[-10:])
parts.append(f"\n## Your Previous Findings\n{notes}")
# Layer 4: 历史摘要(约500 tokens)
history = self.history_manager.get_context()
if history:
parts.append(f"\n## Conversation Context")
for h in history:
parts.append(f"{h['role']}: {h['content'][:200]}")
# Layer 5: 当前查询
parts.append(f"\n## Current Query\n{query}")
return "\n".join(parts)
def _get_project_summary(self) -> str:
"""项目结构的压缩摘要"""
lang_stats = {}
total_files = 0
total_lines = 0
for f_info in self.project_map.values():
lang = f_info.language
lang_stats.setdefault(lang, {"files": 0, "lines": 0}) # setdefault:语言首次出现时初始化统计字典
lang_stats[lang]["files"] += 1
lang_stats[lang]["lines"] += f_info.lines
total_files += 1
total_lines += f_info.lines
summary = f"Total: {total_files} files, {total_lines:,} lines\n"
for lang, stats in sorted(lang_stats.items()):
summary += f" {lang}: {stats['files']} files, {stats['lines']:,} lines\n"
# 只列出顶层目录结构
top_dirs = set()
for path in self.project_map:
parts = Path(path).parts
if len(parts) > 1:
top_dirs.add(parts[0])
summary += f"Top-level directories: {', '.join(sorted(top_dirs))}"
return summary
def _parse_tool_call(self, response: str) -> Optional[dict]:
"""解析Agent响应中的工具调用"""
import re
match = re.search(r'<tool>(.*?)</tool>', response, re.DOTALL)
if match:
try:
return json.loads(match.group(1))
except json.JSONDecodeError:
return None
return None
def _execute_tool(self, tool_call: dict) -> str:
"""执行工具调用"""
name = tool_call.get("name")
args = tool_call.get("args", {})
if name == "read_file":
path = self.project_root / args["path"]
return path.read_text(errors='ignore')
elif name == "list_files":
directory = self.project_root / args.get("directory", "")
return "\n".join(
str(p.relative_to(self.project_root))
for p in directory.iterdir() if p.is_file()
)
elif name == "search_code":
return self._search_code(args["pattern"])
else:
return f"Unknown tool: {name}"
def _search_code(self, pattern: str) -> str:
import re
results = []
for rel_path in self.project_map:
full_path = self.project_root / rel_path
try:
content = full_path.read_text(errors='ignore')
for i, line in enumerate(content.split('\n'), 1):
if re.search(pattern, line, re.IGNORECASE):
results.append(f"{rel_path}:{i}: {line.strip()}")
except Exception:
pass
return "\n".join(results[:50]) # 限制结果数
def _extract_finding(self, response: str) -> Optional[str]:
import re
match = re.search(r'<finding>(.*?)</finding>', response, re.DOTALL)
return match.group(1).strip() if match else None
def _format_scratchpad(self) -> str:
return "\n".join(f"- {n}" for n in self.scratchpad)
8.6 最佳实践总结¶
上下文工程清单¶
| # | 实践 | 说明 |
|---|---|---|
| 1 | 预算先行 | 在设计Agent前就规划上下文预算分配 |
| 2 | 分层组织 | System/Knowledge/History/Current 四层架构 |
| 3 | 压缩工具结果 | 永远不要把原始工具输出全部塞入上下文 |
| 4 | 位置感知 | 关键信息放开头或结尾,避免Lost in the Middle |
| 5 | TODO驱动 | 长程Agent用结构化任务列表防止迷失 |
| 6 | Scratchpad | 让Agent用笔记代替完整历史 |
| 7 | 分层记忆 | 工作记忆→短期→长期的三级记忆架构 |
| 8 | 动态检索 | 按需从知识库检索,而非预加载全部 |
| 9 | 成本监控 | 监控每步的token消耗,设置上限 |
| 10 | 锚点标记 | 用XML标签包裹关键信息帮助LLM聚焦 |
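针对表中第9条"成本监控",下面给出一个极简的示意实现(阈值均为假设值):
import tiktoken

class TokenUsageMonitor:
    """示意:记录每步输入token数,超过阈值时提示调用方先压缩上下文"""
    def __init__(self, step_limit: int = 60_000, total_limit: int = 2_000_000):
        self.enc = tiktoken.get_encoding("o200k_base")
        self.step_limit = step_limit      # 单步输入上限(假设值)
        self.total_limit = total_limit    # 整个运行周期的累计上限(假设值)
        self.total = 0
        self.per_step: list[int] = []

    def record(self, prompt_text: str) -> bool:
        """返回False表示超限,调用方应先压缩/淘汰上下文再继续"""
        n = len(self.enc.encode(prompt_text))
        self.per_step.append(n)
        self.total += n
        return n <= self.step_limit and self.total <= self.total_limit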
与hello-agents的对比优势¶
本章内容相比datawhalechina/hello-agents的第九章"上下文工程",增加了以下独特价值:
- ContextBudgetManager完整实现:可直接应用的预算管理系统
- 三种HistoryManager策略对比:滑动窗口/摘要压缩/分层记忆的完整代码
- ToolResultCompressor:实用的工具结果压缩方案(hello-agents未涉及)
- Lost in the Middle缓解方案:PositionAwareContextBuilder的实用代码
- TODO驱动的Deep Research范式:比hello-agents的介绍更完整的可运行代码
- 代码库维护Agent完整案例:将所有上下文工程技术整合在一个Agent中
📝 练习¶
练习1:实现Token预算管理器(基础)¶
实现一个TokenBudgetManager,支持:
- 设置总预算和各层预算
- 添加上下文项时自动检查预算
- 当超预算时,自动淘汰最低优先级的项
练习2:实现Scratchpad Agent(中级)¶
基于ScratchpadAgent模板,实现一个Agent,使其能够:
- 浏览网页收集信息
- 在scratchpad中记录关键发现
- 当scratchpad过长时自动压缩
- 最终综合scratchpad生成报告
练习3:设计上下文压缩策略(高级)¶
设计一个通用的上下文压缩框架,支持:
- 多种压缩策略(摘要、截断、抽取、量化)
- 按信息类型选择压缩策略
- A/B测试不同压缩策略对Agent性能的影响
📚 参考资料¶
- Andrej Karpathy: "Context Engineering" (2025) — 上下文工程概念的提出
- Liu et al.: "Lost in the Middle: How Language Models Use Long Contexts" (2023)
- Park et al.: "Generative Agents: Interactive Simulacra of Human Behavior" (2023)
- OpenAI: "Memory & Context Management in Agents" (2025)
- LangChain Docs: "Conversation Memory" — 对话记忆实现参考
- Anthropic: "Building Effective Agents" (2024) — Claude的Agent上下文最佳实践
📝 本章小结¶
本章系统学习了上下文工程(Context Engineering)的核心知识:
- ✅ 理解了从Prompt Engineering到Context Engineering的演进
- ✅ 掌握了上下文窗口四层架构和信息密度优化原则
- ✅ 学会了工具结果压缩与Lost in the Middle缓解策略
- ✅ 掌握了三种对话历史管理策略(滑动窗口/摘要压缩/分层记忆)
- ✅ 学会了TODO驱动和Scratchpad两种长程Agent上下文管理模式
- ✅ 完成了代码库维护Agent实战案例
✅ 学习检查清单¶
- 能解释Prompt Engineering和Context Engineering的核心区别
- 能实现上下文预算管理器(ContextBudgetManager)
- 能实现工具结果压缩器(ToolResultCompressor)
- 理解Lost in the Middle问题及缓解策略
- 能实现三种对话历史管理策略
- 能设计TODO驱动的长程Agent
- 能使用Scratchpad模式管理Agent上下文
- 能将上下文工程技术整合到实际Agent项目中
🔗 下一步¶
下一章我们将学习Agent强化学习,探索如何通过奖励信号优化Agent的决策能力。
继续学习: 09-Agent强化学习
祝你学习愉快! 🎉
最后更新日期:2026-02-12 适用版本:AI Agent开发实战教程 v2026