
第八章 Context Engineering(上下文工程)

⚠️ 时效性说明:本章涉及前沿模型/价格/榜单等信息,可能随版本快速变化;请以论文原文、官方发布页和 API 文档为准。

📌 定位说明:本章是2025年Agent领域最热话题之一。Context Engineering(上下文工程)由Andrej Karpathy、Shopify CEO Tobi Lütke等推动,被认为是"Prompt Engineering的进化版"——当Agent需要长时间自主运行、系统性完成复杂任务时,如何构建、管理和优化上下文窗口中的信息,成为了Agent能力的核心瓶颈。

📖 本章概览

主题 内容 预计学时
8.1 从Prompt到Context 为什么Prompt Engineering不够用了 1小时
8.2 上下文工程核心原理 上下文窗口管理、信息密度优化 2小时
8.3 上下文组装策略 System Prompt / 知识检索 / 工具结果 / 对话历史编排 2小时
8.4 长程Agent的上下文管理 TODO驱动任务列表 / Scratchpad模式 2小时
8.5 实战:代码库维护Agent 构建一个需要长程上下文管理的实际系统 3小时

8.1 从Prompt Engineering到Context Engineering

8.1.1 为什么需要上下文工程?

Prompt Engineering聚焦于单次交互——如何写好一条指令让LLM返回期望结果。但当我们构建Agent时,面临的是完全不同的挑战:

Text Only
Prompt Engineering(单次交互):
  用户 → [精心设计的Prompt] → LLM → 输出

Context Engineering(多轮自主Agent):
  用户 → Agent → [工具调用1] → 观察 → [思考] → [工具调用2] → 观察 → ...
                                        上下文窗口在不断膨胀!
                                        → 如何管理这些信息?
                                        → 哪些该保留?哪些该丢弃?
                                        → 如何保证关键信息不被"挤出"窗口?

关键区别

维度 Prompt Engineering Context Engineering
作用范围 单次LLM调用 整个Agent运行周期
核心目标 让LLM理解意图 让Agent在多步骤中保持连贯
信息来源 用户输入 用户输入 + 工具结果 + 历史对话 + 检索知识 + 系统指令
管理复杂度 低(手工编写) 高(动态组装、压缩、淘汰)
失败模式 回答不准确 Agent迷失方向、重复犯错、遗忘关键信息

8.1.2 Andrej Karpathy 的定义

"I like the term 'context engineering' — the art and science of filling the context window with just the right information for the next step."

— Andrej Karpathy, 2025

Karpathy指出,现代AI工程的核心不再是"如何写Prompt",而是如何在有限的上下文窗口中,精确地组装出Agent下一步决策所需的全部信息。这包括:

  1. 系统指令(System Prompt)—— Agent的身份、能力、约束
  2. 工具定义与结果(Tool Definitions & Results)—— 可用工具及其返回值
  3. 对话历史(Conversation History)—— 用户的需求演变
  4. 检索到的知识(Retrieved Knowledge)—— 来自RAG/知识库的上下文信息
  5. Agent的思考过程(Scratchpad/CoT)—— 中间推理步骤
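在实际调用中,这五类信息通常被组装为一次LLM请求的messages列表。下面是一个最小示意(函数名与各参数内容均为本文假设,并非某个框架的固定API;角色划分沿用常见的system/user/assistant/tool约定):

```python
def assemble_context(system_prompt: str, tool_results: list[str],
                     history: list[dict], retrieved: list[str],
                     scratchpad: str) -> list[dict]:
    """将五类信息组装为一次LLM调用的messages列表(最小示意)"""
    messages = [{"role": "system", "content": system_prompt}]
    # 检索到的知识以system消息注入,便于与对话内容区分
    if retrieved:
        messages.append({"role": "system",
                         "content": "[Retrieved]\n" + "\n".join(retrieved)})
    messages.extend(history)              # 对话历史(user/assistant消息)
    for r in tool_results:                # 工具结果以tool角色追加
        messages.append({"role": "tool", "content": r})
    if scratchpad:                        # Agent的中间推理笔记
        messages.append({"role": "assistant",
                         "content": f"[Scratchpad]\n{scratchpad}"})
    return messages
```

这只是组装顺序的一种可能方案;不同模型对消息角色的支持有差异,请以所用API文档为准。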

8.1.3 上下文工程的核心挑战

Python
# 一个典型的Agent上下文膨胀场景
class ContextExplosionExample:
    """
    假设一个代码审查Agent,上下文窗口为128K tokens。
    """

    def estimate_context_usage(self):
        # System Prompt: 角色定义 + 审查规范
        system_prompt = 2000  # tokens

        # 工具定义: 10个工具的schema
        tool_definitions = 3000  # tokens

        # 代码文件: 用户提交的代码
        code_files = 15000  # tokens (中型PR)

        # 前几轮交互历史
        conversation_history = 8000  # tokens

        # 之前的工具调用结果(文件读取、lint结果等)
        tool_results = 40000  # tokens (4个文件的完整内容 + lint输出)

        # Agent的思考过程
        reasoning_trace = 12000  # tokens

        # RAG检索到的编码规范
        retrieved_docs = 10000  # tokens

        total = (system_prompt + tool_definitions + code_files +
                 conversation_history + tool_results + reasoning_trace +
                 retrieved_docs)

        print(f"总计: {total:,} tokens")
        print(f"占128K窗口: {total/128000*100:.1f}%")
        print(f"剩余给Agent生成回复: {128000-total:,} tokens")
        # 总计: 90,000 tokens → 已占70%!
        # 而这只是5步交互,如果Agent要运行20步呢?

核心挑战总结

  1. 上下文窗口有限:即使200K token窗口,在长程Agent中也会耗尽
  2. 信息密度不均:工具返回的大量原始数据中,可能只有5%是关键信息
  3. 注意力稀释:窗口越大,LLM对关键信息的"注意力"越分散(Lost in the Middle 问题)
  4. 成本随上下文线性增长:每增加1K token的上下文,API成本相应增加

8.2 上下文工程核心原理

8.2.1 上下文窗口的四层架构

一个设计良好的Agent上下文应该像分层蛋糕一样组织:

Text Only
┌─────────────────────────────────────────────────────┐
│                  Layer 1: 系统层                      │
│  ┌─────────────────────────────────────────────────┐ │
│  │  System Prompt + Identity + Constraints         │ │
│  │  (固定不变,每次调用都包含)                         │ │
│  └─────────────────────────────────────────────────┘ │
│                  Layer 2: 知识层                      │
│  ┌─────────────────────────────────────────────────┐ │
│  │  Retrieved Docs + Domain Knowledge + RAG结果     │ │
│  │  (按需检索,动态更新)                              │ │
│  └─────────────────────────────────────────────────┘ │
│                  Layer 3: 对话层                      │
│  ┌─────────────────────────────────────────────────┐ │
│  │  Conversation History + Tool Results             │ │
│  │  (随交互增长,需要压缩和淘汰)                      │ │
│  └─────────────────────────────────────────────────┘ │
│                  Layer 4: 当前任务层                   │
│  ┌─────────────────────────────────────────────────┐ │
│  │  Current Query + Immediate Context               │ │
│  │  (当前这一步的输入)                                │ │
│  └─────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
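按这一分层结构,上下文的拼接可以写成一个简单的函数。下面是一个示意性草图(分层标题与参数名为本文假设),空层会被自动跳过:

```python
def build_layered_context(system: str, knowledge: list[str],
                          history: list[str], current: str) -> str:
    """按四层架构自上而下拼接上下文字符串(最小示意)"""
    sections = [
        ("System", [system]),        # Layer 1: 固定的系统指令
        ("Knowledge", knowledge),    # Layer 2: 按需检索的知识
        ("History", history),        # Layer 3: 对话历史与工具结果
        ("Current Task", [current]), # Layer 4: 当前任务输入
    ]
    parts = []
    for title, items in sections:
        if items:  # 空层直接跳过,不占用token
            parts.append(f"## {title}\n" + "\n".join(items))
    return "\n\n".join(parts)
```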

8.2.2 信息密度优化原则

核心原则:上下文中的每个token都应该为Agent的下一步决策提供价值。

Python
from dataclasses import dataclass
from typing import Optional
import tiktoken

@dataclass
class ContextItem:
    """上下文中的一个信息单元"""
    content: str
    source: str          # "system" | "tool" | "user" | "rag" | "reasoning"
    relevance: float     # 0.0 - 1.0, 与当前任务的相关度
    freshness: float     # 0.0 - 1.0, 信息新鲜度
    token_count: int     # token数量

    @property
    def density_score(self) -> float:
        """信息密度得分 = 相关度 × 新鲜度 / token数量"""
        return (self.relevance * self.freshness) / max(self.token_count, 1)

class ContextBudgetManager:
    """上下文预算管理器"""

    def __init__(self, max_tokens: int = 128000, reserved_for_output: int = 4096):
        self.max_tokens = max_tokens
        self.reserved = reserved_for_output
        self.available = max_tokens - reserved_for_output
        self.encoder = tiktoken.get_encoding("o200k_base")  # GPT-4o 使用 o200k_base 编码

        # 各层的预算分配
        self.budget = {
            "system": int(self.available * 0.05),     # 5% 给系统指令
            "tools_schema": int(self.available * 0.05),# 5% 给工具定义
            "knowledge": int(self.available * 0.20),   # 20% 给检索知识
            "history": int(self.available * 0.40),     # 40% 给对话历史
            "current": int(self.available * 0.30),     # 30% 给当前任务
        }

    def count_tokens(self, text: str) -> int:
        return len(self.encoder.encode(text))

    def allocate(self, items: list[ContextItem]) -> list[ContextItem]:
        """
        根据信息密度得分,在预算内选择最有价值的上下文项。
        """
        # 按层分组
        grouped = {}
        for item in items:
            layer = self._get_layer(item.source)
            grouped.setdefault(layer, []).append(item)  # setdefault:键不存在则创建空列表,然后追加元素(避免KeyError)

        selected = []
        for layer, budget in self.budget.items():
            layer_items = grouped.get(layer, [])
            # 按密度得分降序排列
            layer_items.sort(key=lambda x: x.density_score, reverse=True)

            used = 0
            for item in layer_items:
                if used + item.token_count <= budget:
                    selected.append(item)
                    used += item.token_count

        return selected

    def _get_layer(self, source: str) -> str:
        mapping = {
            "system": "system",
            "tool_schema": "tools_schema",
            "rag": "knowledge",
            "tool_result": "history",
            "user": "history",
            "assistant": "history",
            "reasoning": "current",
        }
        return mapping.get(source, "current")

# --- 使用示例 ---
# manager = ContextBudgetManager(max_tokens=128000)
# items = [ContextItem(content="...", source="rag", relevance=0.9, freshness=1.0, token_count=500)]
# selected = manager.allocate(items)

8.2.3 Lost in the Middle 问题与缓解

2023年的经典论文《Lost in the Middle》揭示了一个关键问题:LLM对上下文中间位置的信息注意力最弱。

Text Only
注意力分布(简化示意):

  高  ▓▓▓▓▓                                         ▓▓▓▓▓▓
  ↑   ▓▓▓▓▓                                         ▓▓▓▓▓▓
  │   ▓▓▓▓▓▓                                       ▓▓▓▓▓▓▓
  │   ▓▓▓▓▓▓▓                                     ▓▓▓▓▓▓▓▓
  │   ▓▓▓▓▓▓▓▓                                   ▓▓▓▓▓▓▓▓▓
  │   ▓▓▓▓▓▓▓▓▓▓                               ▓▓▓▓▓▓▓▓▓▓
  │   ▓▓▓▓▓▓▓▓▓▓▓▓▓                         ▓▓▓▓▓▓▓▓▓▓▓▓▓
  低  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓
      |←—— 开头    |←———— 中间 ————→|         结尾 ——→|
      (高注意力)     (低注意力区)           (高注意力)

缓解策略

Python
class PositionAwareContextBuilder:
    """考虑位置效应的上下文构建器"""

    def build_context(self, items: list[ContextItem]) -> list[ContextItem]:
        """
        策略:将最重要的信息放在开头和结尾,
        次要信息放在中间。
        """
        sorted_items = sorted(items, key=lambda x: x.relevance, reverse=True)

        n = len(sorted_items)
        result = [None] * n

        # 最重要的放开头
        head_idx = 0
        # 次重要的放结尾
        tail_idx = n - 1

        for i, item in enumerate(sorted_items):
            if i % 2 == 0:  # 偶数索引(第1、3、5…重要的项) → 放开头
                result[head_idx] = item
                head_idx += 1
            else:            # 奇数索引(第2、4、6…重要的项) → 放结尾
                result[tail_idx] = item
                tail_idx -= 1

        return [r for r in result if r is not None]  # 过滤None,返回重排后的列表

    def add_anchor_points(self, context: str, key_info: list[str]) -> str:
        """
        在上下文中间位置添加"锚点"标记,
        提醒LLM注意关键信息。
        """
        anchored = context
        for info in key_info:
            # 用XML标签包裹关键信息
            anchored = anchored.replace(
                info,
                f"<critical_info>\n{info}\n</critical_info>"
            )
        return anchored

8.3 上下文组装策略

8.3.1 System Prompt工程

System Prompt是Agent的"操作系统"——它定义了Agent的身份、能力边界和行为规范。

Python
# 好的System Prompt结构
AGENT_SYSTEM_PROMPT = """
# Role
You are a senior code reviewer specializing in Python and TypeScript.

# Capabilities
You have access to the following tools:
- read_file: Read source code files
- search_codebase: Search for patterns across the project
- run_tests: Execute the test suite
- suggest_fix: Propose code changes

# Constraints
- Never modify files directly; only suggest changes
- Always run tests before approving a PR
- Flag security issues with [SECURITY] prefix
- If unsure about a pattern, search the codebase for existing conventions

# Output Format
Structure your review as:
1. Summary (1-2 sentences)
2. Critical Issues (must fix)
3. Suggestions (nice to have)
4. Approval Status: APPROVE / REQUEST_CHANGES / COMMENT

# Current Context
Repository: {repo_name}
PR Title: {pr_title}
Files Changed: {files_changed}
"""

System Prompt设计原则

原则 说明 示例
角色明确 清晰定义Agent的专业能力 "You are a senior code reviewer"
能力边界 告诉Agent能做什么、不能做什么 "Never modify files directly"
输出格式 规定Agent的回复结构 "Structure as: Summary / Issues / Status"
动态注入 将运行时信息插入模板 {repo_name}, {files_changed}
简洁高效 每个token都应有价值 避免冗长的背景描述
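表中的"动态注入"原则可以直接用Python的str.format落地:把System Prompt写成模板,在运行时填入仓库名、PR标题等信息。下面是一个压缩版草图(模板变量名沿用上文示例,具体取值为假设):

```python
TEMPLATE = (
    "# Role\nYou are a senior code reviewer.\n\n"
    "# Current Context\n"
    "Repository: {repo_name}\n"
    "PR Title: {pr_title}\n"
    "Files Changed: {files_changed}\n"
)

def render_system_prompt(repo_name: str, pr_title: str, files_changed: int) -> str:
    """将运行时信息注入System Prompt模板"""
    return TEMPLATE.format(repo_name=repo_name, pr_title=pr_title,
                           files_changed=files_changed)
```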

8.3.2 工具结果压缩

工具返回的原始数据通常很冗长——一个文件读取可能返回500行代码,但Agent只需要其中的3个函数。

Python
class ToolResultCompressor:
    """工具结果压缩器"""

    def compress_file_content(self, content: str, query: str,
                               max_tokens: int = 2000) -> str:
        """
        智能压缩文件内容,只保留与查询相关的部分。
        """
        lines = content.split('\n')

        # 1. 识别相关行(简单的关键词匹配,实际可用语义搜索)
        relevant_ranges = []
        query_terms = query.lower().split()

        for i, line in enumerate(lines):
            if any(term in line.lower() for term in query_terms):  # 逐行检查是否包含任一查询词
                # 保留上下文(前后5行)
                start = max(0, i - 5)
                end = min(len(lines), i + 6)
                relevant_ranges.append((start, end))

        # 2. 合并重叠范围
        merged = self._merge_ranges(relevant_ranges)

        # 3. 构建压缩结果
        compressed_parts = []
        last_end = 0
        for start, end in merged:
            if start > last_end:
                compressed_parts.append(f"... (lines {last_end+1}-{start} omitted) ...")
            compressed_parts.append(
                '\n'.join(f"{i+1:4d} | {lines[i]}" for i in range(start, end))
            )
            last_end = end

        if last_end < len(lines):
            compressed_parts.append(f"... (lines {last_end+1}-{len(lines)} omitted) ...")

        return '\n'.join(compressed_parts)

    def compress_search_results(self, results: list[dict],
                                 top_k: int = 5) -> str:
        """压缩搜索结果,只保留最相关的top_k条"""
        # 按相关度排序
        sorted_results = sorted(results, key=lambda x: x.get('score', 0), reverse=True)
        top_results = sorted_results[:top_k]

        compressed = []
        for i, r in enumerate(top_results, 1):
            compressed.append(
                f"[{i}] {r['file']}:{r['line']} (score: {r['score']:.2f})\n"
                f"    {r['snippet'][:200]}"
            )

        return '\n'.join(compressed)

    def _merge_ranges(self, ranges: list[tuple]) -> list[tuple]:
        if not ranges:
            return []
        sorted_ranges = sorted(ranges)
        merged = [sorted_ranges[0]]
        for start, end in sorted_ranges[1:]:
            if start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        return merged
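压缩流程中最容易出错的是第2步的范围合并。可以把这一逻辑单独抽出来验证(与上文`_merge_ranges`行为一致,此处为独立可运行的草图):

```python
def merge_ranges(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """合并重叠的(start, end)行范围,输入顺序任意"""
    if not ranges:
        return []
    sorted_ranges = sorted(ranges)
    merged = [sorted_ranges[0]]
    for start, end in sorted_ranges[1:]:
        if start <= merged[-1][1]:   # 与上一范围重叠:扩展其右端点
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:                         # 不重叠:开启新范围
            merged.append((start, end))
    return merged
```

例如`merge_ranges([(0, 5), (3, 8), (10, 12)])`会把前两个重叠范围合并为`(0, 8)`。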

8.3.3 对话历史管理

对话历史是上下文中增长最快的部分。需要平衡"记住过去的信息"和"不浪费空间"。

Python
from abc import ABC, abstractmethod
from datetime import datetime

class HistoryManager(ABC):  # ABC抽象基类:定义接口规范,子类必须实现抽象方法
    """对话历史管理器基类"""

    @abstractmethod
    def add_message(self, role: str, content: str, metadata: dict = None):
        pass

    @abstractmethod
    def get_context(self, max_tokens: int) -> list[dict]:
        pass

class SlidingWindowHistory(HistoryManager):
    """
    策略1: 滑动窗口 —— 只保留最近N轮对话
    优点: 简单高效
    缺点: 可能丢失早期关键信息
    """

    def __init__(self, max_turns: int = 20):
        self.messages = []
        self.max_turns = max_turns

    def add_message(self, role: str, content: str, metadata: dict = None):
        self.messages.append({
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat()
        })

    def get_context(self, max_tokens: int = None) -> list[dict]:
        # 保留最近max_turns轮
        recent = self.messages[-self.max_turns * 2:]  # ×2因为包含user+assistant
        return [{"role": m["role"], "content": m["content"]} for m in recent]

class SummaryCompressHistory(HistoryManager):
    """
    策略2: 摘要压缩 —— 将旧对话压缩为摘要
    优点: 保留关键信息的同时减少token
    缺点: 摘要可能丢失细节
    """

    def __init__(self, summarizer, recent_turns: int = 5):
        self.messages = []
        self.summary = ""
        self.recent_turns = recent_turns
        self.summarizer = summarizer  # LLM摘要函数

    def add_message(self, role: str, content: str, metadata: dict = None):
        self.messages.append({"role": role, "content": content})

        # 当历史超过阈值时,压缩旧部分
        if len(self.messages) > self.recent_turns * 2 + 10:
            self._compress()

    def _compress(self):
        """将旧对话压缩为摘要"""
        old_messages = self.messages[:-self.recent_turns * 2]

        old_text = "\n".join(
            f"{m['role']}: {m['content']}" for m in old_messages
        )

        self.summary = self.summarizer(
            f"Summarize this conversation history concisely, preserving key "
            f"decisions, findings, and action items:\n\n{old_text}"
        )

        # 只保留最近的消息
        self.messages = self.messages[-self.recent_turns * 2:]

    def get_context(self, max_tokens: int = None) -> list[dict]:
        context = []
        if self.summary:
            context.append({
                "role": "system",
                "content": f"[Previous conversation summary]: {self.summary}"
            })
        context.extend(
            {"role": m["role"], "content": m["content"]} for m in self.messages
        )
        return context

class HierarchicalMemory(HistoryManager):
    """
    策略3: 分层记忆 —— 工作记忆 + 短期记忆 + 长期记忆
    这是最复杂但最强大的方案。
    """

    def __init__(self, summarizer, embedding_fn):
        # 工作记忆: 当前任务的即时信息
        self.working_memory = []
        # 短期记忆: 最近的交互历史
        self.short_term = []
        # 长期记忆: 持久化的关键信息
        self.long_term = []  # (embedding, text, metadata)

        self.summarizer = summarizer
        self.embedding_fn = embedding_fn

    def add_message(self, role: str, content: str, metadata: dict = None):
        msg = {"role": role, "content": content, "metadata": metadata or {}}
        self.working_memory.append(msg)

        # 工作记忆溢出时,转移到短期/长期
        if len(self.working_memory) > 10:
            self._consolidate()

    def _consolidate(self):
        """记忆整合: 工作记忆 → 短期记忆 → 长期记忆"""
        # 将最旧的工作记忆条目移到短期记忆
        old = self.working_memory[:5]
        self.working_memory = self.working_memory[5:]
        self.short_term.extend(old)

        # 短期记忆过多时,提取关键信息到长期记忆
        if len(self.short_term) > 20:
            to_archive = self.short_term[:10]
            self.short_term = self.short_term[10:]

            # 提取关键信息并存储
            key_text = "\n".join(m["content"] for m in to_archive)
            summary = self.summarizer(
                f"Extract key decisions, findings, and important facts:\n{key_text}"
            )
            embedding = self.embedding_fn(summary)
            self.long_term.append({
                "embedding": embedding,
                "text": summary,
                "timestamp": datetime.now().isoformat()
            })

    def get_context(self, max_tokens: int = None) -> list[dict]:
        context = []

        # 1. 从长期记忆中检索相关信息
        if self.long_term and self.working_memory:
            current_query = self.working_memory[-1]["content"]
            query_embedding = self.embedding_fn(current_query)
            relevant = self._retrieve_from_long_term(query_embedding, top_k=3)
            if relevant:
                context.append({
                    "role": "system",
                    "content": f"[Relevant past context]: {' | '.join(relevant)}"
                })

        # 2. 加入短期记忆摘要
        if self.short_term:
            short_text = " ".join(m["content"][:100] for m in self.short_term[-5:])
            context.append({
                "role": "system",
                "content": f"[Recent context]: {short_text}"
            })

        # 3. 加入全部工作记忆
        for m in self.working_memory:
            context.append({"role": m["role"], "content": m["content"]})

        return context

    def _retrieve_from_long_term(self, query_embedding, top_k: int = 3) -> list[str]:
        """从长期记忆中语义检索"""
        scored = []
        for item in self.long_term:
            score = self._cosine_similarity(query_embedding, item["embedding"])
            scored.append((score, item["text"]))
        scored.sort(reverse=True)
        return [text for _, text in scored[:top_k]]

    @staticmethod
    def _cosine_similarity(a, b):
        import numpy as np
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
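以最简单的滑动窗口策略为例,可以快速验证"只保留最近N轮"的行为。下面独立复现上文SlidingWindowHistory的核心切片逻辑(消息内容为假设的占位数据):

```python
def sliding_window(messages: list[dict], max_turns: int) -> list[dict]:
    """只保留最近max_turns轮对话(每轮含一条user消息和一条assistant消息)"""
    return messages[-max_turns * 2:]

# 构造10条消息(即5轮对话)作为演示数据
history = [
    {"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"}
    for i in range(10)
]
```

`sliding_window(history, 2)`只返回最后4条消息,较早的6条被直接丢弃——这正是该策略"简单但可能丢失早期关键信息"的来源。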

8.4 长程Agent的上下文管理实战

8.4.1 TODO驱动的研究范式

对于需要长时间运行的Agent(如Deep Research Agent),一种有效的模式是TODO驱动:Agent维护一个结构化的任务列表,每完成一步就更新状态,避免在长程运行中"迷失方向"。

Python
import json
from enum import Enum

# Enum枚举类:将固定选项定义为命名常量,比字符串更安全(可防拼写错误、支持IDE补全)
class TaskStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    BLOCKED = "blocked"

class TodoDrivenAgent:
    """
    TODO驱动的长程Agent
    核心思想: 通过结构化的任务列表管理Agent的执行流程,
    避免在长程运行中丢失目标。
    """

    def __init__(self, llm_client, tools: dict):
        self.llm = llm_client
        self.tools = tools
        self.todo_list = []
        self.completed_results = {}
        self.execution_log = []

    def plan(self, user_goal: str) -> list[dict]:
        """
        阶段1: 规划 —— 将用户目标分解为结构化的TODO列表
        """
        planning_prompt = f"""
        Given the following goal, create a structured research plan as a JSON array.
        Each item should have: title, intent, query, dependencies (list of prerequisite task indices).

        Goal: {user_goal}

        Requirements:
        - Break down into 5-10 actionable research tasks
        - Each task should be specific and independently verifiable
        - Include dependencies where tasks must be done in order
        - Queries should be precise enough for web search

        Output format: JSON array of objects with fields:
        title, intent, query, dependencies
        """

        response = self.llm.chat(planning_prompt)
        self.todo_list = json.loads(response)

        for i, task in enumerate(self.todo_list):
            task["id"] = i
            task["status"] = TaskStatus.PENDING.value
            task["result"] = None

        return self.todo_list

    def execute(self) -> dict:
        """
        阶段2: 执行 —— 按依赖顺序执行TODO列表
        """
        while self._has_pending_tasks():
            # 找到下一个可执行的任务
            task = self._get_next_executable_task()
            if not task:
                break

            task["status"] = TaskStatus.IN_PROGRESS.value

            # 构建执行上下文(关键:只包含相关信息)
            context = self._build_task_context(task)

            # 执行任务
            result = self._execute_task(task, context)

            # 更新状态
            task["status"] = TaskStatus.COMPLETED.value
            task["result"] = result
            self.completed_results[task["id"]] = result

            # 记录执行日志
            self.execution_log.append({
                "task_id": task["id"],
                "title": task["title"],
                "result_summary": result[:200] if result else ""
            })

        return self._synthesize_results()

    def _build_task_context(self, task: dict) -> str:
        """
        关键方法:为当前任务构建精准的上下文。
        只包含:
        1. 任务自身的信息
        2. 已完成的依赖任务的结果
        3. 整体计划的摘要(不是全部细节)
        """
        context_parts = []

        # 1. 整体计划概览(压缩版)
        plan_summary = "\n".join(
            f"  [{t['status']}] {t['title']}" for t in self.todo_list
        )
        context_parts.append(f"## Overall Plan\n{plan_summary}")

        # 2. 依赖任务的结果(关键信息)
        for dep_id in task.get("dependencies", []):
            dep_result = self.completed_results.get(dep_id)
            if dep_result:
                dep_task = self.todo_list[dep_id]
                context_parts.append(
                    f"## Dependency Result: {dep_task['title']}\n"
                    f"{dep_result[:500]}"  # 限制长度
                )

        # 3. 当前任务详情
        context_parts.append(
            f"## Current Task\n"
            f"Title: {task['title']}\n"
            f"Intent: {task['intent']}\n"
            f"Query: {task['query']}"
        )

        return "\n\n".join(context_parts)

    def _get_next_executable_task(self) -> dict | None:
        """找到依赖已满足的下一个待执行任务"""
        for task in self.todo_list:
            if task["status"] != TaskStatus.PENDING.value:
                continue
            deps = task.get("dependencies", [])
            if all(
                self.todo_list[d]["status"] == TaskStatus.COMPLETED.value
                for d in deps
            ):
                return task
        return None

    def _has_pending_tasks(self) -> bool:
        return any(
            t["status"] in (TaskStatus.PENDING.value, TaskStatus.IN_PROGRESS.value)
            for t in self.todo_list
        )

    def _execute_task(self, task: dict, context: str) -> str:
        """执行单个任务"""
        prompt = f"""
        {context}

        Execute the current task and provide a comprehensive result.
        Use available tools if needed.
        """
        return self.llm.chat(prompt)

    def _synthesize_results(self) -> dict:
        """阶段3: 综合 —— 将所有结果整合为最终报告"""
        all_results = "\n\n".join(
            f"### {t['title']}\n{(t.get('result') or 'N/A')[:300]}"  # result为None时回退到'N/A',避免None[:300]报错
            for t in self.todo_list
        )

        synthesis_prompt = f"""
        Based on the following research results, synthesize a comprehensive final report:

        {all_results}

        Structure:
        1. Executive Summary
        2. Key Findings
        3. Detailed Analysis
        4. Recommendations
        """

        final_report = self.llm.chat(synthesis_prompt)
        return {
            "report": final_report,
            "tasks": self.todo_list,
            "execution_log": self.execution_log
        }

8.4.2 Scratchpad模式

另一种管理长程上下文的方式是给Agent一个"草稿纸"(Scratchpad)——Agent可以在每一步中写下关键发现,而不需要在上下文中保留所有历史细节。

Python
class ScratchpadAgent:
    """
    Scratchpad模式的Agent
    Agent在每一步中将关键信息写入scratchpad,
    下一步只需要读取scratchpad而非全部历史。
    """

    def __init__(self, llm_client):
        self.llm = llm_client
        self.scratchpad = []  # 结构化的笔记

    def step(self, observation: str) -> str:
        """执行一步推理"""
        # 构建上下文:只包含scratchpad + 当前观察
        context = self._format_scratchpad()

        prompt = f"""
        ## Your Notes (from previous steps)
        {context}

        ## Current Observation
        {observation}

        ## Instructions
        1. Analyze the current observation
        2. Update your notes with any new key findings (use <note> tags)
        3. Decide on the next action

        Format:
        <note>Key finding or decision to remember</note>
        <action>Next action to take</action>
        """

        response = self.llm.chat(prompt)

        # 提取并保存笔记
        notes = self._extract_notes(response)
        self.scratchpad.extend(notes)

        # 定期压缩scratchpad
        if len(self.scratchpad) > 20:
            self._compress_scratchpad()

        return self._extract_action(response)

    def _format_scratchpad(self) -> str:
        if not self.scratchpad:
            return "(No notes yet)"
        return "\n".join(f"- {note}" for note in self.scratchpad)

    def _extract_notes(self, response: str) -> list[str]:
        import re
        return re.findall(r'<note>(.*?)</note>', response, re.DOTALL)

    def _extract_action(self, response: str) -> str:
        import re
        match = re.search(r'<action>(.*?)</action>', response, re.DOTALL)
        return match.group(1).strip() if match else response

    def _compress_scratchpad(self):
        """压缩scratchpad中的旧笔记"""
        old_notes = self.scratchpad[:15]
        recent_notes = self.scratchpad[15:]

        summary = self.llm.chat(
            f"Compress these notes into 3-5 key points:\n" +
            "\n".join(f"- {n}" for n in old_notes)
        )

        self.scratchpad = [f"[Summary of earlier notes]: {summary}"] + recent_notes

8.5 实战:代码库维护Agent

让我们构建一个完整的代码库维护Agent,它需要分析整个项目的代码结构、找到潜在问题、提出改进建议。这是一个典型的长程Agent场景,需要精心的上下文管理。

Python
"""
代码库维护Agent - 完整实现
演示上下文工程在实际Agent中的应用
"""

import os
import json
from pathlib import Path
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class FileInfo:
    """文件信息"""
    path: str
    language: str
    lines: int
    functions: list[str] = field(default_factory=list)
    classes: list[str] = field(default_factory=list)
    imports: list[str] = field(default_factory=list)
    todos: list[str] = field(default_factory=list)

class CodebaseMaintenanceAgent:
    """
    代码库维护Agent
    展示上下文工程的最佳实践
    """

    SYSTEM_PROMPT = """You are a senior software engineer conducting a codebase
    maintenance review. You analyze code structure, identify issues, and suggest
    improvements.

    Available tools:
    - list_files(directory): List files in a directory
    - read_file(path): Read a file's content
    - search_code(pattern): Search for code patterns
    - count_lines(path): Count lines of code

    Guidelines:
    - Focus on architectural and structural issues, not style nitpicks
    - Prioritize findings by impact: Critical > High > Medium > Low
    - Always provide actionable recommendations
    - Use your scratchpad to track findings across files
    """

    def __init__(self, project_root: str, llm_client):
        self.project_root = Path(project_root)
        self.llm = llm_client

        # 上下文管理组件
        self.budget_manager = ContextBudgetManager(max_tokens=128000)
        self.history_manager = SummaryCompressHistory(
            summarizer=lambda text: llm_client.chat(
                f"Summarize concisely: {text}"
            )
        )
        self.scratchpad = []

        # 项目知识
        self.project_map = {}  # 项目结构缓存
        self.analyzed_files = set()

    def run(self, query: str) -> str:
        """
        处理一个维护查询
        """
        # Step 1: 确保有项目地图
        if not self.project_map:
            self._build_project_map()

        # Step 2: 构建上下文
        context = self._build_context(query)

        # Step 3: 多轮推理
        max_steps = 10
        for step in range(max_steps):
            response = self.llm.chat(context)

            # 检查是否有工具调用
            tool_call = self._parse_tool_call(response)
            if tool_call:
                result = self._execute_tool(tool_call)
                # 压缩工具结果后加入上下文
                compressed = ToolResultCompressor().compress_file_content(
                    result, query
                )
                context += f"\n\nTool Result ({tool_call['name']}):\n{compressed}"

                # 更新scratchpad
                note = self._extract_finding(response)
                if note:
                    self.scratchpad.append(note)
            else:
                # Agent给出了最终回答
                return response

        return "Analysis complete. " + self._format_scratchpad()

    def _build_project_map(self):
        """
        构建项目结构地图(轻量级,不读取文件内容)
        这个地图会被缓存,不占用上下文预算
        """
        for root, dirs, files in os.walk(self.project_root):
            # 跳过常见的非代码目录
            # dirs[:]就地修改列表:通知os.walk跳过指定目录。若直接dirs=[...]只会创建新变量不影响遍历
            dirs[:] = [d for d in dirs
                       if d not in {'node_modules', '.git', '__pycache__',
                                    'venv', '.venv', 'dist', 'build'}]

            for f in files:
                if f.endswith(('.py', '.js', '.ts', '.go', '.rs', '.java')):
                    full_path = Path(root) / f
                    rel_path = full_path.relative_to(self.project_root)

                    # with语句确保文件句柄在计数完成后正确关闭
                    with open(full_path, encoding='utf-8', errors='ignore') as fh:
                        line_count = sum(1 for _ in fh)  # 生成器逐行计数,比readlines()更省内存

                    self.project_map[str(rel_path)] = FileInfo(
                        path=str(rel_path),
                        language=full_path.suffix,
                        lines=line_count
                    )

    def _build_context(self, query: str) -> str:
        """
        精心构建上下文(上下文工程的核心)
        """
        parts = []

        # Layer 1: System Prompt(固定,约2000 tokens)
        parts.append(self.SYSTEM_PROMPT)

        # Layer 2: 项目概览(压缩版,约500 tokens)
        project_summary = self._get_project_summary()
        parts.append(f"\n## Project Overview\n{project_summary}")

        # Layer 3: Scratchpad(之前的发现,约1000 tokens)
        if self.scratchpad:
            notes = "\n".join(f"- {n}" for n in self.scratchpad[-10:])
            parts.append(f"\n## Your Previous Findings\n{notes}")

        # Layer 4: 历史摘要(约500 tokens)
        history = self.history_manager.get_context()
        if history:
            parts.append(f"\n## Conversation Context")
            for h in history:
                parts.append(f"{h['role']}: {h['content'][:200]}")

        # Layer 5: 当前查询
        parts.append(f"\n## Current Query\n{query}")

        return "\n".join(parts)

    def _get_project_summary(self) -> str:
        """项目结构的压缩摘要"""
        lang_stats = {}
        total_files = 0
        total_lines = 0

        for f_info in self.project_map.values():
            lang = f_info.language
            lang_stats.setdefault(lang, {"files": 0, "lines": 0})  # setdefault:语言首次出现时初始化统计字典
            lang_stats[lang]["files"] += 1
            lang_stats[lang]["lines"] += f_info.lines
            total_files += 1
            total_lines += f_info.lines

        summary = f"Total: {total_files} files, {total_lines:,} lines\n"
        for lang, stats in sorted(lang_stats.items()):
            summary += f"  {lang}: {stats['files']} files, {stats['lines']:,} lines\n"

        # 只列出顶层目录结构
        top_dirs = set()
        for path in self.project_map:
            parts = Path(path).parts
            if len(parts) > 1:
                top_dirs.add(parts[0])

        summary += f"Top-level directories: {', '.join(sorted(top_dirs))}"

        return summary

    def _parse_tool_call(self, response: str) -> Optional[dict]:
        """解析Agent响应中的工具调用"""
        import re
        match = re.search(r'<tool>(.*?)</tool>', response, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(1))
            except json.JSONDecodeError:
                return None
        return None

    def _execute_tool(self, tool_call: dict) -> str:
        """执行工具调用"""
        name = tool_call.get("name")
        args = tool_call.get("args", {})

        if name == "read_file":
            path = self.project_root / args["path"]
            return path.read_text(errors='ignore')
        elif name == "list_files":
            directory = self.project_root / args.get("directory", "")
            return "\n".join(
                str(p.relative_to(self.project_root))
                for p in directory.iterdir() if p.is_file()
            )
        elif name == "search_code":
            return self._search_code(args["pattern"])
        else:
            return f"Unknown tool: {name}"

    def _search_code(self, pattern: str) -> str:
        import re
        results = []
        for rel_path in self.project_map:
            full_path = self.project_root / rel_path
            try:
                content = full_path.read_text(errors='ignore')
                for i, line in enumerate(content.split('\n'), 1):
                    if re.search(pattern, line, re.IGNORECASE):
                        results.append(f"{rel_path}:{i}: {line.strip()}")
            except Exception:
                pass
        return "\n".join(results[:50])  # 限制结果数

    def _extract_finding(self, response: str) -> Optional[str]:
        import re
        match = re.search(r'<finding>(.*?)</finding>', response, re.DOTALL)
        return match.group(1).strip() if match else None

    def _format_scratchpad(self) -> str:
        return "\n".join(f"- {n}" for n in self.scratchpad)

8.6 最佳实践总结

上下文工程清单

# 实践 说明
1 预算先行 在设计Agent前就规划上下文预算分配
2 分层组织 System/Knowledge/History/Current 四层架构
3 压缩工具结果 永远不要把原始工具输出全部塞入上下文
4 位置感知 关键信息放开头或结尾,避免Lost in the Middle
5 TODO驱动 长程Agent用结构化任务列表防止迷失
6 Scratchpad 让Agent用笔记代替完整历史
7 分层记忆 工作记忆→短期→长期的三级记忆架构
8 动态检索 按需从知识库检索,而非预加载全部
9 成本监控 监控每步的token消耗,设置上限
10 锚点标记 用XML标签包裹关键信息帮助LLM聚焦
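清单第9条的成本监控可以用一个很简单的计数器落地。下面是一个最小示意(类名与单价均为假设值,实际单价请以所用API的官方定价为准):

```python
class TokenCostMonitor:
    """按步累计token消耗,超过上限时抛出异常(最小示意)"""

    def __init__(self, max_total_tokens: int, price_per_1k: float = 0.01):
        self.max_total = max_total_tokens
        self.price_per_1k = price_per_1k  # 假设的单价,单位:美元/1K tokens
        self.used = 0

    def record(self, step_tokens: int):
        """记录一步的token消耗,累计超限时立即中止"""
        self.used += step_tokens
        if self.used > self.max_total:
            raise RuntimeError(
                f"Token budget exceeded: {self.used} > {self.max_total}")

    @property
    def cost(self) -> float:
        """按假设单价估算的累计成本(美元)"""
        return self.used / 1000 * self.price_per_1k
```

在Agent的每轮循环中调用`record()`,即可把"成本随上下文线性增长"的风险变成一个显式的硬上限。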

与hello-agents的对比优势

本章内容相比datawhalechina/hello-agents的第九章"上下文工程",增加了以下独特价值:

  1. ContextBudgetManager完整实现:可直接应用的预算管理系统
  2. 三种HistoryManager策略对比:滑动窗口/摘要压缩/分层记忆的完整代码
  3. ToolResultCompressor:实用的工具结果压缩方案(hello-agents未涉及)
  4. Lost in the Middle缓解方案:PositionAwareContextBuilder的实用代码
  5. TODO驱动的Deep Research范式:比hello-agents的介绍更完整的可运行代码
  6. 代码库维护Agent完整案例:将所有上下文工程技术整合在一个Agent中

📝 练习

练习1:实现Token预算管理器(基础)

实现一个TokenBudgetManager,支持:

  • 设置总预算和各层预算
  • 添加上下文项时自动检查预算
  • 当超预算时,自动淘汰最低优先级的项

练习2:实现Scratchpad Agent(中级)

基于ScratchpadAgent模板,实现一个能够:

  • 浏览网页收集信息
  • 在scratchpad中记录关键发现
  • 当scratchpad过长时自动压缩
  • 最终综合scratchpad生成报告

练习3:设计上下文压缩策略(高级)

设计一个通用的上下文压缩框架,支持:

  • 多种压缩策略(摘要、截断、抽取、量化)
  • 按信息类型选择压缩策略
  • A/B测试不同压缩策略对Agent性能的影响


📚 参考资料

  1. Andrej Karpathy: "Context Engineering" (2025) — 上下文工程概念的提出
  2. Liu et al.: "Lost in the Middle: How Language Models Use Long Contexts" (2023)
  3. Park et al.: "Generative Agents: Interactive Simulacra of Human Behavior" (2023)
  4. OpenAI: "Memory & Context Management in Agents" (2025)
  5. LangChain Docs: "Conversation Memory" — 对话记忆实现参考
  6. Anthropic: "Building Effective Agents" (2025) — Claude的Agent上下文最佳实践

📝 本章小结

本章系统学习了上下文工程(Context Engineering)的核心知识:

  1. ✅ 理解了从Prompt Engineering到Context Engineering的演进
  2. ✅ 掌握了上下文窗口四层架构和信息密度优化原则
  3. ✅ 学会了工具结果压缩与Lost in the Middle缓解策略
  4. ✅ 掌握了三种对话历史管理策略(滑动窗口/摘要压缩/分层记忆)
  5. ✅ 学会了TODO驱动和Scratchpad两种长程Agent上下文管理模式
  6. ✅ 完成了代码库维护Agent实战案例

✅ 学习检查清单

  • 能解释Prompt Engineering和Context Engineering的核心区别
  • 能实现上下文预算管理器(ContextBudgetManager)
  • 能实现工具结果压缩器(ToolResultCompressor)
  • 理解Lost in the Middle问题及缓解策略
  • 能实现三种对话历史管理策略
  • 能设计TODO驱动的长程Agent
  • 能使用Scratchpad模式管理Agent上下文
  • 能将上下文工程技术整合到实际Agent项目中

🔗 下一步

下一章我们将学习Agent强化学习,探索如何通过奖励信号优化Agent的决策能力。

继续学习: 09-Agent强化学习


祝你学习愉快! 🎉


最后更新日期:2026-02-12 适用版本:AI Agent开发实战教程 v2026