Claude Code Context Compact

2026-05-04 18:58:40

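How an oversized context affects model output

Oversized context -> attention is spread thin -> per-token weights are diluted -> the output may be wrong or irrelevant

A concrete example:

Q: What nationality was Einstein?

What the model computes: P(next token | current context)

When attention works normally, the model focuses on "what nationality".

Step 1 on the model side: predict the first token (the probabilities of all candidate tokens sum to 1)

P(Germany) = 0.65
P(Switzerland) = 0.2
P(USA) = 0.05
P(scientist) = 0.03
...

"Germany" has the highest probability, so it is output first.

The model then predicts the next token: P("人" (person) | "德国" (Germany)) is the highest, so the final answer is "德国人" (German).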
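
The selection step above can be sketched as greedy decoding over a toy distribution. This is a minimal illustration, assuming the model always takes the most probable token; the probabilities are the illustrative values from the example, and `pickGreedy` is a hypothetical helper, not part of any real inference API.

```typescript
// Toy next-token distribution from the example (illustrative values, not real model output).
const firstTokenProbs: Record<string, number> = {
  Germany: 0.65,
  Switzerland: 0.2,
  USA: 0.05,
  scientist: 0.03,
};

// Greedy decoding: return the candidate token with the highest probability.
function pickGreedy(probs: Record<string, number>): string {
  let best = "";
  let bestP = -Infinity;
  for (const [token, p] of Object.entries(probs)) {
    if (p > bestP) {
      best = token;
      bestP = p;
    }
  }
  return best;
}

const firstToken = pickGreedy(firstTokenProbs); // "Germany"
```

Real decoders often sample from the distribution instead of always taking the maximum, which is why a narrower gap between the top candidates makes a wrong first token more likely.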

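If the context is too large and the same question is asked again:

[context.......] Q: What nationality was Einstein?

Because attention weights sum to 1 (softmax normalization), a larger context -> more tokens -> the weight each token can receive tends to drop

The weight on tokens like "what nationality" drops, so the model may attend more to "Einstein" than to "what nationality".

The distribution for the first predicted token may then become:

P(Germany) = 0.35
P(Switzerland) = 0.28
P(USA) = 0.12
P(scientist) = 0.10
P(France) = 0.08
P(philosopher) = 0.04
...

This can lead to a wrong answer, i.e. a hallucination.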
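
The dilution effect of softmax normalization can be shown with a small numeric sketch. It assumes a deliberately simplified setting: one "key" token with a fixed raw score competes with a growing number of filler tokens that all share a lower score. Real attention has many heads and layers and learned scores, so this only illustrates the normalization argument; `keyTokenWeight` is a hypothetical helper.

```typescript
// Numerically stable softmax over raw attention scores.
function softmax(scores: number[]): number[] {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Weight received by one key token (score `keyScore`) when it competes with
// `fillerCount` other tokens that all have score `fillerScore`.
function keyTokenWeight(keyScore: number, fillerScore: number, fillerCount: number): number {
  const fillers: number[] = new Array<number>(fillerCount).fill(fillerScore);
  return softmax([keyScore, ...fillers])[0];
}

const shortContextWeight = keyTokenWeight(3, 0, 10);    // about 0.67
const longContextWeight = keyTokenWeight(3, 0, 10000);  // about 0.002
```

The key token's raw score never changes; only the number of competitors does, yet its share of attention falls by more than two orders of magnitude.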

Analysis of Claude Code's context compaction mechanism

Overall call flow:

flowchart TD
  A[New query turn] --> B[getMessagesAfterCompactBoundary]
  B --> C[snip / microcompact / context collapse]
  C --> D{Auto compact triggered?}
  D -- No --> E[Send the request to the model as-is]
  D -- Yes --> F[Try session memory compact first]
  F -- Success --> G[Return CompactionResult]
  F -- Failure --> H[compactConversation]
  G --> I[buildPostCompactMessages]
  H --> I
  I --> J[runPostCompactCleanup / markPostCompaction]
  J --> K[Continue with the next query turn]

microCompact

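There are two variants, time-based microCompact and cache-based microCompact. The call flow is shown in the diagram below:

If the time-based condition is met, the prompt cache is considered expired -> the outputs of old tool calls in the context are compacted directly

If the prompt cache has not expired -> find the tool_result entries of old tool calls that should be removed from the prompt cache, generate cache_edits, and hand them to the API layer to execute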

flowchart TD
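    A[Enter microcompactMessages] --> B[Clear the compact warning suppression state]

    B --> C[evaluateTimeBasedTrigger<br/>decide whether time-based compaction fires]
    C --> D{Time-based condition met?}

    D -- Yes --> E[Treat prompt cache as expired / cold]
    E --> F[collectCompactableToolIds<br/>collect compactable tool_result entries]
    F --> G[Keep the most recent N tool_result entries<br/>keep at least 1]
    G --> H[Clear the content of older tool_result entries<br/>replace with TIME_BASED_MC_CLEARED_MESSAGE]
    H --> I[Record tokensSaved / logs]
    I --> J[resetMicrocompactState<br/>clear stale cached-MC state]
    J --> K[notifyCacheDeletion<br/>signal that the drop in cache hits is expected]
    K --> L[Return the compacted messages<br/>skip cached MC]

    D -- No --> M{CACHED_MICROCOMPACT enabled?}

    M -- No --> Z[No microcompact<br/>return messages unchanged]

    M -- Yes --> N[Load the cached MC module<br/>get the current model]
    N --> O{Cached MC conditions met?}

    O -- Yes --> P[cachedMicrocompactPath<br/>edit the prompt cache]
    P --> Q[Return the cached MC result]

    O -- No --> Z

    L --> END[End]
    Q --> END
    Z --> END

    C -. conditions include .-> C1[Feature enabled]
    C -. conditions include .-> C2[querySource exists and is main thread]
    C -. conditions include .-> C3[Time since last assistant reply exceeds threshold]

    O -. conditions include .-> O1[Module enabled]
    O -. conditions include .-> O2[Model supports cache editing]
    O -. conditions include .-> O3[querySource is main thread]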
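
The time-based branch (keep the most recent N tool results, replace older ones with a placeholder) can be sketched as follows. This is a simplified illustration under stated assumptions: the `Msg` type, `clearOldToolResults`, and the placeholder text are hypothetical, and the actual value of TIME_BASED_MC_CLEARED_MESSAGE is not shown in these notes.

```typescript
// Hypothetical, simplified message shape; the real Claude Code types are richer.
type Msg =
  | { kind: "text"; role: "user" | "assistant"; text: string }
  | { kind: "tool_result"; toolUseId: string; content: string };

// Stand-in for TIME_BASED_MC_CLEARED_MESSAGE (actual value not shown in the notes).
const CLEARED_PLACEHOLDER = "[old tool result cleared]";

function clearOldToolResults(messages: Msg[], keepRecent: number): Msg[] {
  const keep = Math.max(1, keepRecent); // always keep at least one tool_result
  const toolIdx: number[] = [];
  messages.forEach((m, i) => {
    if (m.kind === "tool_result") toolIdx.push(i);
  });
  // Every tool_result except the last `keep` ones gets cleared.
  const toClear = new Set(toolIdx.slice(0, Math.max(0, toolIdx.length - keep)));
  return messages.map((m, i) =>
    toClear.has(i) && m.kind === "tool_result" ? { ...m, content: CLEARED_PLACEHOLDER } : m,
  );
}
```

Replacing the content rather than deleting the message keeps each tool_result paired with its tool call, so the conversation structure stays valid while the bulky output is dropped.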

session memory Compact

sequenceDiagram
  participant Setup as setup.ts
  participant Init as initSessionMemory()
  participant Hook as extractSessionMemory
  participant Gate as feature gate
  participant SM as shouldExtractMemory()
  participant FS as setupSessionMemoryFile()
  participant Prompt as buildSessionMemoryUpdatePrompt()
  participant Fork as runForkedAgent()
  participant Tool as FileEditTool

  Setup->>Init: called at startup
  Init->>Init: if !isAutoCompactEnabled() return
  Init->>Init: registerPostSamplingHook(extractSessionMemory)

  Note over Hook: fires after every model sampling
  Hook->>Gate: isSessionMemoryGateEnabled()
  alt gate disabled
    Hook-->>Hook: return
  else gate enabled
    Hook->>SM: shouldExtractMemory(messages)
    alt thresholds not met
      SM-->>Hook: false
    else thresholds met
      Hook->>Hook: markExtractionStarted()
      Hook->>FS: create dir + summary.md + read current content
      Hook->>Prompt: build the update prompt
      Hook->>Fork: update the notes with a forked agent
      Fork->>Tool: only allowed to Edit summary.md
      Fork-->>Hook: done
      Hook->>Hook: recordExtractionTokenCount()
      Hook->>Hook: updateLastSummarizedMessageIdIfSafe()
      Hook->>Hook: markExtractionCompleted()
    end
  end

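summary.md is stored at {projectDir}/{sessionId}/session-memory/summary.md

summary.md acts as a rolling set of session notes, i.e. a long-lived summary cache. While the conversation is in progress, session memory extraction keeps writing important context into it (a threshold must be reached first, so short conversations never produce a summary): the current task, files, error fixes, key results, workflow, and so on. When autoCompact actually triggers, Claude Code does not need to call a forked agent to re-summarize the entire history; it takes the content of summary.md and uses it as the new summary message after compaction.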
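
The lookup described above can be sketched as a pure function. This is a minimal sketch, not the actual implementation: `sessionSummaryPath` and `trySessionMemorySummary` are hypothetical names, and the file reader is injected so the example does not depend on a real file system.

```typescript
// Hypothetical helper that builds the path layout described in the notes.
function sessionSummaryPath(projectDir: string, sessionId: string): string {
  return `${projectDir}/${sessionId}/session-memory/summary.md`;
}

// Hypothetical: returns the rolling notes if they exist and are non-empty,
// otherwise null so the caller can fall back to a full summarization pass.
function trySessionMemorySummary(
  path: string,
  readFile: (p: string) => string | null, // injected reader; returns null when the file is missing
): string | null {
  const content = readFile(path);
  if (content === null || content.trim() === "") return null;
  return content;
}

// Usage with an in-memory stand-in for the file system.
const fakeFiles = new Map<string, string>([
  [sessionSummaryPath("/proj", "s1"), "# Current task\nFix the login bug"],
]);
const reader = (p: string): string | null => fakeFiles.get(p) ?? null;
const existingSummary = trySessionMemorySummary(sessionSummaryPath("/proj", "s1"), reader);
const missingSummary = trySessionMemorySummary(sessionSummaryPath("/proj", "s2"), reader);
```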

However, this feature is currently restricted to USER_TYPE=ant; to try it you can build from the source code yourself.

autoCompact

autoCompactIfNeeded -> first tries Session Memory Compact; if that fails -> falls back to the compactConversation function
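
The ordering can be sketched as below. This is a simplified, synchronous illustration under stated assumptions: `CompactionResultSketch` and `autoCompactSketch` are hypothetical, the two strategies are passed in as callbacks, and the real functions are asynchronous and take far more parameters.

```typescript
// Hypothetical result shape; the real CompactionResult carries much more data.
interface CompactionResultSketch {
  summary: string;
  source: "session-memory" | "conversation";
}

function autoCompactSketch(
  shouldCompact: boolean,
  trySessionMemory: () => CompactionResultSketch | null, // null means "not available"
  compactConversation: () => CompactionResultSketch,
): CompactionResultSketch | null {
  if (!shouldCompact) return null; // below the trigger: send the request unchanged
  const fromMemory = trySessionMemory();
  if (fromMemory !== null) return fromMemory; // cheap path: reuse the rolling notes
  return compactConversation(); // expensive path: ask the model for a summary
}
```

Trying the cheap path first means the model-generated summary is only paid for when no rolling notes are available.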

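compactConversation essentially calls the model to generate a summary of the conversation history, but it contains many details worth digging into.

Building the compact prompt: a design worth borrowing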

const NO_TOOLS_PREAMBLE = `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.

- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- You already have all the context you need in the conversation above.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.
- Your entire response must be plain text: an <analysis> block followed by a <summary> block.`

const BASE_COMPACT_PROMPT = `Your task is to create a detailed summary of the conversation so far, paying close attention to the user's explicit requests and your previous actions.
This summary should be thorough in capturing technical details, code patterns, and architectural decisions that would be essential for continuing development work without losing context.
[...]` // the rest of the prompt is omitted in these notes


export function getCompactPrompt(customInstructions?: string): string {
  let prompt = NO_TOOLS_PREAMBLE + BASE_COMPACT_PROMPT

  if (customInstructions && customInstructions.trim() !== '') {
    prompt += `\n\nAdditional Instructions:\n${customInstructions}`
  }

  // NO_TOOLS_TRAILER is defined elsewhere in the source and is not shown in these notes
  prompt += NO_TOOLS_TRAILER

  return prompt
}

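This prompt has three key design choices:

NO_TOOLS_PREAMBLE comes first and forces the model to output text only, with no tool calls.
BASE_COMPACT_PROMPT asks for a summary of the user's requests, technical concepts, files and code, error fixes, current work, and next steps.
NO_TOOLS_TRAILER repeats the no-tools reminder at the very end.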
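
Since NO_TOOLS_PREAMBLE asks for an <analysis> block followed by a <summary> block, a consumer of the reply would plausibly keep only the summary part. The notes do not show how Claude Code post-processes the reply, so the following is a hypothetical sketch; `extractSummary` is an assumed name.

```typescript
// Hypothetical post-processing step: pull the <summary> block out of a reply
// that follows the "<analysis> then <summary>" format requested by the prompt.
function extractSummary(reply: string): string | null {
  const match = reply.match(/<summary>([\s\S]*?)<\/summary>/);
  return match ? match[1].trim() : null; // null when the model ignored the format
}

const exampleReply = "<analysis>scratch notes</analysis>\n<summary>\nUser wants X.\n</summary>";
const extracted = extractSummary(exampleReply); // "User wants X."
```

Letting the model reason in a separate block and discarding it afterwards keeps the stored summary short without forbidding intermediate reasoning.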

The logic in compactConversation that generates the context summary lives in the streamCompactSummary function

let messagesToSummarize = messages
let retryCacheSafeParams = cacheSafeParams
let summaryResponse: AssistantMessage
let summary: string | null
let ptlAttempts = 0
for (;;) {
  summaryResponse = await streamCompactSummary({
    messages: messagesToSummarize,
    summaryRequest,
    appState,
    context,
    preCompactTokenCount,
    cacheSafeParams: retryCacheSafeParams,
  })
  summary = getAssistantMessageText(summaryResponse)
  if (!summary?.startsWith(PROMPT_TOO_LONG_ERROR_MESSAGE)) break
  // ... on prompt-too-long: trim old messages and retry (details elided in these notes)
}
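
The elided retry branch can be sketched as follows. This is a synchronous toy version under stated assumptions: the trimming strategy (drop the oldest half), the attempt cap, and the `TOO_LONG` marker are all hypothetical, because the notes only say that old messages are trimmed and the call is retried.

```typescript
// Stand-in for PROMPT_TOO_LONG_ERROR_MESSAGE; the real value is not shown in the notes.
const TOO_LONG = "Prompt is too long";
const MAX_TRIM_ATTEMPTS = 3; // assumed cap, not taken from the source

function summarizeWithRetry(
  messages: string[],
  summarize: (msgs: string[]) => string, // stand-in for the model call
): string {
  let current = messages;
  for (let attempt = 0; ; attempt++) {
    const summary = summarize(current);
    if (!summary.startsWith(TOO_LONG)) return summary;
    if (attempt >= MAX_TRIM_ATTEMPTS || current.length <= 1) {
      throw new Error("conversation still too long after trimming");
    }
    // Assumed strategy: drop the oldest half of the messages and try again.
    current = current.slice(Math.ceil(current.length / 2));
  }
}
```

For example, with a stand-in summarizer that only accepts two messages or fewer, an eight-message history is trimmed to four and then to two before a summary comes back.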

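By default a forked agent summarizes the context, because a forked agent can reuse the main session's prompt cache instead of re-creating a cache for the same large context as a brand-new prompt, which is cheaper.

createCompactCanUseTool() rejects tool calls a second time: constraining the model at the prompt layer alone is not safe enough, so another guard is added at the permission layer.

maxTurns is 1 -> the compact agent should answer exactly once (the generated context summary) and should never enter tool calls or multi-turn interaction.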

const result = await runForkedAgent({
  promptMessages: [summaryRequest],
  cacheSafeParams,
  canUseTool: createCompactCanUseTool(),
  querySource: 'compact',
  forkLabel: 'compact',
  maxTurns: 1,
  skipCacheWrite: true,
  overrides: { abortController: context.abortController },
})

export function createCompactCanUseTool(): CanUseToolFn {
  return async () => ({
    behavior: 'deny' as const,
    message: 'Tool use is not allowed during compaction',
    decisionReason: {
      type: 'other' as const,
      reason: 'compaction agent should only produce text summary',
    },
  })
}

If the forked agent fails, a streaming fallback is used as the safety net.

The call flow is shown in the diagram below:

flowchart TD
  A[Enter compactConversation] --> B{messages empty?}
  B -- Yes --> B1[Throw not enough messages]
  B -- No --> C[Count pre-compact tokens]
  C --> D[Run PreCompact hooks]
  D --> E[Merge hook instructions with custom instructions]
  E --> F[Build the compact prompt]
  F --> G[Call streamCompactSummary]
  G --> H{Summary is Prompt Too Long?}
  H -- Yes --> I[Trim old messages and retry]
  I --> G
  H -- No --> J[Validate the summary]
  J --> K[Clear pre-compact caches]
  K --> L[Rebuild attachments]
  L --> M[Run SessionStart hooks]
  M --> N[Create boundaryMarker]
  N --> O[Wrap summary as a user message]
  O --> P[Count post-compact tokens]
  P --> Q[Record telemetry and state]
  Q --> R[Run PostCompact hooks]
  R --> S[Return CompactionResult]

  G -. in brief .-> G1[Prefer forked agent<br/>fall back to streaming on failure]
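
The boundary marker and the summary-as-user-message steps, together with the getMessagesAfterCompactBoundary step from the overall flow, can be sketched like this. All names and shapes here are hypothetical simplifications, and whether the real function returns the boundary marker itself is an assumption (this sketch excludes it).

```typescript
// Hypothetical message shapes for illustration only.
type SketchMessage =
  | { type: "boundary" }
  | { type: "user" | "assistant"; text: string };

// After compaction: a boundary marker, then the summary wrapped as a user message,
// then any rebuilt attachments.
function buildPostCompactSketch(summary: string, attachments: SketchMessage[]): SketchMessage[] {
  return [{ type: "boundary" }, { type: "user", text: summary }, ...attachments];
}

// On the next query turn: only messages after the most recent boundary are sent.
function messagesAfterLastBoundary(history: SketchMessage[]): SketchMessage[] {
  let last = -1;
  history.forEach((m, i) => {
    if (m.type === "boundary") last = i;
  });
  return last === -1 ? history : history.slice(last + 1);
}
```

Appending a boundary instead of deleting old messages keeps the full transcript available on disk while bounding what is actually sent to the model.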

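Summary

Reading the source code of Claude Code's context compaction mechanism yields the following takeaways:
(1) microCompact has two paths: when the prompt cache has expired, old tool outputs in the prompt history are replaced with a placeholder; when the prompt cache is still valid, the cache is edited directly through the API.

(2) session memory Compact is tried first when autoCompact triggers. It uses the summary file that Claude Code maintains during the session as the context summary, so no forked agent is needed to summarize, which saves a lot of time and tokens.

(3) autoCompact is the classic compaction mechanism: a forked agent summarizes the context, and the summary then serves as the prompt for subsequent turns.

All three approaches offer plenty worth borrowing, and reading them was well worth the effort.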
