Agentic Architecture Guide | Claude Code Reverse-Engineering Blueprint

What Is an Agentic System?

An AI that pursues goals autonomously — perceiving, deciding, acting, and learning from results.

Goal-Oriented

Works toward completing tasks, not just answering questions. Breaks complex goals into steps and executes them autonomously.

Tool-Using

Interacts with external systems: executes code, reads/writes files, searches the web, calls APIs, and more.

Context-Aware

Maintains conversation history, remembers past interactions, and adapts behavior based on accumulated context.

Iterative

Works in loops: observes → thinks → acts → observes results → decides next. Continues until the goal is achieved.

The Agent Loop — Heart of Every Agentic System

Perceive → Think → Act → Observe → Decide Next

This loop repeats until the task is complete or the agent decides to stop.

Why Claude Code?

Claude Code applies these principles in production, where it has to handle multi-file codebases, long conversations, tool errors, and collaboration with the user. The same patterns apply to any agentic system.

Core Concepts

The six building blocks every agentic system needs.

1. The Query Loop (Agent Loop)

A continuous loop: send messages to AI → receive response → execute tools → feed results back → repeat. Without a loop, you have a chatbot.

Why it matters

Claude Code's loop lives in query.ts and exits when the response contains no tool use, the maximum turn count is reached, the budget is exceeded, or the user aborts. Every agentic system needs a similar loop with clear termination conditions.
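A minimal sketch of such a loop follows, covering the four exit conditions named above. The message shapes, the `CallModel`/`RunTool` adapters, and the `queryLoop` name are hypothetical; this is not the code in query.ts, only an illustration of a loop with explicit termination.

```typescript
// Hypothetical message shapes, loosely modeled on Messages API content blocks.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type Message =
  | { role: "user"; content: string | ToolResultBlock[] }
  | { role: "assistant"; content: (TextBlock | ToolUseBlock)[] };

// Hypothetical adapters: one model call per turn, one execution per tool_use block.
type ModelReply = { content: (TextBlock | ToolUseBlock)[]; costUsd: number };
type CallModel = (messages: Message[]) => Promise<ModelReply>;
type RunTool = (use: ToolUseBlock) => Promise<string>;

type ExitReason = "completed" | "max_turns" | "budget_exceeded" | "aborted";

async function queryLoop(
  prompt: string,
  callModel: CallModel,
  runTool: RunTool,
  opts: { maxTurns: number; maxBudgetUsd: number; shouldAbort?: () => boolean },
): Promise<{ reason: ExitReason; messages: Message[] }> {
  const messages: Message[] = [{ role: "user", content: prompt }];
  let spentUsd = 0;
  for (let turn = 0; turn < opts.maxTurns; turn++) {
    if (opts.shouldAbort?.()) return { reason: "aborted", messages };
    const reply = await callModel(messages);
    spentUsd += reply.costUsd;
    messages.push({ role: "assistant", content: reply.content });

    // No tool_use blocks means the model considers the task done.
    const uses = reply.content.filter((b): b is ToolUseBlock => b.type === "tool_use");
    if (uses.length === 0) return { reason: "completed", messages };
    if (spentUsd > opts.maxBudgetUsd) return { reason: "budget_exceeded", messages };

    // Execute the requested tools and feed the results back as the next user message.
    const results: ToolResultBlock[] = [];
    for (const use of uses) {
      results.push({ type: "tool_result", tool_use_id: use.id, content: await runTool(use) });
    }
    messages.push({ role: "user", content: results });
  }
  return { reason: "max_turns", messages };
}
```

Returning the exit reason alongside the history lets the caller decide whether to resume, report, or retry.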

2. Tool System

Defined actions the agent can take: read files, execute commands, search the web. Each tool has a name, description (for the AI), input schema, and execution logic.

Why it matters

Tools are how the agent interacts with the world. Without tools, it can only generate text. Well-described tools lead to better agent decisions.
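The sketch below shows those four parts for a hypothetical `read_file` tool. `toModelDefinitions` emits objects with the `name`, `description`, and `input_schema` fields that the Anthropic Messages API expects in its `tools` parameter; the `Tool` interface, `dispatch`, and the stubbed file read are illustrative assumptions.

```typescript
// Hypothetical tool shape; only name, description, and input_schema are API-facing.
type JsonSchema = {
  type: "object";
  properties: Record<string, { type: string; description?: string }>;
  required?: string[];
};

interface Tool {
  name: string;
  description: string; // read by the model when deciding which tool to call
  inputSchema: JsonSchema;
  execute(input: Record<string, unknown>): Promise<string>;
}

const readFileTool: Tool = {
  name: "read_file",
  description: "Read a UTF-8 text file and return its contents. Call this before editing a file.",
  inputSchema: {
    type: "object",
    properties: { path: { type: "string", description: "Absolute path of the file to read" } },
    required: ["path"],
  },
  async execute(input) {
    return `contents of ${String(input.path)}`; // stub: a real tool would read from disk
  },
};

// The model only ever sees name, description, and schema.
function toModelDefinitions(tools: Tool[]) {
  return tools.map((t) => ({ name: t.name, description: t.description, input_schema: t.inputSchema }));
}

// Execute one tool_use request. Unknown tools, missing fields, and thrown errors
// become error results the model can read, instead of crashing the loop.
async function dispatch(
  tools: Tool[],
  name: string,
  input: Record<string, unknown>,
): Promise<{ isError: boolean; content: string }> {
  const tool = tools.find((t) => t.name === name);
  if (!tool) return { isError: true, content: `Unknown tool: ${name}` };
  for (const key of tool.inputSchema.required ?? []) {
    if (!(key in input)) return { isError: true, content: `Missing required field: ${key}` };
  }
  try {
    return { isError: false, content: await tool.execute(input) };
  } catch (err) {
    return { isError: true, content: String(err) };
  }
}
```

Returning errors as results keeps the loop alive and gives the model a chance to correct its call.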

3. Context Management

Managing what the agent knows: conversation history, file contents, tool results. AI models have finite context windows.

Why it matters

Claude Code uses auto-compaction (summarizing old messages), micro-compaction (removing redundant results), and selective attachment injection to stay within limits.

4. Permission System

Controls what the agent can and cannot do. Prevents dangerous actions, requires user approval for sensitive operations.

Why it matters

Autonomous agents can cause real damage. Without permissions, agents are unsafe for production. Multiple layers ensure defense in depth.

5. Streaming Execution

Starting tool execution while the AI is still generating its response. Tools begin as soon as their input is complete.

Why it matters

Reduces latency by 30-50%. Users see progress immediately. Critical for long-running tasks.

6. Session Persistence

Saving conversation state so it can be resumed later. Includes message history, file states, and execution context.

Why it matters

Claude Code records transcripts to JSONL files and replays them. Persistence enables recovery; strategic discard enables scalability.
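A minimal sketch of JSONL recording and replay, kept in memory so it runs anywhere; a real implementation would append each line to a file (for example with Node's `fs.appendFileSync`). The entry shape and the `appendEntry`/`replay` names are hypothetical.

```typescript
// Hypothetical transcript entry; real transcripts carry more metadata.
type TranscriptEntry = { role: "user" | "assistant"; content: string };

// Append one entry as a single JSON line. JSON.stringify escapes newlines
// inside strings, so every entry occupies exactly one line.
function appendEntry(transcript: string, entry: TranscriptEntry): string {
  return transcript + JSON.stringify(entry) + "\n";
}

// Rebuild the conversation on resume. A crash can leave a partially written
// last line, so lines that fail to parse are skipped instead of aborting replay.
function replay(transcript: string): TranscriptEntry[] {
  const entries: TranscriptEntry[] = [];
  for (const line of transcript.split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line) as TranscriptEntry);
    } catch {
      // Partially written or corrupt line: ignore it.
    }
  }
  return entries;
}
```

Append-only lines are why JSONL suits this job: each write is independent, so a crash loses at most the entry being written.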

Architecture Patterns

Proven patterns from Claude Code that apply to any agentic system.

Pattern 1: Layered Architecture

Separate concerns into distinct layers. Each layer has a specific responsibility and communicates through well-defined interfaces.

UI Layer (React/Ink)

Runtime (Query Loop, Streaming)

API Layer (SDK, Retry, Fallback)

Tool System (Bash, File, Web, Agent...)

Infrastructure (Permissions, Context, Session)

Why this works

Each layer evolves independently. Swap the UI without changing the runtime. Add tools without touching the API layer.

Pattern 2: Streaming Pipeline

Process data as it arrives: the model streams tokens, and each tool starts as soon as its input is complete.

API Stream → Parse Tool Use → Start Tool → Yield Result

Why this works

Reduces perceived latency by 30-50%. Users see progress immediately.
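The pipeline can be sketched as an async generator. The event shapes are simplified and hypothetical: the real streaming API delivers tool input as partial JSON fragments that must be accumulated until the block ends, which this sketch collapses into a single `tool_use_complete` event. `streamingPipeline` and `runTool` are illustrative names.

```typescript
// Simplified, hypothetical stream events.
type StreamEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_use_complete"; id: string; name: string; input: unknown };

type PipelineOutput =
  | { type: "text"; text: string }
  | { type: "tool_result"; id: string; content: string };

async function* streamingPipeline(
  events: AsyncIterable<StreamEvent>,
  runTool: (name: string, input: unknown) => Promise<string>,
): AsyncGenerator<PipelineOutput> {
  const pending: { id: string; result: Promise<string> }[] = [];
  for await (const ev of events) {
    if (ev.type === "text_delta") {
      yield { type: "text", text: ev.text }; // surface progress immediately
    } else {
      // Start the tool now, while later blocks may still be streaming.
      // Errors become result text so a failing tool never rejects unobserved.
      const result = runTool(ev.name, ev.input).catch((err) => `Error: ${String(err)}`);
      pending.push({ id: ev.id, result });
    }
  }
  // After the stream ends, yield tool results in the order they were requested.
  for (const p of pending) {
    yield { type: "tool_result", id: p.id, content: await p.result };
  }
}
```

Because the consumer pulls from the generator, text reaches the user while tools already run in the background.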

Pattern 3: Context Window Management

AI models have finite context windows. Use compaction strategies to preserve important context.

Strategy 1: Summarization

Use a smaller model to summarize old conversation turns.

Strategy 2: Selective Injection

Inject only relevant memories, skills, or file contents based on the current task.

Strategy 3: Micro-Compaction

Remove redundant tool results, collapse similar search outputs.
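A minimal sketch of micro-compaction on a hypothetical flattened block list: the newest `keepRecent` tool results stay verbatim, while older results and duplicates of a later identical result are replaced by a short placeholder. The placeholder text and the keep-recent policy are assumptions. Results are stubbed rather than deleted because the Messages API expects every tool_use block to be answered by a matching tool_result.

```typescript
// Hypothetical block shape for this sketch.
type Block =
  | { kind: "text"; text: string }
  | { kind: "tool_result"; toolUseId: string; content: string };

const CLEARED_PLACEHOLDER = "[earlier tool result cleared to save context]";

function microCompact(blocks: Block[], keepRecent: number): Block[] {
  const seenContents = new Set<string>();
  let kept = 0;
  const out: Block[] = [];
  // Walk newest to oldest so the most recent copy of any result is the one kept.
  for (let i = blocks.length - 1; i >= 0; i--) {
    const block = blocks[i];
    if (block.kind !== "tool_result") {
      out.push(block);
      continue;
    }
    const isDuplicate = seenContents.has(block.content);
    seenContents.add(block.content);
    if (!isDuplicate && kept < keepRecent) {
      kept++;
      out.push(block);
    } else {
      // Keep the block and its toolUseId, drop the bulky content.
      out.push({ ...block, content: CLEARED_PLACEHOLDER });
    }
  }
  return out.reverse(); // restore chronological order
}
```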

Pattern 4: Multi-Layer Permissions

Defense in depth for safety. Multiple layers ensure that even if one fails, others catch dangerous actions.

Layer 1: Static Deny Rules

Layer 2: Auto-mode Classifier

Layer 3: Hook-based Rules

Layer 4: User Prompt

Warning

Never rely on a single permission layer. Multiple layers ensure defense in depth.

Pattern 5: Error Recovery

Agents will encounter errors. Build recovery paths, not just error messages.

Recovery 1: Retry with Backoff

API rate limits? Retry with exponential backoff.

Recovery 2: Model Fallback

Primary model overloaded? Switch to fallback automatically.

Recovery 3: Context Compaction

Prompt too long? Compact context and retry.
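The first two recoveries can be combined in one wrapper. This is a sketch under stated assumptions: errors expose a numeric `status` (429 for rate limits, 529 for overloaded), `call` is a hypothetical function taking a model name, and the retry count, base delay, and jitter policy are illustrative. Rate limits are retried on the same model; overload switches to the fallback only after retries are exhausted.

```typescript
// Reads a numeric HTTP-style status from an unknown error, if present (assumed error shape).
function statusOf(err: unknown): number | undefined {
  if (typeof err === "object" && err !== null && "status" in err) {
    const status = (err as { status: unknown }).status;
    if (typeof status === "number") return status;
  }
  return undefined;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function callWithRecovery<T>(
  call: (model: string) => Promise<T>,
  primaryModel: string,
  fallbackModel: string,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  let model = primaryModel;
  for (let attempt = 0; ; attempt++) {
    try {
      return await call(model);
    } catch (err) {
      const status = statusOf(err);
      if (status !== 429 && status !== 529) throw err; // not a transient error
      if (attempt < maxRetries) {
        // Exponential backoff: base * 2^attempt, scaled by random jitter in [0.5, 1).
        await sleep(baseDelayMs * 2 ** attempt * (0.5 + Math.random() / 2));
        continue;
      }
      if (status === 529 && model !== fallbackModel) {
        model = fallbackModel; // still overloaded: switch models with a fresh retry budget
        attempt = -1; // the loop increment brings this back to 0
        continue;
      }
      throw err;
    }
  }
}
```

Jitter spreads retries from many clients so they do not hit the API in lockstep.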

Pattern 6: Composable Sub-Agents

Spawn sub-agents for parallel work. Each runs its own loop with a filtered tool set.

Parent Agent → Spawn Sub-Agent → Filtered Tools → Parallel Execution → Results Back

Why this works

Enables parallel work without duplicating the core loop. Sub-agents get restricted tool sets.
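A sketch of the spawn step, assuming a hypothetical `runAgent` callback that runs a complete agent loop for one task. The default filter keeps read-only tools, and the sketch also withholds a hypothetical `spawn_agent` tool so sub-agents cannot recurse; both policies are assumptions for illustration. `Promise.allSettled` lets one failed sub-agent report an error without discarding its siblings' results.

```typescript
// Hypothetical tool descriptor for this sketch.
type ToolSpec = { name: string; readOnly: boolean };
type SubAgentResult = { task: string; ok: boolean; output: string };

async function runSubAgents(
  tasks: string[],
  parentTools: ToolSpec[],
  runAgent: (task: string, tools: ToolSpec[]) => Promise<string>,
  allowTool: (tool: ToolSpec) => boolean = (tool) => tool.readOnly,
): Promise<SubAgentResult[]> {
  // Each sub-agent gets a filtered tool set, never including the spawning tool itself.
  const subAgentTools = parentTools.filter((t) => allowTool(t) && t.name !== "spawn_agent");
  // Run all sub-agents concurrently; a failure in one does not cancel the others.
  const settled = await Promise.allSettled(tasks.map((task) => runAgent(task, subAgentTools)));
  return settled.map((s, i) =>
    s.status === "fulfilled"
      ? { task: tasks[i], ok: true, output: s.value }
      : { task: tasks[i], ok: false, output: String(s.reason) },
  );
}
```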


Design Principles

The why behind the architecture decisions.

1. Bias Toward Action
Don't wait for perfect information. Start executing as soon as you have enough context.
2. Fail Gracefully, Recover Automatically
Rate limited? Retry. Model overloaded? Switch. Context too long? Compact. Build resilience.
3. Optimize for Perceived Performance
Streaming makes a 10-second response feel like 2 seconds. Running independent tools in parallel makes multi-tool tasks feel much faster.
4. Separate Policy from Mechanism
Tool system provides mechanisms. Permission system provides policy. Keep them separate.
5. Make the Invisible Visible
Show what the agent is thinking, executing, and why. Users trust systems they can understand.
6. Design for the Long Tail
Handle edge cases: prompt too long, max output tokens, model fallback, permission edge cases.
7. Compose, Don't Duplicate
Sub-agents, plugins, MCP servers. Build composable systems, not monoliths.
8. Persist Everything, Discard Strategically
Record every conversation, but compact old context. Persistence enables recovery; selective discard enables scalability.

Build Your Own Agentic System

Step-by-step implementation guide for any product team.

1. Define Purpose and Boundaries
Before writing code, decide what tasks this agent will perform, which tools it needs, and what it must never do.
2. Build the Core Query Loop
Start minimal: send message → get response → execute tools → feed results back → repeat. Claude Code's core loop is ~100 lines.
3. Implement Your First Tools
Start with 2-3 essential tools. Define each one's name, description, input schema, and execution logic. Test them in isolation.
4. Add Basic Permissions
Allowlist the commands and file paths the agent may use, and fall back to asking the user for anything uncertain.
5. Implement Context Management
Track context window usage and trigger compaction at 80% capacity. Start simple: summarize the oldest messages.
6. Add Streaming
Stream AI responses token by token and start tool execution before the full response is complete. This is the biggest UX win.
7. Build Error Recovery
For each error type, define a recovery strategy. Don't just show errors; recover from them.
8. Add Session Persistence
Record conversations to disk, save file states, and enable resume so users can pick up where they left off.
9. Iterate on Tool Quality
The AI chooses tools based on their descriptions, so make them clear and specific. Bad descriptions lead to bad decisions.
10. Test Edge Cases Relentlessly
Test with very long conversations, large operations, network failures, API timeouts, and permission denials.

Start with an MVP

Steps 1-4: functional agent. Steps 5-7: production-ready. Steps 8-10: robust. Ship early, iterate.

Technical Specification

Implementation contracts and detailed requirements from production audit.

1. Loop Exit Conditions (13 total)

The agent loop must handle 13 exit conditions, seven of which are listed below. MVP requires conditions 1, 3, 4, 6, 8, and 10; production requires all 13.

Terminal Exits (no recovery)

blocking_limit - Token count has reached the blocking limit
completed - No tool_use blocks in the response
aborted - User abort or signal abort
max_turns - Turn count exceeded

Recoverable Exits (continue loop)

model_fallback - Switch model, continue
reactive_compact - 413 error: compact and retry
max_output_recovery - Output limit hit: recover up to 3x

2. Message Construction Pipeline

Messages go through 6 transformations before each API call. Order matters.

1. Strip before compact boundary

2. Apply tool result budget

3. Snip compact (old messages)

4. Microcompact (inline compression)

5. Context collapse projection

6. Auto-compact (summarization)

Critical

Tool result budget MUST run before compaction, and microcompact MUST run before auto-compact. The wrong order causes compaction self-PTL, where the compaction request itself is prompt-too-long.
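The ordering constraint can be enforced in code. This sketch implements the first two stages concretely on a hypothetical message shape and rejects any pipeline whose stages are out of the documented order. The stage identifiers, the `[truncated]` marker, and `buildApiMessages` are assumptions; the remaining four stages would be supplied as further `Stage` objects.

```typescript
// Hypothetical message and stage shapes.
type Msg = { role: "user" | "assistant"; text: string; isCompactBoundary?: boolean; isToolResult?: boolean };
type Stage = { name: string; run: (msgs: Msg[]) => Msg[] };

// The documented order, under hypothetical stage identifiers.
const STAGE_ORDER = [
  "stripBeforeCompactBoundary",
  "applyToolResultBudget",
  "snipCompact",
  "microcompact",
  "contextCollapse",
  "autoCompact",
];

// Stage 1: drop everything before the most recent compact boundary (the boundary itself stays).
const stripBeforeCompactBoundary: Stage = {
  name: "stripBeforeCompactBoundary",
  run: (msgs) => {
    for (let i = msgs.length - 1; i >= 0; i--) {
      if (msgs[i].isCompactBoundary) return msgs.slice(i);
    }
    return msgs;
  },
};

// Stage 2: cap each tool result so no single result dominates what compaction sees later.
const applyToolResultBudget = (maxChars: number): Stage => ({
  name: "applyToolResultBudget",
  run: (msgs) =>
    msgs.map((m) =>
      m.isToolResult && m.text.length > maxChars ? { ...m, text: `${m.text.slice(0, maxChars)}\n[truncated]` } : m,
    ),
});

function buildApiMessages(msgs: Msg[], stages: Stage[]): Msg[] {
  // Reject unknown or out-of-order stages before touching any message.
  let previous = -1;
  for (const stage of stages) {
    const position = STAGE_ORDER.indexOf(stage.name);
    if (position === -1) throw new Error(`Unknown stage: ${stage.name}`);
    if (position <= previous) throw new Error(`Stage out of order: ${stage.name}`);
    previous = position;
  }
  return stages.reduce((acc, stage) => stage.run(acc), msgs);
}
```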

3. Error Recovery Cascade

Each error type has a specific recovery path.

413 (Prompt Too Long)

Step 1: Context collapse drain → retry
Step 2: Reactive compact (strip media) → retry once
Step 3: Surface error → return prompt_too_long

Max Output Tokens

Step 1: Escalate to 64k (once)
Step 2: Inject recovery message → retry (up to 3x)
Step 3: Surface error

529 (Overloaded)

Step 1: Exponential backoff (up to 3x)
Step 2: Switch to fallback model
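The Max Output Tokens path can be sketched as follows. The `Call` adapter, the default cap, and the recovery message wording are hypothetical; the one-time escalation to 64k and the limit of three recovery attempts come from the cascade above. This sketch assumes the escalated attempt re-issues the same request, so the truncated first attempt is discarded. Intermediate truncations are withheld from the caller, which only sees the final outcome.

```typescript
type StopReason = "end_turn" | "tool_use" | "max_tokens";
type Attempt = { stopReason: StopReason; text: string };
// Hypothetical adapter: one model call with a token cap and an optional extra user message.
type Call = (maxTokens: number, recoveryMessage?: string) => Promise<Attempt>;

const ESCALATED_MAX_TOKENS = 64_000;
const MAX_RECOVERY_ATTEMPTS = 3;
const RECOVERY_MESSAGE = "Output token limit reached. Continue exactly where you stopped."; // hypothetical wording

async function callWithOutputRecovery(call: Call, defaultMaxTokens: number): Promise<{ ok: boolean; text: string }> {
  let maxTokensOverride: number | undefined;
  let recoveryCount = 0;
  let recoveryMessage: string | undefined;
  let text = "";
  for (;;) {
    const attempt = await call(maxTokensOverride ?? defaultMaxTokens, recoveryMessage);
    if (attempt.stopReason !== "max_tokens") return { ok: true, text: text + attempt.text };
    if (maxTokensOverride === undefined) {
      // Step 1: retry the same request once with a higher cap; the truncated attempt is discarded.
      maxTokensOverride = ESCALATED_MAX_TOKENS;
      continue;
    }
    text += attempt.text; // keep the partial output
    if (recoveryCount >= MAX_RECOVERY_ATTEMPTS) return { ok: false, text }; // Step 3: surface the error
    recoveryCount++;
    recoveryMessage = RECOVERY_MESSAGE; // Step 2: ask the model to continue
  }
}
```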

4. Tool Interface (30+ methods)

Minimum viable subset for a rebuild: 7 required methods.

Required

name - Tool identity
inputSchema - Zod validation
call() - Execution logic
checkPermissions() - Permission gate
validateInput() - Pre-permission validation
prompt() - System prompt description
mapToolResultToToolResultBlockParam() - Result serialization

Optional

backfillObservableInput() - Input normalization
toAutoClassifierInput() - Auto-mode classification
All render*() methods - UI only (stub for headless)

Critical

Without validateInput(), stale file edits cause data corruption. Without isConcurrencySafe(), the streaming executor deadlocks.
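The sketch below wires the required methods into one execution path for a hypothetical `edit_file` tool that works on an in-memory file map. Assumptions: `parseInput` stands in for the Zod `inputSchema` to stay dependency-free, the method signatures are simplified (synchronous where possible), and `runTool` is an illustrative driver. It shows why `validateInput()` matters: an edit to a file that changed since it was last read is rejected before any permission prompt appears.

```typescript
type ToolResultBlockParam = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type PermissionResult = { behavior: "allow" } | { behavior: "ask" | "deny"; message: string };
type ValidationResult = { ok: true } | { ok: false; message: string };

// Simplified subset of the required methods; signatures are assumptions.
interface MinimalTool<In> {
  name: string;
  parseInput(raw: unknown): In; // stand-in for a Zod inputSchema; throws on invalid input
  prompt(): string; // description injected into the system prompt
  validateInput(input: In): ValidationResult; // runs before any permission check
  checkPermissions(input: In): PermissionResult;
  call(input: In): Promise<string>;
  mapToolResultToToolResultBlockParam(output: string, toolUseId: string): ToolResultBlockParam;
}

// In-memory "disk" plus the version of each file the agent last read.
const disk = new Map<string, { text: string; version: number }>([["notes.txt", { text: "hello world", version: 1 }]]);
const lastReadVersion = new Map<string, number>();

type EditInput = { path: string; oldText: string; newText: string };

const editFileTool: MinimalTool<EditInput> = {
  name: "edit_file",
  parseInput(raw) {
    const r = (raw ?? {}) as Partial<EditInput>;
    if (typeof r.path !== "string" || typeof r.oldText !== "string" || typeof r.newText !== "string") {
      throw new Error("edit_file expects { path, oldText, newText } strings");
    }
    return { path: r.path, oldText: r.oldText, newText: r.newText };
  },
  prompt: () => "Replace oldText with newText in a file you have already read.",
  validateInput({ path }) {
    const file = disk.get(path);
    if (!file) return { ok: false, message: `${path} does not exist` };
    if (lastReadVersion.get(path) !== file.version) {
      return { ok: false, message: `${path} changed since it was last read; read it again first` };
    }
    return { ok: true };
  },
  checkPermissions: ({ path }) =>
    path.startsWith("/etc/")
      ? { behavior: "deny", message: "System files are off-limits" }
      : { behavior: "ask", message: `Allow editing ${path}?` },
  async call({ path, oldText, newText }) {
    const file = disk.get(path)!;
    const version = file.version + 1;
    // Function form avoids "$" replacement patterns in newText.
    disk.set(path, { text: file.text.replace(oldText, () => newText), version });
    lastReadVersion.set(path, version); // the agent has seen its own edit
    return `Edited ${path}`;
  },
  mapToolResultToToolResultBlockParam: (output, toolUseId) => ({ type: "tool_result", tool_use_id: toolUseId, content: output }),
};

// Illustrative driver: parse -> validate -> permissions -> call -> serialize.
async function runTool<In>(
  tool: MinimalTool<In>,
  raw: unknown,
  toolUseId: string,
  askUser: (question: string) => Promise<boolean>,
): Promise<ToolResultBlockParam> {
  const fail = (message: string): ToolResultBlockParam => ({ type: "tool_result", tool_use_id: toolUseId, content: message, is_error: true });
  let input: In;
  try {
    input = tool.parseInput(raw);
  } catch (err) {
    return fail(String(err));
  }
  const validation = tool.validateInput(input);
  if (validation.ok === false) return fail(validation.message);
  const permission = tool.checkPermissions(input);
  if (permission.behavior === "deny") return fail(permission.message);
  if (permission.behavior === "ask" && !(await askUser(permission.message))) return fail("User denied permission");
  return tool.mapToolResultToToolResultBlockParam(await tool.call(input), toolUseId);
}
```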

5. Four-Layer Permission Architecture

Defense in depth. Each layer can deny independently.

Layer 1: Static Deny Rules (tool assembly time)

Layer 2: Tool-Specific checkPermissions() (allow | ask | deny | passthrough)

Layer 3: Hook-Based Rules (if/then pattern matching)

Layer 4: Interactive Prompt / Auto-Classifier / Auto-Deny

Permission Modes

default - Ask before sensitive actions; auto-allow safe ones
plan - Read-only tools only
auto - AI classifier decides
acceptEdits - Auto-allow file edits
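The four layers and the permission modes can be combined into one resolver. Everything here is a hypothetical sketch: the request shape, the `Layers` callbacks, and the precedence rules (any deny is final, an allow with no "ask" vote skips the prompt, the classifier's verdict is final in auto mode, headless sessions auto-deny instead of prompting) are assumptions chosen to illustrate defense in depth, not Claude Code's exact semantics.

```typescript
type Mode = "default" | "plan" | "auto" | "acceptEdits";
type Decision = "allow" | "ask" | "deny" | "passthrough";
type ToolRequest = { tool: string; isReadOnly: boolean; isFileEdit: boolean };
type Outcome = "allow" | "deny" | "prompt_user";

type Layers = {
  toolCheck: (req: ToolRequest) => Decision; // layer 2: tool-specific checkPermissions()
  hooks: ((req: ToolRequest) => Decision)[]; // layer 3: hook-based rules
  classifier: (req: ToolRequest) => boolean; // layer 4 in auto mode: true means "safe"
  headless: boolean; // layer 4 without a user: auto-deny instead of prompting
};

// Layer 1: statically denied tools never enter the tool pool, so the model cannot request them.
function assembleTools<T extends { name: string }>(tools: T[], denyRules: string[]): T[] {
  return tools.filter((t) => !denyRules.includes(t.name));
}

function resolvePermission(req: ToolRequest, mode: Mode, layers: Layers): Outcome {
  if (mode === "plan" && !req.isReadOnly) return "deny"; // plan mode: read-only tools only

  // Layers 2 and 3 vote; any single deny is final.
  const votes = [layers.toolCheck(req), ...layers.hooks.map((hook) => hook(req))];
  if (votes.includes("deny")) return "deny";
  if (votes.includes("allow") && !votes.includes("ask")) return "allow";

  // Layer 4: settle whatever is still undecided according to the mode.
  if (mode === "acceptEdits" && req.isFileEdit) return "allow";
  if (mode === "auto") return layers.classifier(req) ? "allow" : "deny";
  if (req.isReadOnly && !votes.includes("ask")) return "allow"; // safe actions auto-allowed
  return layers.headless ? "deny" : "prompt_user";
}
```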

6. Five-Layer Compaction Pipeline

Each layer handles different scenarios. All needed for production.

Layer 1: Snip

Truncates old messages for SDK/long sessions. Replay removes zombie messages.

Layer 2: Micro

Truncates individual tool results. Collapses search/read patterns. Tracks cache_deleted_input_tokens.

Layer 3: Auto

Threshold: effectiveContextWindow - 13K tokens. Calls Haiku to summarize. Circuit breaker: max 3 failures.

Layer 4: Reactive

Triggered AFTER 413 error. Strips oversized media first. ONE-shot only (prevents spiral).

Layer 5: Context Collapse

Projects collapsed view over full history. Commits persist across turns. Experimental.

Critical: Compaction Self-PTL

Compaction itself can exceed context. Must handle with truncateHeadForPTLRetry() (max 3 retries). Strips images before compact.
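A sketch of Layer 3 with its safeguards, assuming hypothetical `countTokens` and `summarize` adapters (in Claude Code the summarizer is a Haiku call). The 13K buffer and the three-failure circuit breaker come from the description above. The rule for shrinking the compaction input after a self-PTL error (drop the oldest quarter, at most three times) is an assumption standing in for truncateHeadForPTLRetry(), whose exact behavior is not specified here.

```typescript
type Msg = { role: "user" | "assistant"; text: string };
type SummaryResult = { ok: true; summary: string } | { ok: false; reason: "prompt_too_long" | "other" };
type Tracking = { consecutiveFailures: number };

const AUTOCOMPACT_BUFFER_TOKENS = 13_000;
const MAX_CONSECUTIVE_FAILURES = 3;
const MAX_PTL_RETRIES = 3;

async function maybeAutoCompact(
  messages: Msg[],
  countTokens: (msgs: Msg[]) => number, // hypothetical token counter
  effectiveContextWindow: number,
  summarize: (msgs: Msg[]) => Promise<SummaryResult>, // hypothetical summarizer call
  tracking: Tracking,
): Promise<Msg[]> {
  // Circuit breaker: after repeated failures, stop paying for doomed calls every turn.
  if (tracking.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) return messages;
  if (countTokens(messages) < effectiveContextWindow - AUTOCOMPACT_BUFFER_TOKENS) return messages;

  let input = messages;
  for (let retry = 0; retry <= MAX_PTL_RETRIES; retry++) {
    const result = await summarize(input);
    if (result.ok === true) {
      tracking.consecutiveFailures = 0;
      return [{ role: "user", text: `Summary of the conversation so far:\n${result.summary}` }];
    }
    if (result.reason !== "prompt_too_long") break;
    // Compaction self-PTL: the summarization request itself was too long, so shrink it and retry.
    input = input.slice(Math.ceil(input.length / 4));
  }
  tracking.consecutiveFailures++;
  return messages; // keep the original history if compaction failed
}
```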

7. Streaming Tool Execution Model

Tools start executing while model is still streaming. Reduces latency 30-50%.

Execution Logic

concurrencySafe = true: start immediately (parallel execution)
concurrencySafe = false: queue for sequential execution
Abort handling: generate synthetic error results for pending tools
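A minimal executor sketch, assuming a hypothetical `execute` callback and a `concurrencySafe` flag on each call. Safe tools start as soon as they are added (after any earlier unsafe tool finishes), an unsafe tool waits for every tool added before it, and tools that have not started when `abort()` is called receive a synthetic error result. Already-running tools are not cancelled in this sketch; a real executor would also propagate the abort to them. Results come back in request order, so each tool_use gets its matching tool_result.

```typescript
type ToolCall = { id: string; name: string; concurrencySafe: boolean };
type ToolRunResult = { id: string; content: string; isError: boolean };

class StreamingToolExecutor {
  private readonly execute: (call: ToolCall) => Promise<string>;
  private barrier: Promise<unknown> = Promise.resolve(); // settles when the last unsafe tool finishes
  private sinceBarrier: Promise<unknown>[] = []; // safe tools started after that barrier
  private readonly all: Promise<ToolRunResult>[] = [];
  private aborted = false;

  constructor(execute: (call: ToolCall) => Promise<string>) {
    this.execute = execute;
  }

  // Called as soon as a tool_use block's input has finished streaming.
  addTool(call: ToolCall): void {
    const start = () => this.run(call);
    let done: Promise<ToolRunResult>;
    if (call.concurrencySafe) {
      done = this.barrier.then(start); // runs in parallel with other safe tools
      this.sinceBarrier.push(done);
    } else {
      done = Promise.all([this.barrier, ...this.sinceBarrier]).then(start); // waits for everything before it
      this.barrier = done;
      this.sinceBarrier = [];
    }
    this.all.push(done);
  }

  abort(): void {
    this.aborted = true;
  }

  results(): Promise<ToolRunResult[]> {
    return Promise.all(this.all);
  }

  // Never rejects, so one failing tool cannot break the chain for later tools.
  private async run(call: ToolCall): Promise<ToolRunResult> {
    if (this.aborted) return { id: call.id, content: "Aborted before execution", isError: true };
    try {
      return { id: call.id, content: await this.execute(call), isError: false };
    } catch (err) {
      return { id: call.id, content: String(err), isError: true };
    }
  }
}
```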

8. State Tracking (Per-Query)

Critical state variables that prevent infinite loops and track progress.

Essential

messages - Full conversation history
turnCount - Turn counter
autoCompactTracking - Compact state (turnId, consecutiveFailures)
hasAttemptedReactiveCompact - Prevents spiral (tried once)

Recovery

maxOutputTokensRecoveryCount - Recovery attempts (limit: 3)
maxOutputTokensOverride - Escalated token cap (64k)
taskBudgetRemaining - Budget across compacts

Advanced

pendingToolUseSummary - Haiku summary from previous turn
stopHookActive - Stop hook processing
transition - Why previous iteration continued

Critical

Without hasAttemptedReactiveCompact, a 413 → reactive compact → 413 → reactive compact loop becomes infinite.
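A sketch of the guard, assuming hypothetical `call` and `reactiveCompact` adapters and a per-query state object. The essential detail is that the flag is set before the retry, so a second 413 surfaces `prompt_too_long` instead of triggering another compaction.

```typescript
type Msg = { role: "user" | "assistant"; text: string };
type ApiResult = { status: 200; text: string } | { status: 413 };
type QueryState = { hasAttemptedReactiveCompact: boolean };

async function callWithReactiveCompact(
  messages: Msg[],
  call: (msgs: Msg[]) => Promise<ApiResult>, // hypothetical API adapter
  reactiveCompact: (msgs: Msg[]) => Promise<Msg[]>, // hypothetical compactor
  state: QueryState,
): Promise<{ ok: true; text: string } | { ok: false; error: "prompt_too_long" }> {
  let current = messages;
  for (;;) {
    const result = await call(current);
    if (result.status === 200) return { ok: true, text: result.text };
    // 413: compact at most once per query, then give up.
    if (state.hasAttemptedReactiveCompact) return { ok: false, error: "prompt_too_long" };
    state.hasAttemptedReactiveCompact = true; // set before retrying so another 413 cannot loop
    current = await reactiveCompact(current);
  }
}
```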

9. Production Roadmap

Tiered approach from MVP to full parity.

Tier 1: MVP (2-4 weeks)

Core loop with 6 exit conditions, 3 core tools, basic permissions, simple context, API client with retry, session persistence.

Tier 2: Production-Ready (4-8 weeks)

Full tool interface (7 methods), 4-layer permissions, auto-compact, streaming executor, error recovery, file state cache, tool assembly pipeline.

Tier 3: Feature-Complete (8-16 weeks)

Sub-agent system (sync/async), MCP integration, 5-layer compaction, session memory, streaming API, transcript recording.

Tier 4: Full Parity (16-24 weeks)

Coordinator mode, context collapse, reactive compact, all 40+ tools, 85+ slash commands, IDE bridge, agent memory.

10. Enterprise & Advanced UX

Additional components critical for production deployment and advanced user experience.

Cost Tracking

Cost trackers enforce budgets and intercept execution when token limits are reached.

Upstream Proxy

Network layer handles corporate proxies for enterprise deployments.

Advanced CLI UX

Includes custom keybindings and full Vim mode for developer productivity.

Remote Execution

Secure tool execution over SSH boundaries.

Voice Input

Experimental architecture for speech-to-text input.

Domain Playbook

How to adapt Claude Code's architecture to any domain.

Portability Matrix

What changes per domain vs what stays the same.

Component	Code Domain	Data Domain	API Domain	Creative Domain
Agent Loop	Direct Port	Direct Port	Direct Port	Direct Port
BashTool	Keep	→ DB Queries	→ API Calls	→ Image Gen
FileReadTool	Keep	CSV/JSON	→ API Fetch	→ Asset Load
FileEditTool	Keep	Transforms	→ API Mutate	→ Asset Edit
GrepTool	Keep	→ Data Search	→ API Search	→ Content Search
Permissions	Direct Port	Direct Port	Direct Port	Direct Port
Compaction	Direct Port	Direct Port	Direct Port	Direct Port
Sub-agents	Direct Port	Direct Port	Direct Port	Direct Port

Code Domain

Software development, DevOps, infrastructure

Bash, FileRead, FileEdit, Grep tools
Git integration
LSP support
Test execution

Data Domain

Analytics, data engineering, BI

SQL query tool (replace Bash)
CSV/JSON read/transform
Database search (replace Grep)
Visualization generation

API Domain

Integration, automation, workflows

HTTP request tool (replace Bash)
API schema discovery
Webhook management
Workflow orchestration

Creative Domain

Design, content, media production

Image generation tool (replace Bash)
Asset management
Content search
Style transfer

Domain Adaptation Decisions

Key decisions for adapting to your domain.

Tool Selection
Choose 3-5 core tools that represent the primary actions in your domain. Map each Claude Code tool to a domain equivalent.
Permission Model
Define what actions require approval. Read-only vs write operations. External API calls. Data export.
Context Strategy
Determine what constitutes "context" in your domain. For code: files. For data: schemas, samples. For API: endpoints, schemas.
Success Criteria
Define what "done" looks like. For code: tests pass. For data: query returns results. For API: request succeeds. For creative: asset generated.
Cost Tracking
Track token costs and enforce budgets per session across all domains.

Critical Design Decisions to Preserve

These patterns are domain-agnostic. Don't change them.

AsyncGenerator Pattern
The loop yields events lazily, callers consume them lazily, and a caller interrupts the loop with .return().
Fail-Closed Defaults
isConcurrencySafe() = false, isReadOnly() = false unless explicitly set.
Withholding Errors
413 and max_output_tokens errors are withheld while recovery is in progress. Surfacing them prematurely breaks SDK consumers.
Tool Result Budget
Prevents any single tool result from exceeding per-message limits. Must run before compaction.
Prompt Cache Stability
Alphabetical tool sorting with built-ins as contiguous prefix. Required for server-side cache.
Reactive Compact Spiral Prevention
hasAttemptedReactiveCompact flag. Without it, infinite 413 loop.
Compaction Self-PTL
Compaction can itself exceed context. Must handle with truncation retry.
Task Budget Across Compacts
taskBudgetRemaining carries the pre-compact context size across compactions. Without it, the server's countdown under-counts.
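Two of these decisions, fail-closed defaults and cache-stable tool ordering, are small enough to sketch together. The helper names are hypothetical, and the tie-break rule (a built-in wins over an MCP tool with the same name) is an assumption. Sorting by code point rather than `localeCompare` keeps the order independent of the host locale, and keeping built-ins as a fixed prefix means adding or removing an MCP tool only changes the tail of the tool list; Anthropic's prompt caching matches on prefixes, with tool definitions at the start of the cached prompt.

```typescript
// Hypothetical tool descriptor: only the fields needed for this sketch.
type ToolDef = { name: string; isConcurrencySafe?: boolean; isReadOnly?: boolean };

// Fail-closed defaults: a tool that does not say otherwise is treated as
// unsafe to run concurrently and as mutating state.
function withFailClosedDefaults(tool: ToolDef): Required<ToolDef> {
  return {
    name: tool.name,
    isConcurrencySafe: tool.isConcurrencySafe ?? false,
    isReadOnly: tool.isReadOnly ?? false,
  };
}

// Deterministic code-point ordering, independent of the host locale.
const byName = (a: ToolDef, b: ToolDef) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);

// Built-ins form a stable, sorted prefix; MCP/plugin tools follow, also sorted.
// A name collision is resolved in favor of the built-in (an assumption of this sketch).
function assembleToolPool(builtIns: ToolDef[], extraTools: ToolDef[]): Required<ToolDef>[] {
  const builtInNames = new Set(builtIns.map((t) => t.name));
  const sortedBuiltIns = [...builtIns].sort(byName);
  const sortedExtras = extraTools.filter((t) => !builtInNames.has(t.name)).sort(byName);
  return [...sortedBuiltIns, ...sortedExtras].map(withFailClosedDefaults);
}
```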

Decision Framework

When to use which pattern.

How long are your typical conversations?

< 10 turns - Simple context management. Track token count.
10-50 turns - Basic compaction. Summarize oldest messages.
50+ turns - Full context management: auto-compact, micro-compaction, selective injection.

How sensitive are the agent's actions?

Low (read-only) - Basic permissions: whitelist paths.
Medium (file edits) - Layered: static rules + user approval.
High (code execution) - Full system: deny + classifier + hooks + prompts + sandboxing.

How fast do responses need to feel?

Not critical - Simple request/response. Wait for full response.
Important - Stream responses. Show tokens as they arrive.
Critical - Full streaming execution. Start tools while model generates.

Do tasks span multiple sessions?

No - In-memory state only.
Sometimes - Basic persistence: save history, enable resume.
Always - Full persistence: transcripts, file cache, permission history.

Note

These aren't mutually exclusive. Start simple, add complexity based on real user needs. Claude Code evolved through months of production use. Your system will too.

