A complete reverse-engineered architectural blueprint of Claude Code. This guide provides the exact execution loops, compression pipelines, and security models required to rebuild and adapt production-grade agentic systems for any custom domain.
Created by Beijing Procen Medical AI Technology Co., Ltd.
An AI that pursues goals autonomously — perceiving, deciding, acting, and learning from results.
Works toward completing tasks, not just answering questions. Breaks complex goals into steps and executes them autonomously.
Interacts with external systems: executes code, reads/writes files, searches the web, calls APIs, and more.
Maintains conversation history, remembers past interactions, and adapts behavior based on accumulated context.
Works in loops: observes → thinks → acts → observes results → decides next. Continues until the goal is achieved.
This loop repeats until the task is complete or the agent decides to stop.
Claude Code demonstrates these principles in production: multi-file codebases, long conversations, tool errors, and user collaboration. The patterns apply to any agentic system.
The six building blocks every agentic system needs.
A continuous loop: send messages to AI → receive response → execute tools → feed results back → repeat. Without a loop, you have a chatbot.
Claude Code's loop runs in query.ts with exit conditions: no tool use, max turns reached, budget exceeded, or user abort. Every agentic system needs a similar loop with clear termination conditions.
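As a rough illustration, the sketch below implements a loop with the four exit conditions named above. It is not the code in query.ts: `runAgentLoop`, `LoopOptions`, the synchronous `callModel` stand-in, and the character-count budget are assumptions made for this example.

```typescript
// Hypothetical types for illustration; not Claude Code's actual definitions.
type ToolCall = { tool: string; input: string };
type ModelReply = { text: string; toolCalls: ToolCall[] };
type ExitReason = "no_tool_use" | "max_turns" | "budget_exceeded" | "user_abort";

interface LoopOptions {
  callModel: (messages: string[]) => ModelReply; // stand-in for a real API client
  runTool: (call: ToolCall) => string;
  maxTurns: number;
  tokenBudget: number;
  isAborted: () => boolean;
}

function runAgentLoop(prompt: string, opts: LoopOptions): { reason: ExitReason; messages: string[] } {
  const messages: string[] = [prompt];
  let spent = 0;
  for (let turn = 0; ; turn++) {
    // Termination checks run before every model call.
    if (opts.isAborted()) return { reason: "user_abort", messages };
    if (turn >= opts.maxTurns) return { reason: "max_turns", messages };
    if (spent >= opts.tokenBudget) return { reason: "budget_exceeded", messages };

    const reply = opts.callModel(messages);
    messages.push(reply.text);
    spent += reply.text.length; // crude proxy for token usage (assumption)

    // No tool use means the model considers the task finished.
    if (reply.toolCalls.length === 0) return { reason: "no_tool_use", messages };
    for (const call of reply.toolCalls) {
      messages.push(opts.runTool(call)); // feed results back into the next turn
    }
  }
}
```

Returning the exit reason alongside the messages lets the caller distinguish a finished task from a loop that was cut off.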
Defined actions the agent can take: read files, execute commands, search the web. Each tool has a name, description (for the AI), input schema, and execution logic.
Tools are how the agent interacts with the world. Without tools, it can only generate text. Well-described tools lead to better agent decisions.
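The four parts of a tool can be sketched as a small interface. The names `ToolSpec`, `parseInput`, `invoke`, and the example `word_count` tool are hypothetical, and the hand-written validator stands in for a schema library.

```typescript
// A hypothetical tool shape: name, description, input schema, execution logic.
interface ToolSpec<I> {
  toolName: string;
  description: string;                    // read by the model when choosing a tool
  parseInput: (raw: unknown) => I | null; // stand-in for a schema validator
  execute: (input: I) => string;
}

const wordCountTool: ToolSpec<{ text: string }> = {
  toolName: "word_count",
  description: "Counts whitespace-separated words in the given text. Use for length checks.",
  parseInput: (raw) => {
    if (typeof raw !== "object" || raw === null) return null;
    const text = (raw as { text?: unknown }).text;
    return typeof text === "string" ? { text } : null;
  },
  execute: ({ text }) => String(text.split(/\s+/).filter((w) => w.length > 0).length),
};

// Validate first, execute second; invalid input becomes a tool error the
// model can read and correct on its next turn.
function invoke<I>(tool: ToolSpec<I>, raw: unknown): string {
  const input = tool.parseInput(raw);
  if (input === null) return `error: invalid input for ${tool.toolName}`;
  return tool.execute(input);
}
```

Returning validation failures as text, rather than throwing, keeps the loop running and gives the model a chance to retry with corrected input.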
Managing what the agent knows: conversation history, file contents, tool results. AI models have finite context windows.
Claude Code uses auto-compaction (summarizing old messages), micro-compaction (removing redundant results), and selective attachment injection to stay within limits.
Controls what the agent can and cannot do. Prevents dangerous actions, requires user approval for sensitive operations.
Autonomous agents can cause real damage. Without permissions, agents are unsafe for production. Multiple layers ensure defense in depth.
Starting tool execution while the AI is still generating its response. Tools begin as soon as their input is complete.
Reduces latency by 30-50%. Users see progress immediately. Critical for long-running tasks.
Saving conversation state so it can be resumed later. Includes message history, file states, and execution context.
Claude Code records transcripts to JSONL files and replays them. Persistence enables recovery; strategic discard enables scalability.
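A minimal sketch of JSONL recording and replay, kept in memory so it stays self-contained. The `TranscriptEntry` shape and both function names are assumptions; Claude Code's actual transcript schema is not shown in this guide.

```typescript
// Hypothetical entry shape for illustration.
type TranscriptEntry = { role: "user" | "assistant" | "tool"; content: string };

// One JSON object per line, so appending a turn never rewrites earlier lines.
function toJsonl(entries: TranscriptEntry[]): string {
  return entries.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

// Replay: skip blank lines and any line that fails to parse, such as a
// partially written final line after a crash.
function fromJsonl(text: string): TranscriptEntry[] {
  const entries: TranscriptEntry[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line) as TranscriptEntry);
    } catch {
      // Tolerate a torn line rather than losing the whole session.
    }
  }
  return entries;
}
```

In a real system the string would be appended to a file on disk; the line-per-entry format is what makes a truncated write recoverable.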
Proven patterns from Claude Code that apply to any agentic system.
Separate concerns into distinct layers. Each layer has a specific responsibility and communicates through well-defined interfaces.
Each layer evolves independently. Swap the UI without changing the runtime. Add tools without touching the API layer.
Process data as it arrives. The AI streams tokens, tools start as soon as input is complete.
Reduces perceived latency by 30-50%. Users see progress immediately.
AI models have finite context windows. Use compaction strategies to preserve important context.
Use a smaller model to summarize old conversation turns.
Inject only relevant memories, skills, or file contents based on the current task.
Remove redundant tool results, collapse similar search outputs.
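One possible form of the last strategy, removing redundant tool results, is sketched below. The rule used here (keep only the newest result for each identical tool call) and the names `ToolResult` and `microCompact` are assumptions, not Claude Code's actual heuristic.

```typescript
type ToolResult = { tool: string; input: string; output: string };

const SUPERSEDED = "[superseded by a later identical call]";

// Keep the newest result for each (tool, input) pair and replace earlier
// duplicates with a short placeholder, preserving message order.
function microCompact(results: ToolResult[]): ToolResult[] {
  const seen: Record<string, boolean> = {};
  const compacted: ToolResult[] = [];
  const newestFirst = results.slice().reverse();
  for (const r of newestFirst) {
    const key = r.tool + "\u0000" + r.input;
    compacted.unshift(seen[key] ? { tool: r.tool, input: r.input, output: SUPERSEDED } : r);
    seen[key] = true;
  }
  return compacted;
}
```

Replacing rather than deleting keeps the number of entries stable, so each earlier call still has a result paired with it.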
Defense in depth for safety. Multiple layers ensure that even if one fails, others catch dangerous actions.
Never rely on a single permission layer: if one rule is misconfigured or bypassed, another check should still stand in the way.
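A small sketch of layered checks in which any layer can veto an action. The three layers, the `/etc/` deny rule, and the resolution order (deny beats allow, allow beats ask) are illustrative assumptions.

```typescript
type Decision = "allow" | "deny" | "ask";
type Action = { tool: string; target: string; readOnly: boolean };
type Layer = (action: Action) => Decision | "pass";

// Hypothetical layers; each one returns "pass" when it has no opinion.
const denyList: Layer = (a) => (a.target.indexOf("/etc/") === 0 ? "deny" : "pass");
const readOnlyAutoAllow: Layer = (a) => (a.readOnly ? "allow" : "pass");
const askUser: Layer = () => "ask"; // fallback for anything uncertain

function decide(action: Action, layers: Layer[]): Decision {
  const verdicts = layers.map((layer) => layer(action));
  if (verdicts.indexOf("deny") !== -1) return "deny";   // any single layer can veto
  if (verdicts.indexOf("allow") !== -1) return "allow";
  if (verdicts.indexOf("ask") !== -1) return "ask";
  return "deny"; // no layer vouched for the action: fail closed
}
```

Because a deny from any layer wins, adding a layer can only make the system stricter, never more permissive.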
Agents will encounter errors. Build recovery paths, not just error messages.
API rate limits? Retry with exponential backoff.
Primary model overloaded? Switch to fallback automatically.
Prompt too long? Compact context and retry.
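The three recovery paths above can be combined in one wrapper. This is a sketch under stated assumptions: the `CallOutcome` type, the model labels `"primary"` and `"fallback"`, the 500 ms base delay, and the injected `wait` function are all invented for the example.

```typescript
type CallOutcome =
  | { kind: "ok"; text: string }
  | { kind: "error"; error: "rate_limited" | "overloaded" | "prompt_too_long" };

interface RecoveryDeps {
  call: (model: string, messages: string[]) => CallOutcome; // stand-in API client
  compact: (messages: string[]) => string[];
  wait: (ms: number) => void; // injected so tests need no real delay
}

function callWithRecovery(messages: string[], deps: RecoveryDeps, maxAttempts = 5): string {
  let model = "primary";
  let current = messages;
  let retries = 0;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const outcome = deps.call(model, current);
    if (outcome.kind === "ok") return outcome.text;
    if (outcome.error === "rate_limited") {
      deps.wait(500 * Math.pow(2, retries)); // 500 ms, 1 s, 2 s, ...
      retries++;
    } else if (outcome.error === "overloaded") {
      model = "fallback"; // switch models and try again
    } else {
      current = deps.compact(current); // shrink the prompt and try again
    }
  }
  throw new Error("recovery attempts exhausted");
}
```

The attempt cap matters as much as the recovery paths: without it, a persistent failure turns recovery into an infinite loop.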
Spawn sub-agents for parallel work. Each runs its own loop with a filtered tool set.
Sub-agents reuse the core loop instead of duplicating it, and the restricted tool set limits what each one can do.
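Tool filtering for a sub-agent might look like the sketch below. `Toolbox`, `filterTools`, and `runSubAgent` are hypothetical names, and the sub-agent here follows a fixed plan sequentially; a real one would run its own model-driven loop, possibly in parallel with others.

```typescript
type ToolFn = (input: string) => string;
type Toolbox = Record<string, ToolFn>;

function has(box: Toolbox, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(box, key);
}

// Return a copy of the toolbox restricted to the allowed tool names.
function filterTools(all: Toolbox, allowed: string[]): Toolbox {
  const subset: Toolbox = {};
  for (const toolName of allowed) {
    if (has(all, toolName)) subset[toolName] = all[toolName];
  }
  return subset;
}

// Hypothetical sub-agent: executes a fixed plan against its own toolbox.
function runSubAgent(plan: { tool: string; input: string }[], tools: Toolbox): string[] {
  return plan.map((step) =>
    has(tools, step.tool)
      ? tools[step.tool](step.input)
      : `denied: ${step.tool} is not available to this sub-agent`
  );
}
```

Filtering at construction time means a sub-agent cannot reach a tool it was never given, regardless of what its model asks for.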
See how data flows through Claude Code's architecture.
The why behind the architecture decisions.
Don't wait for perfect information. Start executing as soon as you have enough context.
Rate limited? Retry. Model overloaded? Switch. Context too long? Compact. Build resilience.
Streaming makes 10-second responses feel like 2 seconds. Parallel execution makes sequential tasks feel instant.
Tool system provides mechanisms. Permission system provides policy. Keep them separate.
Show what the agent is thinking, executing, and why. Users trust systems they can understand.
Handle edge cases: prompt too long, max output tokens, model fallback, permission edge cases.
Sub-agents, plugins, MCP servers. Build composable systems, not monoliths.
Record all conversations. But also compact old context. Persistence enables recovery; selective discard enables scalability.
Step-by-step implementation guide for any product team.
Before writing code: What tasks will this agent perform? What tools? What should it never do? Define yours first.
Start minimal: send message → get response → execute tools → feed results back → repeat. Claude Code's core loop is ~100 lines.
Start with 2-3 essential tools. Define: name, description, input schema, execution logic. Test in isolation.
Allowlist the commands and file paths the agent can access, and fall back to asking the user for anything uncertain.
Track context window usage. At 80% capacity, trigger compaction. Start simple: summarize oldest messages.
Stream AI responses token-by-token. Start tool execution before the full response is complete. This is the biggest UX win.
For each error type, define a recovery strategy. Don't just show errors; recover from them.
Record conversations to disk. Save file states. Enable resume. Users should pick up where they left off.
The AI chooses tools based on descriptions. Make them clear and specific. Bad descriptions = bad decisions.
Test with: very long conversations, large operations, network failures, API timeouts, permission denials.
Steps 1-4: functional agent. Steps 5-7: production-ready. Steps 8-10: robust. Ship early, iterate.
Implementation contracts and detailed requirements from production audit.
The agent loop must handle these exit conditions. MVP requires 1, 3, 4, 6, 8, 10. Production requires all 13.
Token count at blocking limit
No tool_use blocks in response
User abort or signal abort
Turn count exceeded
Switch model, continue
413 error, compact and retry
Output limit hit, recover up to 3x
Messages go through 6 transformations before each API call. Order matters.
Tool result budget MUST run before compaction. Microcompact MUST run before auto-compact. In the wrong order, the compaction request itself can exceed the prompt limit (a self-inflicted prompt-too-long, or PTL, error).
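The ordering constraint can be made explicit by running the stages from a fixed list. Only the three stages named above are sketched; the stage bodies, `RESULT_CAP`, and `COMPACT_AT` are simplified stand-ins with invented limits, not the real transformations.

```typescript
type Msg = { role: "user" | "assistant" | "tool"; content: string };
type Stage = (messages: Msg[]) => Msg[];

const RESULT_CAP = 20; // hypothetical per-result character cap
const COMPACT_AT = 60; // hypothetical total-size threshold

// 1. Tool result budget: cap each tool result before anything else runs.
const toolResultBudget: Stage = (ms) =>
  ms.map((m) =>
    m.role === "tool" && m.content.length > RESULT_CAP
      ? { role: m.role, content: m.content.slice(0, RESULT_CAP) + "...[truncated]" }
      : m
  );

// 2. Microcompact: drop a tool result identical to the one right before it.
const microcompact: Stage = (ms) =>
  ms.filter((m, i) =>
    !(i > 0 && m.role === "tool" && ms[i - 1].role === "tool" && ms[i - 1].content === m.content)
  );

// 3. Auto-compact: if still too large, replace all but the last two messages.
const autoCompact: Stage = (ms) => {
  const total = ms.reduce((n, m) => n + m.content.length, 0);
  if (total <= COMPACT_AT || ms.length <= 2) return ms;
  const summary: Msg = { role: "user", content: `[summary of ${ms.length - 2} earlier messages]` };
  return [summary].concat(ms.slice(-2));
};

// The array order is the contract: cheap, lossless-ish steps run first.
const PIPELINE: Stage[] = [toolResultBudget, microcompact, autoCompact];

function prepareMessages(messages: Msg[]): Msg[] {
  return PIPELINE.reduce((acc, stage) => stage(acc), messages);
}
```

Running the cheap stages first often brings the history under the threshold, so the expensive summarization step is skipped entirely.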
Each error type has a specific recovery path.
Context collapse drain → retry
Reactive compact (strip media) → retry once
Surface error → return prompt_too_long
Escalate to 64k (once)
Inject recovery message → retry (up to 3x)
Surface error
Exponential backoff (up to 3x)
Switch to fallback model
Minimum viable subset for rebuild (7 required methods).
name - Tool identity
inputSchema - Zod validation
call() - Execution logic
checkPermissions() - Permission gate
validateInput() - Pre-permission validation
prompt() - System prompt description
mapToolResultToToolResultBlockParam() - Result serialization
isEnabled() - Feature gating
isReadOnly() - Safety classification
isConcurrencySafe() - Streaming routing
isDestructive() - Safety warnings
preparePermissionMatcher() - Hook matching
backfillObservableInput() - Input normalization
toAutoClassifierInput() - Auto-mode classification
All render*() methods - UI only (stub for headless)
Without validateInput(), stale file edits cause data corruption. Without isConcurrencySafe(), streaming executor deadlocks.
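To illustrate the stale-edit problem, the sketch below shows one way a validateInput()-style check could reject an edit to a file that changed after the agent last read it. The in-memory `disk` map, the `readCache` of last-seen modification times, and all type names are assumptions for the example.

```typescript
type FileState = { content: string; mtimeMs: number };
type EditInput = { path: string; oldText: string; newText: string };
type Validation = { status: "ok" } | { status: "rejected"; reason: string };

const reject = (reason: string): Validation => ({ status: "rejected", reason });

// disk: stand-in for the file system.
// readCache: modification time the agent saw when it last read each path.
function validateEdit(
  input: EditInput,
  disk: Record<string, FileState>,
  readCache: Record<string, number>
): Validation {
  const file = disk[input.path];
  if (!file) return reject("file does not exist");
  const seenAt = readCache[input.path];
  if (seenAt === undefined) return reject("file must be read before editing");
  if (file.mtimeMs > seenAt) return reject("file changed since last read; re-read first");
  if (file.content.indexOf(input.oldText) === -1) return reject("oldText not found in file");
  return { status: "ok" };
}
```

Running this check before the permission prompt avoids asking the user to approve an edit that would be rejected anyway.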
Defense in depth. Each layer can deny independently.
Ask for sensitive, auto-allow safe
Read-only tools only
AI classifier decides
Auto-allow file edits
Each layer handles different scenarios. All needed for production.
Truncates old messages for SDK/long sessions. Replay removes zombie messages.
Truncates individual tool results. Collapses search/read patterns. Tracks cache_deleted_input_tokens.
Threshold: effectiveContextWindow - 13K tokens. Calls Haiku to summarize. Circuit breaker: max 3 failures.
Triggered AFTER 413 error. Strips oversized media first. ONE-shot only (prevents spiral).
Projects collapsed view over full history. Commits persist across turns. Experimental.
Compaction itself can exceed context. Must handle with truncateHeadForPTLRetry() (max 3 retries). Strips images before compact.
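The auto-compact trigger and its circuit breaker can be sketched as follows. The 13K headroom and the limit of 3 failures come from the text above; the function names, the `>=` comparison, the keep-last-two policy, and the injected `summarize` callback (standing in for the call to a smaller model) are assumptions.

```typescript
const COMPACT_HEADROOM_TOKENS = 13000; // threshold: effectiveContextWindow - 13K
const MAX_CONSECUTIVE_FAILURES = 3;    // circuit breaker limit

interface CompactTracker { consecutiveFailures: number }

function shouldAutoCompact(tokenCount: number, effectiveContextWindow: number, tracker: CompactTracker): boolean {
  if (tracker.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) return false; // breaker open
  return tokenCount >= effectiveContextWindow - COMPACT_HEADROOM_TOKENS;
}

function autoCompact(
  messages: string[],
  tokenCount: number,
  effectiveContextWindow: number,
  tracker: CompactTracker,
  summarize: (older: string[]) => string // stand-in for the summarizer model call
): string[] {
  if (!shouldAutoCompact(tokenCount, effectiveContextWindow, tracker)) return messages;
  try {
    const summary = summarize(messages.slice(0, -2));
    tracker.consecutiveFailures = 0; // success closes the breaker
    return [summary].concat(messages.slice(-2));
  } catch {
    tracker.consecutiveFailures++;
    return messages; // leave history untouched on failure
  }
}
```

The breaker stops a failing summarizer from being called on every turn once the conversation is near the limit.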
Tools start executing while model is still streaming. Reduces latency 30-50%.
Start immediately (parallel execution)
Queue for sequential execution
Generate synthetic error results for pending tools
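The three behaviors above can be sketched as a routing class. This shows only the routing decisions, not real execution; `StreamingExecutor`, `PendingCall`, and the `abort()` trigger for synthetic errors are assumptions.

```typescript
type PendingCall = { id: string; concurrencySafe: boolean };
type ToolOutcome = { id: string; status: "started" | "queued" | "error"; detail: string };

class StreamingExecutor {
  private queue: PendingCall[] = [];
  private started: string[] = [];

  // Called each time a tool call's input has finished streaming.
  onToolInputComplete(call: PendingCall): ToolOutcome {
    if (call.concurrencySafe) {
      this.started.push(call.id);
      return { id: call.id, status: "started", detail: "running in parallel" };
    }
    this.queue.push(call);
    return { id: call.id, status: "queued", detail: "waiting for sequential execution" };
  }

  // When the turn is abandoned, every still-pending call gets a synthetic
  // error result so no tool call is left without a matching result.
  abort(): ToolOutcome[] {
    const pending = this.queue;
    this.queue = [];
    return pending.map((call): ToolOutcome => ({
      id: call.id,
      status: "error",
      detail: "aborted before execution",
    }));
  }

  startedIds(): string[] {
    return this.started.slice();
  }
}
```

Routing on a per-tool flag keeps the executor generic: it never needs to know what a tool does, only whether it may overlap with others.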
Critical state variables that prevent infinite loops and track progress.
messages - Full conversation history
turnCount - Turn counter
autoCompactTracking - Compact state (turnId, consecutiveFailures)
hasAttemptedReactiveCompact - Prevents spiral (tried once)
maxOutputTokensRecoveryCount - Recovery attempts (limit: 3)
maxOutputTokensOverride - Escalated token cap (64k)
taskBudgetRemaining - Budget across compacts
pendingToolUseSummary - Haiku summary from previous turn
stopHookActive - Stop hook processing
transition - Why previous iteration continued
Without hasAttemptedReactiveCompact, a 413 → reactive compact → 413 → reactive compact loop becomes infinite.
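A minimal sketch of that guard. The `hasAttemptedReactiveCompact` flag is named in the text; `LoopState`, `onPromptTooLong`, and the `simulate` harness are hypothetical.

```typescript
interface LoopState { hasAttemptedReactiveCompact: boolean }
type NextStep = "reactive_compact_and_retry" | "surface_prompt_too_long";

// One-shot guard: the first 413 triggers a reactive compact, the second surfaces.
function onPromptTooLong(state: LoopState): NextStep {
  if (state.hasAttemptedReactiveCompact) return "surface_prompt_too_long";
  state.hasAttemptedReactiveCompact = true;
  return "reactive_compact_and_retry";
}

// Simulated loop: counts API calls until the request succeeds or the error surfaces.
function simulate(apiReturns413: () => boolean): { calls: number; outcome: "ok" | "prompt_too_long" } {
  const state: LoopState = { hasAttemptedReactiveCompact: false };
  let calls = 0;
  while (true) {
    calls++;
    if (!apiReturns413()) return { calls, outcome: "ok" };
    if (onPromptTooLong(state) === "surface_prompt_too_long") {
      return { calls, outcome: "prompt_too_long" };
    }
  }
}
```

Even against an API that rejects every request, `simulate` stops after two calls instead of looping forever.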
Tiered approach from MVP to full parity.
Core loop with 6 exit conditions, 3 core tools, basic permissions, simple context, API client with retry, session persistence.
Full tool interface (7 methods), 4-layer permissions, auto-compact, streaming executor, error recovery, file state cache, tool assembly pipeline.
Sub-agent system (sync/async), MCP integration, 5-layer compaction, session memory, streaming API, transcript recording.
Coordinator mode, context collapse, reactive compact, all 40+ tools, 85+ slash commands, IDE bridge, agent memory.
Additional components critical for production deployment and advanced user experience.
Cost trackers enforce budgets and intercept execution when token limits are reached.
Network layer handles corporate proxies for enterprise deployments.
Includes custom keybindings and full Vim mode for developer productivity.
Secure tool execution over SSH boundaries.
Experimental architecture for speech-to-text input.
How to adapt Claude Code's architecture to any domain.
What changes per domain vs what stays the same.
| Component | Code Domain | Data Domain | API Domain | Creative Domain |
|---|---|---|---|---|
| Agent Loop | Direct Port | Direct Port | Direct Port | Direct Port |
| BashTool | Keep | → DB Queries | → API Calls | → Image Gen |
| FileReadTool | Keep | → CSV/JSON | → API Fetch | → Asset Load |
| FileEditTool | Keep | → Transforms | → API Mutate | → Asset Edit |
| GrepTool | Keep | → Data Search | → API Search | → Content Search |
| Permissions | Direct Port | Direct Port | Direct Port | Direct Port |
| Compaction | Direct Port | Direct Port | Direct Port | Direct Port |
| Sub-agents | Direct Port | Direct Port | Direct Port | Direct Port |
Software development, DevOps, infrastructure
Analytics, data engineering, BI
Integration, automation, workflows
Design, content, media production
Key decisions for adapting to your domain.
Choose 3-5 core tools that represent the primary actions in your domain. Map each Claude Code tool to a domain equivalent.
Define what actions require approval. Read-only vs write operations. External API calls. Data export.
Determine what constitutes "context" in your domain. For code: files. For data: schemas, samples. For API: endpoints, schemas.
Define what "done" looks like. For code: tests pass. For data: query returns results. For API: request succeeds. For creative: asset generated.
Track token costs and enforce budgets per session across all domains.
These patterns are domain-agnostic. Don't change them.
The loop yields events lazily. Callers consume lazily. Interruption via .return().
isConcurrencySafe() = false, isReadOnly() = false unless explicitly set.
413 and max_output_tokens are withheld during recovery. Premature error surfacing kills SDK consumers.
Prevents any single tool result from exceeding per-message limits. Must run before compaction.
Alphabetical tool sorting with built-ins as contiguous prefix. Required for server-side cache.
hasAttemptedReactiveCompact flag. Without it, infinite 413 loop.
Compaction can itself exceed context. Must handle with truncation retry.
taskBudgetRemaining carries the pre-compact context size across compactions. Without it, the server's countdown under-counts.
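Two of the invariants above, fail-closed defaults and deterministic tool ordering, are simple enough to sketch together. `ToolDef`, `ToolInit`, `buildTool`, and `orderTools` are hypothetical names, and plain code-unit comparison is an assumed choice of sort order.

```typescript
interface ToolDef {
  toolName: string;
  builtIn: boolean;
  isConcurrencySafe: () => boolean;
  isReadOnly: () => boolean;
}
type ToolInit = {
  toolName: string;
  builtIn: boolean;
  isConcurrencySafe?: () => boolean;
  isReadOnly?: () => boolean;
};

const no = () => false;

// Fail closed: a tool that does not say otherwise is treated as
// unsafe to run in parallel and as capable of writing.
function buildTool(init: ToolInit): ToolDef {
  return {
    toolName: init.toolName,
    builtIn: init.builtIn,
    isConcurrencySafe: init.isConcurrencySafe || no,
    isReadOnly: init.isReadOnly || no,
  };
}

// Built-ins first, then everything else, each group sorted by name.
// Plain string comparison avoids locale-dependent ordering.
function orderTools(tools: ToolDef[]): ToolDef[] {
  return tools.slice().sort((a, b) => {
    if (a.builtIn !== b.builtIn) return a.builtIn ? -1 : 1;
    return a.toolName < b.toolName ? -1 : a.toolName > b.toolName ? 1 : 0;
  });
}
```

With this ordering, adding or removing a non-built-in tool leaves the built-in prefix of the list unchanged, which is what a prefix-based cache needs.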
When to use which pattern.
Simple context management. Track token count.
Basic compaction. Summarize oldest messages.
Full context management: auto-compact, micro-compaction, selective injection.
Basic permissions: whitelist paths.
Layered: static rules + user approval.
Full system: deny + classifier + hooks + prompts + sandboxing.
Simple request/response. Wait for full response.
Stream responses. Show tokens as they arrive.
Full streaming execution. Start tools while model generates.
In-memory state only.
Basic persistence: save history, enable resume.
Full persistence: transcripts, file cache, permission history.
These aren't mutually exclusive. Start simple, add complexity based on real user needs. Claude Code evolved through months of production use. Your system will too.
Detailed reference material and practical checklists.