Building Agentic Systems

A complete reverse-engineered architectural blueprint of Claude Code. This guide provides the exact execution loops, compression pipelines, and security models required to rebuild and adapt production-grade agentic systems for any custom domain.

Created by Beijing Procen Medical AI Technology Co., Ltd.

What Is an Agentic System?

An AI that pursues goals autonomously — perceiving, deciding, acting, and learning from results.

Goal-Oriented

Works toward completing tasks, not just answering questions. Breaks complex goals into steps and executes them autonomously.

Tool-Using

Interacts with external systems: executes code, reads/writes files, searches the web, calls APIs, and more.

Context-Aware

Maintains conversation history, remembers past interactions, and adapts behavior based on accumulated context.

Iterative

Works in loops: observes → thinks → acts → observes results → decides next. Continues until the goal is achieved.

The Agent Loop — Heart of Every Agentic System

Perceive → Think → Act → Observe → Decide Next

This loop repeats until the task is complete or the agent decides to stop.

Why Claude Code?

Claude Code demonstrates these principles in production: multi-file codebases, long conversations, tool errors, and user collaboration. The patterns apply to any agentic system.

Core Concepts

The six building blocks every agentic system needs.

1. The Query Loop (Agent Loop)

A continuous loop: send messages to AI → receive response → execute tools → feed results back → repeat. Without a loop, you have a chatbot.

Why it matters

Claude Code's loop runs in query.ts with exit conditions: no tool use, max turns reached, budget exceeded, or user abort. Every agentic system needs a similar loop with clear termination conditions.
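A minimal sketch of such a loop in TypeScript, written as an async generator so callers can consume each message as it arrives. The `Model` function type, the simplified message shapes, and the `runTool` callback are assumptions for illustration, not Claude Code's actual types, and only three of the exit conditions (no tool use, max turns, abort) are modeled.

```typescript
// Simplified, hypothetical message shapes (not Claude Code's real types).
type ToolUse = { type: "tool_use"; id: string; name: string; input: unknown };
type TextBlock = { type: "text"; text: string };
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };
type AssistantMessage = { role: "assistant"; content: (TextBlock | ToolUse)[] };
type UserMessage = { role: "user"; content: string | ToolResult[] };
type Message = AssistantMessage | UserMessage;

// The model is abstracted as "history in, next assistant message out".
type Model = (messages: Message[]) => Promise<AssistantMessage>;
type ExitReason = "completed" | "max_turns" | "aborted";

async function* queryLoop(
  messages: Message[],
  model: Model,
  runTool: (use: ToolUse) => Promise<string>,
  opts: { maxTurns: number; signal?: AbortSignal },
): AsyncGenerator<Message, ExitReason> {
  for (let turn = 0; turn < opts.maxTurns; turn++) {
    if (opts.signal?.aborted) return "aborted";

    const reply = await model(messages);
    messages.push(reply);
    yield reply; // surface the assistant message immediately

    // Exit condition: no tool_use blocks means the model considers the task done.
    const toolUses = reply.content.filter((b): b is ToolUse => b.type === "tool_use");
    if (toolUses.length === 0) return "completed";

    // Execute each requested tool and feed the results back as the next user message.
    const results: ToolResult[] = [];
    for (const use of toolUses) {
      results.push({ type: "tool_result", tool_use_id: use.id, content: await runTool(use) });
    }
    const resultMessage: UserMessage = { role: "user", content: results };
    messages.push(resultMessage);
    yield resultMessage;
  }
  return "max_turns";
}
```

The generator's return value carries the exit reason, so the caller can distinguish a finished task from a turn-limit stop without inspecting the messages.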

2. Tool System

Defined actions the agent can take: read files, execute commands, search the web. Each tool has a name, description (for the AI), input schema, and execution logic.

Why it matters

Tools are how the agent interacts with the world. Without tools, it can only generate text. Well-described tools lead to better agent decisions.
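A minimal sketch of a tool definition and a dispatcher, assuming the model emits `tool_use` blocks with an `id`, `name`, and `input`. The `word_count` tool and the hand-written `parse` validator (standing in for a schema library) are hypothetical. One design choice worth noting: validation and execution failures are returned to the model as error results rather than thrown, so the agent can correct itself on the next turn.

```typescript
// Minimal tool shape: the model only ever sees `name`, `description`, and `inputSchema`.
interface Tool<In> {
  name: string;
  description: string;                  // written for the model: what it does and when to use it
  inputSchema: Record<string, unknown>; // JSON Schema sent with the API request
  parse: (input: unknown) => In;        // throws if the model sent malformed input
  call: (input: In) => Promise<string>;
}

type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };

// Hypothetical example tool.
const wordCount: Tool<{ text: string }> = {
  name: "word_count",
  description: "Count whitespace-separated words in `text`. Use it to check length limits.",
  inputSchema: { type: "object", properties: { text: { type: "string" } }, required: ["text"] },
  parse: (input) => {
    const text = (input as { text?: unknown } | null)?.text;
    if (typeof text !== "string") throw new Error("`text` must be a string");
    return { text };
  },
  call: async ({ text }) => String(text.split(/\s+/).filter(Boolean).length),
};

// Look up, validate, execute. Failures become error results the model can react to.
async function runToolUse(
  tools: Map<string, Tool<any>>,
  use: { id: string; name: string; input: unknown },
): Promise<ToolResultBlock> {
  const tool = tools.get(use.name);
  if (!tool) return { type: "tool_result", tool_use_id: use.id, content: `Unknown tool: ${use.name}`, is_error: true };
  try {
    return { type: "tool_result", tool_use_id: use.id, content: await tool.call(tool.parse(use.input)) };
  } catch (err) {
    return { type: "tool_result", tool_use_id: use.id, content: String(err), is_error: true };
  }
}
```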

3. Context Management

Managing what the agent knows: conversation history, file contents, tool results. AI models have finite context windows.

Why it matters

Claude Code uses auto-compaction (summarizing old messages), micro-compaction (removing redundant results), and selective attachment injection to stay within limits.
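A minimal sketch of threshold-triggered summarization, using the simple 80%-of-window trigger recommended later in this guide. The ~4 characters per token estimate, the choice to summarize the oldest half, and the injected `summarize` callback (which would call a smaller model) are all assumptions; a production system should prefer the token counts reported by the API.

```typescript
type Turn = { role: "user" | "assistant"; text: string };

// Rough heuristic: about 4 characters per token. Prefer the API's reported usage when available.
function estimateTokens(turns: Turn[]): number {
  return Math.ceil(turns.reduce((sum, t) => sum + t.text.length, 0) / 4);
}

// Once usage crosses `threshold` of the window, replace the oldest half of the
// conversation with a single summary turn and keep the newer half verbatim.
async function maybeCompact(
  turns: Turn[],
  contextWindow: number,
  summarize: (old: Turn[]) => Promise<string>, // e.g. a call to a smaller model (assumed)
  threshold = 0.8,
): Promise<Turn[]> {
  if (estimateTokens(turns) < threshold * contextWindow) return turns;
  const cut = Math.floor(turns.length / 2);
  if (cut === 0) return turns; // nothing old enough to summarize
  const summary = await summarize(turns.slice(0, cut));
  return [{ role: "user", text: `Summary of earlier conversation:\n${summary}` }, ...turns.slice(cut)];
}
```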

4. Permission System

Controls what the agent can and cannot do. Prevents dangerous actions, requires user approval for sensitive operations.

Why it matters

Autonomous agents can cause real damage. Without permissions, agents are unsafe for production. Multiple layers ensure defense in depth.

5. Streaming Execution

Starting tool execution while the AI is still generating its response. Tools begin as soon as their input is complete.

Why it matters

Reduces latency by 30-50%. Users see progress immediately. Critical for long-running tasks.

6. Session Persistence

Saving conversation state so it can be resumed later. Includes message history, file states, and execution context.

Why it matters

Claude Code records transcripts to JSONL files and replays them to resume sessions. Persistence enables recovery; strategic discard enables scalability.
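A minimal sketch of JSONL transcript recording and replay using Node's `fs` module. The `TranscriptEntry` shape and the function names are assumptions (a real transcript would carry more metadata). Appending one JSON object per line means a crash can only damage the final line, which replay skips.

```typescript
import { appendFileSync, existsSync, readFileSync } from "node:fs";

// Hypothetical entry shape; real transcripts also record tool results, file state, etc.
type TranscriptEntry = { role: "user" | "assistant"; content: unknown; timestamp: string };

// One JSON object per line: appends are cheap, and a crash can only damage the final line.
function recordEntry(path: string, entry: TranscriptEntry): void {
  appendFileSync(path, JSON.stringify(entry) + "\n", "utf8");
}

// Replay the transcript to rebuild history on resume.
function loadTranscript(path: string): TranscriptEntry[] {
  if (!existsSync(path)) return [];
  const entries: TranscriptEntry[] = [];
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line) as TranscriptEntry);
    } catch {
      // Skip a torn line (e.g. from a crash mid-write) instead of failing the whole resume.
    }
  }
  return entries;
}
```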

Architecture Patterns

Proven patterns from Claude Code that apply to any agentic system.

Pattern 1: Layered Architecture

Separate concerns into distinct layers. Each layer has a specific responsibility and communicates through well-defined interfaces.

UI Layer (React/Ink)
Runtime (Query Loop, Streaming)
API Layer (SDK, Retry, Fallback)
Tool System (Bash, File, Web, Agent...)
Infrastructure (Permissions, Context, Session)

Why this works

Each layer evolves independently. Swap the UI without changing the runtime. Add tools without touching the API layer.

Pattern 2: Streaming Pipeline

Process data as it arrives. The AI streams tokens, tools start as soon as input is complete.

API Stream → Parse Tool Use → Start Tool → Yield Result

Why this works

Reduces perceived latency by 30-50%. Users see progress immediately.
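A minimal sketch of the "Parse Tool Use" stage: it consumes streaming events and yields each tool call the moment its content block closes, so execution can start before the rest of the message arrives. The event types are a simplified local model of the Anthropic Messages API stream (`content_block_start`, `content_block_delta` with `input_json_delta`, `content_block_stop`); real SDK types carry more fields, and `toolUsesFromStream` is a hypothetical name.

```typescript
// Simplified local model of the streaming events (not the SDK's full types).
type StreamEvent =
  | { type: "content_block_start"; index: number; content_block: { type: "text" } | { type: "tool_use"; id: string; name: string } }
  | { type: "content_block_delta"; index: number; delta: { type: "text_delta"; text: string } | { type: "input_json_delta"; partial_json: string } }
  | { type: "content_block_stop"; index: number };

type CompletedToolUse = { id: string; name: string; input: unknown };

// Accumulate partial JSON per block index; emit a tool call as soon as its block stops.
async function* toolUsesFromStream(events: AsyncIterable<StreamEvent>): AsyncGenerator<CompletedToolUse> {
  const pending = new Map<number, { id: string; name: string; json: string }>();
  for await (const ev of events) {
    if (ev.type === "content_block_start" && ev.content_block.type === "tool_use") {
      pending.set(ev.index, { id: ev.content_block.id, name: ev.content_block.name, json: "" });
    } else if (ev.type === "content_block_delta" && ev.delta.type === "input_json_delta") {
      const block = pending.get(ev.index);
      if (block) block.json += ev.delta.partial_json;
    } else if (ev.type === "content_block_stop") {
      const block = pending.get(ev.index);
      if (block) {
        pending.delete(ev.index);
        // Input is only parsed once complete; partial JSON is never valid mid-stream.
        yield { id: block.id, name: block.name, input: block.json ? JSON.parse(block.json) : {} };
      }
    }
  }
}
```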

Pattern 3: Context Window Management

AI models have finite context windows. Use compaction strategies to preserve important context.

Strategy 1: Summarization

Use a smaller model to summarize old conversation turns.

Strategy 2: Selective Injection

Inject only relevant memories, skills, or file contents based on the current task.

Strategy 3: Micro-Compaction

Remove redundant tool results, collapse similar search outputs.
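A minimal sketch of one micro-compaction strategy: keep the most recent tool results verbatim and replace older ones with a short placeholder. The `Msg` shape, the placeholder text, and the `keepRecent` policy are assumptions; the function returns a new array and leaves the original history untouched.

```typescript
// Hypothetical message shape: only tool results are candidates for micro-compaction.
type Msg =
  | { kind: "text"; role: "user" | "assistant"; text: string }
  | { kind: "tool_result"; toolName: string; content: string };

const CLEARED = "[older tool result cleared to save context]";

// Keep the newest `keepRecent` tool results verbatim and blank out the older ones.
function microCompact(messages: Msg[], keepRecent: number): Msg[] {
  const resultIndexes = messages.flatMap((m, i) => (m.kind === "tool_result" ? [i] : []));
  const stale = new Set(resultIndexes.slice(0, Math.max(0, resultIndexes.length - keepRecent)));
  return messages.map((m, i) => (m.kind === "tool_result" && stale.has(i) ? { ...m, content: CLEARED } : m));
}
```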

Pattern 4: Multi-Layer Permissions

Defense in depth for safety. Multiple layers ensure that even if one fails, others catch dangerous actions.

Layer 1: Static Deny Rules
Layer 2: Auto-mode Classifier
Layer 3: Hook-based Rules
Layer 4: User Prompt

Warning

Never rely on a single permission layer. Multiple layers ensure defense in depth.

Pattern 5: Error Recovery

Agents will encounter errors. Build recovery paths, not just error messages.

Recovery 1: Retry with Backoff

API rate limits? Retry with exponential backoff.

Recovery 2: Model Fallback

Primary model overloaded? Switch to fallback automatically.

Recovery 3: Context Compaction

Prompt too long? Compact context and retry.
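A minimal sketch of Recovery 1, generic retry with exponential backoff. `ApiError`, treating 429 and 529 as the retryable statuses, and the 10% jitter are assumptions; `wait` is injectable so tests do not actually sleep. Model fallback and context compaction could wrap this helper rather than live inside it.

```typescript
// Hypothetical error type carrying the HTTP status of a failed API call.
class ApiError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

type RetryOptions = {
  maxRetries: number;
  baseDelayMs: number;
  isRetryable: (err: unknown) => boolean;
  wait: (ms: number) => Promise<void>;
};

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

const defaultRetry: RetryOptions = {
  maxRetries: 3,
  baseDelayMs: 500,
  isRetryable: (err) => err instanceof ApiError && (err.status === 429 || err.status === 529),
  wait: sleep,
};

// Retry `fn` on retryable errors, doubling the delay after each failure.
async function withRetry<T>(fn: () => Promise<T>, opts: RetryOptions = defaultRetry): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= opts.maxRetries || !opts.isRetryable(err)) throw err;
      const delay = opts.baseDelayMs * 2 ** attempt;
      await opts.wait(delay + Math.random() * delay * 0.1); // backoff plus up to 10% jitter
    }
  }
}
```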

Pattern 6: Composable Sub-Agents

Spawn sub-agents for parallel work. Each runs its own loop with a filtered tool set.

Parent Agent → Spawn Sub-Agent → Filtered Tools → Parallel Execution → Results Back

Why this works

Enables parallel work without duplicating the core loop. Sub-agents get restricted tool sets.
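A minimal sketch of fanning out sub-agents with a filtered tool set. `runAgent` stands in for the same query loop the parent uses; restricting sub-agents to read-only tools and excluding a hypothetical `Agent` spawning tool (to prevent recursive spawning) are assumed policies, not documented ones. `Promise.allSettled` keeps one failed sub-agent from discarding the others' results.

```typescript
type ToolDef = { name: string; readOnly: boolean };
// Stand-in for running the same query loop the parent uses, with a given tool set.
type AgentRun = (task: string, tools: ToolDef[]) => Promise<string>;

const SPAWN_TOOL = "Agent"; // hypothetical name of the tool that spawns sub-agents

// Assumed policy: sub-agents get read-only tools and cannot spawn further sub-agents.
function filterToolsForSubAgent(tools: ToolDef[]): ToolDef[] {
  return tools.filter((t) => t.readOnly && t.name !== SPAWN_TOOL);
}

// Run independent tasks in parallel; each result (or failure note) goes back to the parent.
async function runSubAgents(tasks: string[], allTools: ToolDef[], runAgent: AgentRun): Promise<string[]> {
  const subTools = filterToolsForSubAgent(allTools);
  const settled = await Promise.allSettled(tasks.map((task) => runAgent(task, subTools)));
  return settled.map((r, i) =>
    r.status === "fulfilled" ? r.value : `Sub-agent for "${tasks[i]}" failed: ${String(r.reason)}`,
  );
}
```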

Design Principles

The why behind the architecture decisions.

Build Your Own Agentic System

Step-by-step implementation guide for any product team.

  1. Define Purpose and Boundaries

    Before writing code, decide: What tasks will this agent perform? Which tools will it use? What must it never do?

  2. Build the Core Query Loop

    Start minimal: send message → get response → execute tools → feed results back → repeat. Claude Code's core loop is ~100 lines.

  3. Implement Your First Tools

    Start with 2-3 essential tools. Define: name, description, input schema, execution logic. Test in isolation.

  4. Add Basic Permissions

    Whitelist the commands the agent may run and the file paths it may access, and fall back to asking the user for anything uncertain.

  5. Implement Context Management

    Track context window usage. At 80% capacity, trigger compaction. Start simple: summarize oldest messages.

  6. Add Streaming

    Stream AI responses token-by-token, and start tool execution before the full response is complete. This is the biggest UX win.

  7. Build Error Recovery

    For each error type, define a recovery strategy. Don't just show errors; recover from them.

  8. Add Session Persistence

    Record conversations to disk. Save file states. Enable resume. Users should pick up where they left off.

  9. Iterate on Tool Quality

    The AI chooses tools based on descriptions. Make them clear and specific. Bad descriptions = bad decisions.

  10. Test Edge Cases Relentlessly

    Test with: very long conversations, large operations, network failures, API timeouts, permission denials.
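A minimal sketch of the basic permissions from step 4: an allowlist of commands and of directory roots, with "ask" as the fallback. The policy contents and the shell-metacharacter check are assumptions. A command allowlist alone does not constrain which paths a command touches, which is one reason the later steps add more layers.

```typescript
import { resolve, sep } from "node:path";

type Decision = "allow" | "ask";

// Hypothetical policy: which programs and which directory trees the agent may use freely.
const policy = {
  allowedCommands: new Set(["ls", "git", "npm"]),
  allowedRoots: [resolve("/workspace/project")],
};

// Allow a command only if its program is allowlisted and nothing can chain a second command.
function checkCommand(command: string): Decision {
  if (/[;&|`$<>\n]/.test(command)) return "ask"; // chaining, substitution, or redirection
  const program = command.trim().split(/\s+/)[0];
  return policy.allowedCommands.has(program) ? "allow" : "ask";
}

// Allow paths that resolve inside an allowed root; "../" escapes resolve outside and fall back to ask.
function checkPath(path: string): Decision {
  const abs = resolve(path);
  const inside = policy.allowedRoots.some((root) => abs === root || abs.startsWith(root + sep));
  return inside ? "allow" : "ask";
}
```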

Start with an MVP

Steps 1-4: functional agent. Steps 5-7: production-ready. Steps 8-10: robust. Ship early, iterate.

Technical Specification

Implementation contracts and detailed requirements from a production audit.

1. Loop Exit Conditions (13 total)

The agent loop must handle these exit conditions. MVP requires 1, 3, 4, 6, 8, 10. Production requires all 13.

Terminal Exits (no recovery)
blocking_limit

Token count at blocking limit

completed

No tool_use blocks in response

aborted

User abort or signal abort

max_turns

Turn count exceeded

Recoverable Exits (continue loop)
model_fallback

Switch model, continue

reactive_compact

413 error, compact and retry

max_output_recovery

Output limit hit, recover up to 3x

2. Message Construction Pipeline

Messages go through 6 transformations before each API call. Order matters.

1. Strip before compact boundary
2. Apply tool result budget
3. Snip compact (old messages)
4. Microcompact (inline compression)
5. Context collapse projection
6. Auto-compact (summarization)

Critical

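Tool result budget MUST run before compaction. Microcompact MUST run before auto-compact. Running them in the wrong order can make the compaction request itself exceed the context window (compaction self-PTL, where PTL means prompt too long).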
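One way to make the ordering constraint hard to violate is to declare the stage order once, in a single array. This sketch assumes a simplified `Msg` shape with a `compactBoundary` marker and implements only the first stage; the other five are pass-through placeholders that real implementations would replace via `overrides`. All names are hypothetical.

```typescript
// Hypothetical message shape; `compactBoundary` marks where a previous compaction summary starts.
type Msg = { role: "user" | "assistant"; content: string; compactBoundary?: boolean };
type Transform = (msgs: Msg[]) => Msg[];

// The order is declared once, so no caller can run compaction before the tool result budget.
const PIPELINE_ORDER = [
  "stripBeforeCompactBoundary",
  "applyToolResultBudget",
  "snipCompact",
  "microcompact",
  "contextCollapse",
  "autoCompact",
] as const;
type StageName = (typeof PIPELINE_ORDER)[number];

// Stage 1: everything before the latest compact boundary is already summarized, so drop it.
const stripBeforeCompactBoundary: Transform = (msgs) => {
  const idx = msgs.map((m) => m.compactBoundary === true).lastIndexOf(true);
  return idx === -1 ? msgs : msgs.slice(idx);
};

const passThrough: Transform = (msgs) => msgs;
const DEFAULT_STAGES: Record<StageName, Transform> = {
  stripBeforeCompactBoundary,
  applyToolResultBudget: passThrough, // placeholders: supply real implementations as overrides
  snipCompact: passThrough,
  microcompact: passThrough,
  contextCollapse: passThrough,
  autoCompact: passThrough,
};

function buildMessagesForApi(msgs: Msg[], overrides: Partial<Record<StageName, Transform>> = {}): Msg[] {
  return PIPELINE_ORDER.reduce((acc, stage) => (overrides[stage] ?? DEFAULT_STAGES[stage])(acc), msgs);
}
```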

3. Error Recovery Cascade

Each error type has a specific recovery path.

413 (Prompt Too Long)
Step 1

Context collapse drain → retry

Step 2

Reactive compact (strip media) → retry once

Step 3

Surface error → return prompt_too_long

Max Output Tokens
Step 1

Escalate the output token cap to 64k (once)

Step 2

Inject recovery message → retry (up to 3x)

Step 3

Surface error

529 (Overloaded)
Step 1

Exponential backoff (up to 3x)

Step 2

Switch to fallback model
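A minimal sketch of the cascade as a decision function over per-query state. `hasAttemptedReactiveCompact` and `maxOutputTokensRecoveryCount` follow the state variables listed in section 8; the other state fields, the action names, treating the context collapse drain as one-shot, and the literal 64000 for "64k" are assumptions. The function mutates the state it is given so that each step is attempted at most the stated number of times.

```typescript
// Per-query recovery state. hasAttemptedReactiveCompact and maxOutputTokensRecoveryCount
// follow section 8; the other three fields are hypothetical additions for this sketch.
type RecoveryState = {
  hasDrainedContextCollapse: boolean;
  hasAttemptedReactiveCompact: boolean;
  hasEscalatedMaxOutput: boolean;
  maxOutputTokensRecoveryCount: number;
  overloadRetries: number;
};

type Failure = "prompt_too_long" | "max_output_tokens" | "overloaded";
type Action =
  | { kind: "drain_context_collapse" }
  | { kind: "reactive_compact" }
  | { kind: "escalate_max_output"; maxOutputTokens: number }
  | { kind: "inject_recovery_message" }
  | { kind: "backoff"; attempt: number }
  | { kind: "switch_to_fallback_model" }
  | { kind: "surface_error"; failure: Failure };

const MAX_RECOVERY_ATTEMPTS = 3;
const ESCALATED_MAX_OUTPUT_TOKENS = 64000; // "64k"; the exact figure is an assumption

// Decide the next step of the cascade and record that it was taken.
function nextRecovery(failure: Failure, s: RecoveryState): Action {
  switch (failure) {
    case "prompt_too_long": // 413
      if (!s.hasDrainedContextCollapse) {
        s.hasDrainedContextCollapse = true;
        return { kind: "drain_context_collapse" };
      }
      if (!s.hasAttemptedReactiveCompact) {
        s.hasAttemptedReactiveCompact = true; // one shot only: prevents the 413 spiral
        return { kind: "reactive_compact" };
      }
      return { kind: "surface_error", failure };
    case "max_output_tokens":
      if (!s.hasEscalatedMaxOutput) {
        s.hasEscalatedMaxOutput = true;
        return { kind: "escalate_max_output", maxOutputTokens: ESCALATED_MAX_OUTPUT_TOKENS };
      }
      if (s.maxOutputTokensRecoveryCount < MAX_RECOVERY_ATTEMPTS) {
        s.maxOutputTokensRecoveryCount++;
        return { kind: "inject_recovery_message" };
      }
      return { kind: "surface_error", failure };
    case "overloaded": // 529
      if (s.overloadRetries < MAX_RECOVERY_ATTEMPTS) {
        s.overloadRetries++;
        return { kind: "backoff", attempt: s.overloadRetries };
      }
      return { kind: "switch_to_fallback_model" };
  }
}
```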

4. Tool Interface (30+ methods)

The full interface has 30+ methods; a minimum viable rebuild needs only the 7 required methods.

Required

name - Tool identity
inputSchema - Zod validation
call() - Execution logic
checkPermissions() - Permission gate
validateInput() - Pre-permission validation
prompt() - System prompt description
mapToolResultToToolResultBlockParam() - Result serialization

Recommended

isEnabled() - Feature gating
isReadOnly() - Safety classification
isConcurrencySafe() - Streaming routing
isDestructive() - Safety warnings
preparePermissionMatcher() - Hook matching

Optional

backfillObservableInput() - Input normalization
toAutoClassifierInput() - Auto-mode classification
All render*() methods - UI only (stub for headless)

Critical

Without validateInput(), stale file edits cause data corruption. Without isConcurrencySafe(), the streaming executor deadlocks.
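A trimmed-down sketch of the required surface plus fail-closed defaults, assuming a hand-written `parse` in place of a Zod schema. `buildTool`, `executeTool`, and the `askUser` callback are hypothetical names. The `isReadOnly()`/`isConcurrencySafe()` defaults of `false` follow the fail-closed principle described in this guide; defaulting `checkPermissions()` to "ask" is an extra assumption.

```typescript
// Trimmed-down, hypothetical shapes; the real interface uses Zod schemas and many more methods.
type PermissionResult =
  | { behavior: "allow" }
  | { behavior: "ask"; message: string }
  | { behavior: "deny"; message: string };
type ValidationResult = { ok: true } | { ok: false; message: string };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };

interface ToolSpec<In> {
  name: string;
  parse: (input: unknown) => In;                      // stands in for inputSchema (Zod)
  prompt: () => string;                               // description shown in the system prompt
  call: (input: In) => Promise<unknown>;
  mapToolResultToToolResultBlockParam: (output: unknown, toolUseId: string) => ToolResultBlock;
  validateInput?: (input: In) => ValidationResult;    // runs before the permission gate
  checkPermissions?: (input: In) => PermissionResult;
  isReadOnly?: (input: In) => boolean;
  isConcurrencySafe?: (input: In) => boolean;
}
type Tool<In> = Required<ToolSpec<In>>;

// Fill gaps with fail-closed defaults: unless a tool says otherwise, it is treated
// as writing, unsafe to run in parallel, and needing the user's approval.
function buildTool<In>(spec: ToolSpec<In>): Tool<In> {
  return {
    ...spec,
    validateInput: spec.validateInput ?? ((): ValidationResult => ({ ok: true })),
    checkPermissions: spec.checkPermissions ?? ((): PermissionResult => ({ behavior: "ask", message: `Allow ${spec.name}?` })),
    isReadOnly: spec.isReadOnly ?? (() => false),
    isConcurrencySafe: spec.isConcurrencySafe ?? (() => false),
  };
}

// Order matters: validate (no side effects), then the permission gate, then execute.
async function executeTool<In>(
  tool: Tool<In>,
  rawInput: unknown,
  toolUseId: string,
  askUser: (message: string) => Promise<boolean>,
): Promise<ToolResultBlock> {
  const input = tool.parse(rawInput);
  const validation = tool.validateInput(input);
  if (!validation.ok) return { type: "tool_result", tool_use_id: toolUseId, content: validation.message };

  const permission = tool.checkPermissions(input);
  const allowed =
    permission.behavior === "allow" ||
    (permission.behavior === "ask" && (await askUser(permission.message)));
  if (!allowed) return { type: "tool_result", tool_use_id: toolUseId, content: "Permission denied" };

  return tool.mapToolResultToToolResultBlockParam(await tool.call(input), toolUseId);
}
```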

5. Four-Layer Permission Architecture

Defense in depth. Each layer can deny independently.

Layer 1: Static Deny Rules (tool assembly time)
Layer 2: Tool-Specific checkPermissions() (allow | ask | deny | passthrough)
Layer 3: Hook-Based Rules (if/then pattern matching)
Layer 4: Interactive Prompt / Auto-Classifier / Auto-Deny

Permission Modes
default

Ask for sensitive, auto-allow safe

plan

Read-only tools only

auto

AI classifier decides

acceptEdits

Auto-allow file edits
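A minimal sketch of how the four layers and the permission modes might combine. The static rule set, the hook signature, and the classifier callback are assumptions. The properties it preserves are that any layer can deny, plan mode never runs a non-read-only tool, and only requests still undecided after layers 1 to 3 reach the mode-specific layer 4.

```typescript
type Mode = "default" | "plan" | "auto" | "acceptEdits";
type Verdict = "allow" | "ask" | "deny" | "passthrough";
type FinalDecision = "allow" | "ask" | "deny";
type PermissionRequest = { tool: string; readOnly: boolean; isEdit: boolean };
type Check = (req: PermissionRequest) => Verdict;

// Layer 1: static deny rules, applied before anything else (hypothetical rule: by tool name).
const STATIC_DENY = new Set(["DangerousTool"]);

function decidePermission(
  req: PermissionRequest,
  mode: Mode,
  layers: { toolCheck: Check; hooks: Check[]; classifier: (req: PermissionRequest) => "allow" | "deny" },
): FinalDecision {
  if (STATIC_DENY.has(req.tool)) return "deny";
  if (mode === "plan" && !req.readOnly) return "deny"; // plan mode: read-only tools only

  // Layers 2 and 3: the tool's own checkPermissions() plus hook rules. Any deny wins.
  const verdicts = [layers.toolCheck(req), ...layers.hooks.map((hook) => hook(req))];
  if (verdicts.includes("deny")) return "deny";
  if (!verdicts.includes("ask") && (verdicts.includes("allow") || req.readOnly)) return "allow";

  // Layer 4: whatever is still undecided is resolved by the mode.
  if (mode === "auto") return layers.classifier(req);
  if (mode === "acceptEdits" && req.isEdit) return "allow";
  return "ask"; // interactive prompt
}
```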

6. Five-Layer Compaction Pipeline

Each layer handles different scenarios. All needed for production.

Layer 1: Snip

Truncates old messages for SDK/long sessions. Replay removes zombie messages.

Layer 2: Micro

Truncates individual tool results. Collapses search/read patterns. Tracks cache_deleted_input_tokens.

Layer 3: Auto

Threshold: effectiveContextWindow - 13K tokens. Calls Haiku to summarize. Circuit breaker: max 3 failures.

Layer 4: Reactive

Triggered AFTER 413 error. Strips oversized media first. ONE-shot only (prevents spiral).

Layer 5: Context Collapse

Projects a collapsed view over the full history; collapse commits persist across turns. Experimental.

Critical: Compaction Self-PTL

Compaction itself can exceed the context window. Handle this with truncateHeadForPTLRetry() (max 3 retries), and strip images before compacting.
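A minimal sketch of the Layer 3 trigger and circuit breaker, plus a head-truncation retry for compaction self-PTL. The 13K buffer and the limit of 3 come from the text above. Dropping the oldest quarter of messages per retry is an assumed stand-in for `truncateHeadForPTLRetry()`, whose actual strategy is not described here, and `summarize`/`isPromptTooLong` are injected placeholders.

```typescript
// Mirrors autoCompactTracking.consecutiveFailures from the per-query state.
type AutoCompactTracking = { consecutiveFailures: number };

const AUTO_COMPACT_BUFFER_TOKENS = 13_000; // threshold: effectiveContextWindow - 13K
const MAX_CONSECUTIVE_FAILURES = 3;        // circuit breaker limit
const MAX_PTL_RETRIES = 3;                 // compaction self-PTL retries

function shouldAutoCompact(tokenCount: number, effectiveContextWindow: number, t: AutoCompactTracking): boolean {
  if (t.consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) return false; // breaker open: stop trying
  return tokenCount >= effectiveContextWindow - AUTO_COMPACT_BUFFER_TOKENS;
}

// If the summarization request is itself too long, drop the oldest messages and retry.
async function summarizeWithPTLRetry<M>(
  messages: M[],
  summarize: (m: M[]) => Promise<M[]>,
  isPromptTooLong: (err: unknown) => boolean,
): Promise<M[]> {
  let input = messages;
  for (let attempt = 0; ; attempt++) {
    try {
      return await summarize(input);
    } catch (err) {
      if (!isPromptTooLong(err) || attempt >= MAX_PTL_RETRIES) throw err;
      input = input.slice(Math.ceil(input.length / 4)); // assumed policy: drop the oldest quarter
    }
  }
}

async function autoCompact<M>(
  messages: M[],
  summarize: (m: M[]) => Promise<M[]>,
  isPromptTooLong: (err: unknown) => boolean,
  t: AutoCompactTracking,
): Promise<M[]> {
  try {
    const compacted = await summarizeWithPTLRetry(messages, summarize, isPromptTooLong);
    t.consecutiveFailures = 0; // success resets the breaker
    return compacted;
  } catch {
    t.consecutiveFailures += 1; // keep the original messages and count the failure
    return messages;
  }
}
```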

7. Streaming Tool Execution Model

Tools start executing while the model is still streaming, which reduces latency by 30-50%.

Execution Logic
concurrencySafe = true

Start immediately (parallel execution)

concurrencySafe = false

Queue for sequential execution

Abort handling

Generate synthetic error results for pending tools
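A minimal sketch of this routing, assuming tool calls are handed to the executor as soon as their input is complete. `PendingTool`, `StreamingToolExecutor`, and the cancellation message are hypothetical. Concurrency-safe tools start at once, unsafe tools run one at a time in arrival order, and any tool that has not started when `abort()` is called receives a synthetic error result instead of running.

```typescript
// Hypothetical shapes: a tool call whose input has finished streaming, and its outcome.
type PendingTool = { id: string; concurrencySafe: boolean; run: () => Promise<string> };
type ToolOutcome = { id: string; content: string; isError: boolean };

class StreamingToolExecutor {
  private outcomes: Promise<ToolOutcome>[] = [];
  private serialQueue: Promise<unknown> = Promise.resolve(); // tail of the sequential queue
  private aborted = false;

  // Called as soon as a tool_use block's input is complete, while the model may still be streaming.
  add(tool: PendingTool): void {
    if (tool.concurrencySafe) {
      this.outcomes.push(this.execute(tool)); // start immediately, in parallel with others
    } else {
      const outcome = this.serialQueue.then(() => this.execute(tool)); // wait for earlier unsafe tools
      this.serialQueue = outcome;
      this.outcomes.push(outcome);
    }
  }

  abort(): void {
    this.aborted = true;
  }

  // Outcomes in the order the tool calls arrived, regardless of completion order.
  results(): Promise<ToolOutcome[]> {
    return Promise.all(this.outcomes);
  }

  private async execute(tool: PendingTool): Promise<ToolOutcome> {
    // Tools that had not started before abort() get a synthetic error result instead of running.
    if (this.aborted) return { id: tool.id, content: "Cancelled: the user aborted the request", isError: true };
    try {
      return { id: tool.id, content: await tool.run(), isError: false };
    } catch (err) {
      return { id: tool.id, content: String(err), isError: true };
    }
  }
}
```

Because `execute` never rejects, one failing tool cannot break the sequential queue for the tools behind it.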

8. State Tracking (Per-Query)

Critical state variables that prevent infinite loops and track progress.

Essential

messages - Full conversation history
turnCount - Turn counter
autoCompactTracking - Compact state (turnId, consecutiveFailures)
hasAttemptedReactiveCompact - Prevents spiral (tried once)

Recovery

maxOutputTokensRecoveryCount - Recovery attempts (limit: 3)
maxOutputTokensOverride - Escalated token cap (64k)
taskBudgetRemaining - Budget across compacts

Advanced

pendingToolUseSummary - Haiku summary from previous turn
stopHookActive - Stop hook processing
transition - Why previous iteration continued

Critical

Without hasAttemptedReactiveCompact, a 413 → reactive compact → 413 → reactive compact loop becomes infinite.

9. Production Roadmap

Tiered approach from MVP to full parity.

  • Tier 1: MVP (2-4 weeks)

    Core loop with 6 exit conditions, 3 core tools, basic permissions, simple context, API client with retry, session persistence.

  • Tier 2: Production-Ready (4-8 weeks)

    Required tool interface (7 methods), 4-layer permissions, auto-compact, streaming executor, error recovery, file state cache, tool assembly pipeline.

  • Tier 3: Feature-Complete (8-16 weeks)

    Sub-agent system (sync/async), MCP integration, 5-layer compaction, session memory, streaming API, transcript recording.

  • Tier 4: Full Parity (16-24 weeks)

    Coordinator mode, context collapse, reactive compact, all 40+ tools, 85+ slash commands, IDE bridge, agent memory.

10. Enterprise & Advanced UX

Additional components critical for production deployment and advanced user experience.

Cost Tracking

Cost trackers enforce budgets and intercept execution when token limits are reached.

Upstream Proxy

The network layer handles corporate proxies for enterprise deployments.

Advanced CLI UX

Includes custom keybindings and a full Vim mode for developer productivity.

Remote Execution

Secure tool execution across SSH boundaries.

Voice Input

Experimental architecture for speech-to-text input.

Domain Playbook

How to adapt Claude Code's architecture to any domain.

Portability Matrix

What changes per domain vs what stays the same.

Component     Code Domain  Data Domain    API Domain    Creative Domain
Agent Loop    Direct Port  Direct Port    Direct Port   Direct Port
BashTool      Keep         → DB Queries   → API Calls   → Image Gen
FileReadTool  Keep         CSV/JSON       → API Fetch   → Asset Load
FileEditTool  Keep         Transforms     → API Mutate  → Asset Edit
GrepTool      Keep         → Data Search  → API Search  → Content Search
Permissions   Direct Port  Direct Port    Direct Port   Direct Port
Compaction    Direct Port  Direct Port    Direct Port   Direct Port
Sub-agents    Direct Port  Direct Port    Direct Port   Direct Port

Code Domain

Software development, DevOps, infrastructure

• Bash, FileRead, FileEdit, Grep tools
• Git integration
• LSP support
• Test execution

Data Domain

Analytics, data engineering, BI

• SQL query tool (replace Bash)
• CSV/JSON read/transform
• Database search (replace Grep)
• Visualization generation

API Domain

Integration, automation, workflows

• HTTP request tool (replace Bash)
• API schema discovery
• Webhook management
• Workflow orchestration

Creative Domain

Design, content, media production

• Image generation tool (replace Bash)
• Asset management
• Content search
• Style transfer

Domain Adaptation Decisions

Key decisions for adapting to your domain.

  1. Tool Selection

    Choose 3-5 core tools that represent the primary actions in your domain. Map each Claude Code tool to a domain equivalent.

  2. Permission Model

    Define which actions require approval: read-only vs. write operations, external API calls, data export.

  3. Context Strategy

    Determine what constitutes "context" in your domain. For code: files. For data: schemas, samples. For API: endpoints, schemas.

  4. Success Criteria

    Define what "done" looks like. For code: tests pass. For data: query returns results. For API: request succeeds. For creative: asset generated.

  5. Cost Tracking

    Track token costs and enforce budgets per session across all domains.

Critical Design Decisions to Preserve

These patterns are domain-agnostic. Don't change them.

  1. AsyncGenerator Pattern

    The loop yields events lazily and callers consume them on demand. Calling .return() on the generator interrupts it.

  2. Fail-Closed Defaults

    isConcurrencySafe() = false, isReadOnly() = false unless explicitly set.

  3. Withholding Errors

    413 and max_output_tokens errors are withheld while recovery is in progress. Surfacing them prematurely breaks SDK consumers.

  4. Tool Result Budget

    Prevents any single tool result from exceeding per-message limits. Must run before compaction.

  5. Prompt Cache Stability

    Tools are sorted alphabetically, with built-ins as a contiguous prefix, so the tool list stays identical across requests. The server-side prompt cache requires this.

  6. Reactive Compact Spiral Prevention

    The hasAttemptedReactiveCompact flag. Without it, a 413 can trigger an infinite compact-and-retry loop.

  7. Compaction Self-PTL

    Compaction can itself exceed the context window. Handle it with a truncation retry.

  8. Task Budget Across Compacts

    taskBudgetRemaining carries the pre-compact context size across compactions. Without it, the server's budget countdown under-counts.
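A minimal sketch of cache-stable tool ordering. `assembleToolList` is a hypothetical helper; comparing names with `<` (UTF-16 code units) rather than `localeCompare` keeps the order independent of the host locale, and letting built-ins win on a name collision is an assumed policy.

```typescript
type NamedTool = { name: string };

// Compare by UTF-16 code units so the order does not depend on the host locale.
const byName = (a: NamedTool, b: NamedTool): number => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0);

// Built-ins form a sorted, contiguous prefix; other tools (e.g. MCP tools) follow, also sorted.
// Adding or removing a non-built-in tool therefore never reorders the built-in prefix.
function assembleToolList(builtIns: NamedTool[], others: NamedTool[]): NamedTool[] {
  const builtInNames = new Set(builtIns.map((t) => t.name));
  const extras = others.filter((t) => !builtInNames.has(t.name)); // assumed: built-ins win name clashes
  return [...[...builtIns].sort(byName), ...extras.sort(byName)];
}
```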

Decision Framework

When to use which pattern.

How long are your typical conversations?
< 10 turns

Simple context management. Track token count.

10-50 turns

Basic compaction. Summarize oldest messages.

50+ turns

Full context management: auto-compact, micro-compaction, selective injection.

How sensitive are the agent's actions?
Low (read-only)

Basic permissions: whitelist paths.

Medium (file edits)

Layered: static rules + user approval.

High (code execution)

Full system: deny + classifier + hooks + prompts + sandboxing.

How fast do responses need to feel?
Not critical

Simple request/response. Wait for the full response.

Important

Stream responses. Show tokens as they arrive.

Critical

Full streaming execution. Start tools while the model generates.

Do tasks span multiple sessions?
No

In-memory state only.

Sometimes

Basic persistence: save history, enable resume.

Always

Full persistence: transcripts, file cache, permission history.

Note

These aren't mutually exclusive. Start simple, add complexity based on real user needs. Claude Code evolved through months of production use. Your system will too.

Supplementary Guides

Detailed reference material and practical checklists.
