The Technical Dissection of Anthropic's Most Consequential Engineering Achievement
The Claude Agent SDK is not a wrapper around an API. It is a complete agentic runtime that ships with its own subprocess model, tool execution engine, session persistence layer, permission system, hook architecture, multi-agent coordination protocol, and memory stack. When developers and product teams talk about building AI agents in 2026, the SDK is increasingly the reference implementation they benchmark everything else against. Understanding why requires going significantly deeper than the marketing copy. It requires cracking the thing open.
This guide does exactly that. It traces every tool, every parameter schema, every architectural decision from the subprocess model up through the multi-agent coordination layer. It explains not just what each component does but why it exists at that particular layer of the stack, what problem it solves that the layer below it could not, and what it costs you when you bypass it. This is the guide that the official documentation does not provide because documentation tells you how to use things, and this guide is about understanding what you are actually operating.
Contents
- What the Claude Agent SDK Actually Is
- The Agent Loop Architecture
- The Complete Tool Arsenal: All 40 Tools Dissected
- Subagent Architecture and Delegation Patterns
- The Hook System: Intercepting Every Action
- CLAUDE.md and the Layered Memory Stack
- Permission Modes: Six Levels of Autonomy
- MCP Integration: Extending the Tool Surface
- Agent Teams: Coordinated Multi-Agent Systems
- Worktrees and Isolation: Safe Parallel Execution
- Scheduling, Cron, and Remote Triggers
- The SDK API: Programmatic Agent Construction
- Why the Architecture Wins
1. What the Claude Agent SDK Actually Is
The structural question to start with is not "what tools does this SDK expose?" It is: "what economic and engineering problem does a native agentic runtime solve that calling the API directly does not?" The answer reveals why the SDK is architecturally different from every competing framework.
When you call the Claude API directly, you are managing a stateless round trip. You send a prompt, receive a response, and your application code is responsible for parsing tool calls, executing them, feeding results back, tracking conversation history, handling rate limits, and deciding when the loop terminates. This is not a trivial amount of engineering. Production teams building agents on raw APIs spend significant time on the infrastructure layer before they build a single business-specific behavior. LangChain, CrewAI, and AutoGen all exist primarily to solve this infrastructure problem. The Claude Agent SDK takes a different approach: it ships the infrastructure as a subprocess.
The SDK does not run in your process. It spawns the claude binary (bundled in the npm package, or installed via the Claude Code CLI) as a separate operating system process. Your application communicates with that subprocess via stdin/stdout using a JSON message stream. This architecture has profound implications that are not obvious from the surface-level documentation. The subprocess owns the agentic loop. It calls tools, receives results, decides whether to loop again, manages context window budgeting, handles compaction when approaching token limits, and emits structured messages to your application as events. Your application code becomes an observer and interceptor of a loop it does not control, rather than an executor of a loop it must implement.
This design choice explains why the SDK's hook system is so central to its architecture. When you outsource the loop to a subprocess, you need a rich interception API to inject behavior at specific points. The SDK provides 30+ named hook events covering every meaningful moment in the agent's execution lifecycle. When you understand that the hook system is a compensation for handing control to a subprocess, you understand why it exists at the granularity it does. It is not feature-bloat. It is the necessary surface area for production customization when the core loop is not yours to modify.
The SDK is available in both Python (claude_agent_sdk) and TypeScript (@anthropic-ai/claude-agent-sdk). Sessions are persisted to disk as JSONL transcript files at ~/.claude/projects/<project-derived-path>/, meaning conversations survive process restarts and can be resumed by session ID. The SDK's query() function (Python) or query() return value (TypeScript) is the primary entry point, and it returns an async iterator of structured message objects that your application consumes.
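Because the transcript is plain JSONL, a session's history can be inspected without the SDK at all. A minimal sketch in stdlib Python, assuming one JSON object per line; the real records under ~/.claude/projects/ carry more fields than this inspects:

```python
import json
from pathlib import Path

def load_transcript(path):
    """Parse a session transcript stored as JSONL (one JSON object per line).

    The record schema here is an assumption for illustration; real
    transcripts contain additional fields per message.
    """
    messages = []
    for line in Path(path).read_text().splitlines():
        if line.strip():  # tolerate blank lines
            messages.append(json.loads(line))
    return messages
```

Reading the transcript directly is useful for offline auditing and analytics; for live consumption, the query() iterator remains the supported surface.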
Our building AI agents guide covered the competitive landscape of agent frameworks in depth. The Claude Agent SDK occupies a unique position in that landscape: it is not a framework that wraps an LLM API, it is a runtime that ships the LLM's preferred execution environment as a first-party artifact.
2. The Agent Loop Architecture
The agent loop is the heartbeat of everything. Understanding it precisely, not abstractly, is the prerequisite for every other technical decision you make when deploying the SDK.
A turn is one round-trip: Claude receives a message, evaluates it, calls zero or more tools, receives tool results, potentially calls more tools, and eventually produces a text response with no further tool calls. One call to query() can contain many turns. The SDK executes turns internally, without yielding control to your application code between tool calls within a turn. This is the critical architectural fact that catches most developers when they first integrate the SDK. If you need to inject behavior between individual tool calls within a turn, you do it through hooks, not through application-level code between await calls.
The loop itself follows a simple but powerful pattern that the official diagram captures cleanly: gather context, take action, verify work, then repeat until the task is complete or a termination condition triggers. The "gather context" step is where CLAUDE.md files are consulted, memory is loaded, and MCP tool schemas are enumerated. The "take action" step is where tools are invoked, files are read and written, web searches are run, and subagents are spawned. The "verify work" step is where Claude evaluates whether the action succeeded, what the output means, and whether additional actions are needed. The loop terminates when Claude produces a response with no tool calls, when max_turns is reached, when the budget ceiling (max_budget_usd) is hit, or when the context window forces compaction.
Context compaction is a detail that many SDK integrations get wrong. When the agent approaches the context limit, the SDK automatically compacts the conversation: it summarizes prior turns into a compressed representation and re-injects the CLAUDE.md files from scratch. This emits a SystemMessage with subtype: "compact_boundary". Applications that parse message history without handling compact boundaries will see inconsistent turn counts. The PreCompact hook fires before this happens, giving applications the opportunity to inject additional context or take a snapshot before the compaction occurs.
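Handling the boundary correctly is a few lines of bookkeeping. A sketch that treats messages as dicts with "type" and "subtype" fields (an assumption for illustration; the SDK's message objects are typed classes with equivalent attributes):

```python
def count_turns(messages):
    """Count assistant turns while accounting for compaction boundaries.

    Assumes each message dict carries a "type" field and that compaction
    emits a system message with subtype "compact_boundary".
    """
    turns = 0
    compactions = 0
    for msg in messages:
        if msg.get("type") == "system" and msg.get("subtype") == "compact_boundary":
            compactions += 1
            continue  # a boundary is bookkeeping, not a conversational turn
        if msg.get("type") == "assistant":
            turns += 1
    return turns, compactions
```

An application that skips the `compact_boundary` check will double-count or misalign turns after any long session, which is exactly the inconsistency described above.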
The ResultMessage object that terminates every query() call carries the full accounting: total_cost_usd, a usage dict with input tokens, output tokens, and cache hits broken down per model, the session_id for resumption, and num_turns for the completed run. The subtype field on this message tells you exactly why the loop stopped, and the enumeration of possible subtypes tells you what edge cases to handle:
- success: clean termination
- error_max_turns: hit the turn ceiling
- error_max_budget_usd: hit the cost ceiling
- error_during_execution: tool or API error stopped execution
- error_max_structured_output_retries: structured output schema validation failed
These five subtypes are not a detail. They are the contract that lets you build reliable retry logic, fallback routing, and cost alerting. An application that only handles success is an application that silently drops information about why its agents stopped working.
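A dispatch table over the subtypes makes that contract explicit. The subtype strings are the ones enumerated above; the action names are hypothetical placeholders for whatever retry and alerting machinery an application provides:

```python
# Map each documented ResultMessage subtype to an application action.
RESULT_ACTIONS = {
    "success": "done",
    "error_max_turns": "retry_with_higher_turn_limit",
    "error_max_budget_usd": "alert_cost_owner",
    "error_during_execution": "retry_with_backoff",
    "error_max_structured_output_retries": "fallback_to_freeform_output",
}

def route_result(subtype):
    try:
        return RESULT_ACTIONS[subtype]
    except KeyError:
        # An unknown subtype means the SDK added a termination reason this
        # application does not yet understand; fail loudly rather than drop it.
        raise ValueError(f"unhandled result subtype: {subtype}")
```

Raising on unknown subtypes, rather than defaulting, is the design choice that keeps a future SDK release from silently becoming a dropped error class.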
3. The Complete Tool Arsenal: All 40 Tools Dissected
The tools are the action surface of the agent. Every capability the agent exercises in the world flows through a tool call. Understanding each tool precisely, including its exact parameter schema and the class of action it enables, is not optional for production deployments. It is the foundation of your permission policy, your hook logic, and your cost modeling.
The SDK ships 40 tools as of the current release. They fall into several natural categories that reveal the architecture's intentions: filesystem operations, process execution, search, agent management, scheduling, external services, and session control. Walking through them in these groupings is more useful than an alphabetical list.
Filesystem Tools
The filesystem tools are the workhorses. Everything the agent does that persists beyond the conversation goes through these.
Read is the primary file reading tool. Its parameter schema is {"file_path": string, "offset": number, "limit": number, "pages": string}. The file_path must be absolute. The tool supports images (returned as visual content that Claude processes multimodally), PDFs (up to 20 pages per call via the pages parameter like "1-5"), and Jupyter notebooks (which are parsed and returned with cell outputs combined). The output format is cat-with-line-numbers, which is why the offset and limit parameters exist: large files can be read in chunks by specifying starting line and line count. Read does not require permission by default. It is the most commonly called tool in most agentic workflows.
Write creates or overwrites files. Schema: {"file_path": string, "content": string}. This is a destructive operation: it replaces the entire file content. The SDK requires that the agent have previously Read a file before Editing or Writing it, enforced at the tool level. Write requires permission unless the session is running in acceptEdits or higher permission mode. Every Write call is subject to path-based protection rules that block writes to .git, .vscode, .claude, and several other configuration directories by default.
Edit performs exact string replacement within a file. Schema: {"file_path": string, "old_string": string, "new_string": string, "replace_all": boolean}. The old_string must appear exactly once in the file (when replace_all is false), and the SDK validates this before executing. This constraint exists for a good reason: non-unique match strings produce ambiguous edits. The replace_all flag bypasses the uniqueness constraint and replaces every occurrence. Edit is the preferred tool for targeted changes to large files because it sends only the diff, not the full file content, which matters for context window efficiency.
Glob finds files by pattern. Schema: {"pattern": string, "path": string}. It uses standard glob syntax (**/*.ts, src/**/*.{py,js}), caps results at 100 files, and returns them sorted by modification time, which makes it useful for finding recently changed files as well as files matching a pattern. A notable behavior: Glob does NOT respect .gitignore by default; the CLAUDE_CODE_GLOB_NO_IGNORE=false environment variable enables gitignore-aware globbing.
NotebookEdit modifies Jupyter notebooks. Schema: {"file_path": string, "cell_id": string, "new_source": string, "edit_mode": string, "cell_type": string}. The edit_mode field accepts replace (overwrite cell source), insert (add a new cell after the target), and delete (remove the cell). This tool operates on the notebook's internal cell ID system, not line numbers, which makes it structurally aware of notebook semantics in a way that Edit is not.
Process Execution Tools
These tools give the agent access to the underlying operating system and shell environment.
Bash is the most powerful and most permission-sensitive tool in the suite. Schema: {"command": string, "description": string, "timeout": number, "run_in_background": boolean}. Default timeout is 2 minutes (120,000ms), maximum is 10 minutes (600,000ms). Output is capped at 30,000 characters by default, configurable via BASH_MAX_OUTPUT_LENGTH up to 150,000 characters. The description field is shown to users in permission prompts and logs, which is why it should be accurate and human-readable. The run_in_background parameter runs the command as a background process and returns immediately, useful for starting servers or long-running processes. Permission rules for Bash use command pattern matching: Bash(npm run *) pre-approves all npm run commands. Bash(git *) pre-approves all git operations. This granularity lets administrators lock down production environments to exactly the operations their workflows require.
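The `Bash(pattern)` rule syntax maps naturally onto shell-style glob matching. A simplified sketch of that matching behavior; the SDK's actual matcher may differ in edge cases such as quoting or command chaining:

```python
from fnmatch import fnmatch

def bash_rule_allows(rule, command):
    """Check a permission rule like 'Bash(npm run *)' against a command string.

    Simplified model: extracts the glob inside Bash(...) and matches the
    whole command against it.
    """
    if not (rule.startswith("Bash(") and rule.endswith(")")):
        return False
    pattern = rule[len("Bash("):-1]
    return fnmatch(command, pattern)
```

A rule like `Bash(git *)` therefore approves `git status` and `git commit -m "x"` alike, which is why production allowlists tend toward narrower patterns such as `Bash(npm run *)`.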
Monitor is Bash's partner for long-running processes. It runs a background command and feeds each line of output back to Claude as it appears, letting the agent respond to log output in real time. This is architecturally distinct from Bash with run_in_background: Monitor keeps the output channel open, while background Bash returns immediately without tailing.
PowerShell is the Windows-native equivalent of Bash. Enabled via CLAUDE_CODE_USE_POWERSHELL_TOOL=1. It accepts the same {"command": string, "description": string, "timeout": number, "run_in_background": boolean} schema and is subject to the same permission pattern matching as Bash. Its existence reflects the SDK's commitment to genuine cross-platform support.
Search and Discovery Tools
Grep searches file contents using ripgrep under the hood. Schema: {"pattern": string, "path": string, "glob": string, "output_mode": string, "-i": boolean, "multiline": boolean, "-A": number, "-B": number, "-C": number, "head_limit": number, "offset": number}. The output_mode field has three values: files_with_matches (the default, returns only file paths), content (returns matching lines with context), and count (returns match counts per file). Unlike Glob, Grep respects .gitignore by default. The multiline flag enables cross-line pattern matching using ripgrep's multiline mode with dotall semantics. The head_limit and offset parameters enable pagination of large result sets.
WebSearch executes web searches. Schema: {"query": string, "num_results": number}. Each call can trigger up to 8 backend searches, and results are processed and summarized by a small auxiliary model before being fed to the main agent. This two-model architecture for web search is architecturally important: it prevents raw HTML from flooding the context window and applies a layer of relevance filtering. Permission for WebSearch applies at the tool level, not the query level, which means you either allow all web searches or none; search-topic restrictions must be enforced via hooks or system prompts.
Our guide to the best web search APIs for AI agents covers the competitive alternatives to the SDK's built-in WebSearch tool. The built-in implementation is convenient but not necessarily optimal for all production use cases.
WebFetch retrieves and processes a specific URL. Schema: {"url": string, "prompt": string}. Unlike WebSearch, it fetches a specific page and applies the prompt to extract or summarize relevant information. The prompt field guides a small auxiliary model's extraction pass, similar to WebSearch's summarization step. Permission for WebFetch can be scoped to specific domains: WebFetch(domain:example.com) pre-approves fetches from that domain while requiring approval for others.
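Domain scoping can be modeled the same way as the Bash rules. A sketch assuming exact-host matching plus subdomains; whether the SDK's rule covers subdomains is an assumption here:

```python
from urllib.parse import urlparse

def webfetch_rule_allows(rule, url):
    """Model of domain-scoped rules like 'WebFetch(domain:example.com)'."""
    prefix, suffix = "WebFetch(domain:", ")"
    if not (rule.startswith(prefix) and rule.endswith(suffix)):
        return False
    allowed = rule[len(prefix):-len(suffix)]
    host = urlparse(url).hostname or ""
    # Accept the exact host and, as an assumption, its subdomains.
    return host == allowed or host.endswith("." + allowed)
```

Note that this grants nothing about URL paths: domain scoping is a coarse control, and per-path restrictions would again fall to hooks.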
ToolSearch discovers and loads MCP tools on demand. By default, MCP tool schemas are not preloaded into the context window (they would consume too many tokens). ToolSearch takes a natural language query, matches it against the available deferred MCP tools, and returns their complete parameter schemas. This lazy-loading architecture lets agents work with large collections of MCP tools without paying the context cost of loading all schemas upfront.
Agent Management Tools
Agent is the tool that spawns subagents. Schema: {"subagent_type": string, "description": string, "prompt": string, "run_in_background": boolean, "isolation": string}. The subagent_type field selects from registered agent definitions (built-in types like general-purpose, Explore, Plan or custom agents defined in .claude/agents/). The description field is displayed in the UI to explain what the subagent is doing. The isolation field accepts "worktree" to run the subagent in an isolated git worktree. When run_in_background is true, the tool returns immediately with an agent ID while the subagent runs concurrently.
Subagents cannot spawn further subagents: the Agent tool is not available within a subagent's tool suite. This is a deliberate constraint that prevents recursive spawning loops and keeps the execution tree to a maximum depth of two levels. The parent agent coordinates; the subagent executes. This pattern maps cleanly to the engineering principle that parallelism and coordination should live at the same level of abstraction, not nested infinitely.
SendMessage is available only when CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 is set. It sends messages between agent team teammates, enabling direct peer-to-peer communication rather than the parent-child delegation model that Agent implements. This tool is covered in detail in the agent teams section.
TodoWrite manages the session task checklist. Schema: {"todos": array}. It is the older task management tool, still default in -p (non-interactive print) mode and SDK mode. Each todo item has an id, content, status (pending, in_progress, or completed), and priority (low, medium, or high). This tool externalizes the agent's work state into an observable structure that hooks can inspect and that users can see in the UI.
The newer Task tools (TaskCreate, TaskGet, TaskList, TaskUpdate, TaskStop) replace TodoWrite with a richer model that supports dependencies between tasks and integrates with the agent teams shared task list. Enable them with CLAUDE_CODE_ENABLE_TASKS=1.
Scheduling Tools
CronCreate schedules recurring or one-shot tasks within the session. It accepts a standard 5-field cron expression (minute hour day-of-month month day-of-week), a prompt to execute at that time, and an is_recurring boolean. Sessions can have up to 50 scheduled tasks. Scheduled tasks are restored when a session is resumed, provided they have not expired (recurring tasks expire after 7 days). Jitter of up to 30 minutes is applied to recurring tasks at hour boundaries to prevent thundering-herd problems. The minimum interval is 1 minute. Disable the entire system with CLAUDE_CODE_DISABLE_CRON=1.
The cron expression format is exactly the standard 5-field Unix cron. */15 * * * * fires every 15 minutes. 0 9 * * 1-5 fires at 9am every weekday. 30 */2 * * * fires at the 30-minute mark of every even hour. This familiar format means developers do not need to learn a new scheduling syntax: any valid cron expression works, and any cron expression validator can verify the schedule before committing it. All times are interpreted in the local timezone, which matters for globally distributed teams: a schedule that fires at 9am local time in Los Angeles corresponds to a different UTC hour depending on whether daylight saving time is in effect.
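Validating a schedule before committing it is cheap. A minimal stdlib validator for the common 5-field forms (`*`, `*/step`, `N`, `N-M`, and comma lists); it deliberately omits rarer syntax like names and step-on-range:

```python
import re

# Field bounds for minute, hour, day-of-month, month, day-of-week
# (0 and 7 both mean Sunday in standard cron).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]

def validate_cron(expr):
    """Return True if expr is a plausible 5-field cron expression."""
    fields = expr.split()
    if len(fields) != 5:
        return False
    for field, (lo, hi) in zip(fields, FIELD_RANGES):
        for part in field.split(","):
            # Accept *, */step, N, and N-M forms.
            m = re.fullmatch(r"\*(/(\d+))?|(\d+)(-(\d+))?", part)
            if not m:
                return False
            nums = [int(g) for g in (m.group(3), m.group(5)) if g]
            if any(not (lo <= n <= hi) for n in nums):
                return False
    return True
```

A validator like this belongs in any hook or tool wrapper that programmatically creates scheduled tasks, since a rejected expression is cheaper to catch before the CronCreate call than after.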
CronList returns all scheduled tasks in the current session with their IDs, expressions, prompts, and next-fire times. CronDelete cancels a task by ID. Together these three tools form a complete in-process scheduling system that does not require any external infrastructure.
RemoteTrigger operates at a fundamentally different level from the Cron tools. It creates and manages Routines on Anthropic-managed cloud infrastructure at claude.ai/code/routines. Routines are not session-scoped: they persist independently of any running session, can be triggered by schedule (minimum 1 hour interval), by HTTP POST to a unique API endpoint, or by GitHub events. Each Routine run clones the repository fresh and executes in cloud infrastructure. This architecture is appropriate for production automation that must run even when no developer has Claude Code open, at the cost of requiring at minimum a Pro plan and Anthropic API access (Bedrock and Vertex are not supported).
Session Control Tools
EnterPlanMode switches the session to plan mode, where Claude can read files and explore the codebase but cannot modify source files. Codebase research in this mode is handled by the read-only Plan subagent; like every subagent, it cannot spawn further subagents, so plan-mode research stays a single delegation level deep.
ExitPlanMode presents the constructed plan for approval and exits plan mode. This tool requires permission, which is architecturally significant: it means the transition from planning to execution is an explicit confirmation step that the permission system enforces.
EnterWorktree creates an isolated git worktree and switches into it. The path parameter optionally specifies an existing worktree to enter rather than creating a new one. ExitWorktree returns to the original directory. Neither tool is available to subagents, only to the main session, because worktree lifecycle management must be centralized.
AskUserQuestion pauses the agent loop to ask the user a multiple-choice question. This is the structured alternative to free-form clarification requests, designed for cases where the agent needs to disambiguate between a small number of known alternatives before proceeding. It is the only mechanism in the SDK for structured human-in-the-loop decision points within a running turn.
External Services and Notifications
PushNotification sends a desktop or mobile push notification. This is useful for long-running background agents that need to surface completion events or errors to the user without requiring the user to watch a terminal.
ShareOnboardingGuide uploads a project's ONBOARDING.md to Anthropic's sharing infrastructure and returns a shareable link. This is a specialized tool for distributing project context to collaborators.
ListMcpResourcesTool and ReadMcpResourceTool expose MCP resource browsing. MCP resources are data assets (files, database snapshots, API responses) that MCP servers expose alongside tools. ListMcpResourcesTool enumerates available resources by URI, and ReadMcpResourceTool fetches a specific resource's content. This resource model is distinct from MCP tools and enables a content-as-well-as-function pattern for MCP server authors.
LSP provides code intelligence via language servers. This tool gives Claude access to structured code analysis (symbol definitions, references, type information, diagnostics) that Grep and Read cannot provide. It is the bridge between the agent's file-reading capabilities and the semantic understanding that language servers provide. This tool is particularly powerful for large refactoring tasks where understanding all references to a symbol across the codebase is a prerequisite for safe modification.
4. Subagent Architecture and Delegation Patterns
Subagents are where the SDK's design philosophy becomes most visible. The fundamental bet the architecture makes is that context isolation beats context sharing for agentic quality. Each subagent gets a fresh context window. It sees its task description, its tool list, and the files it reads during its own execution. It does not see the parent's conversation history, the parent's tool call chain, or any other subagent's work. This isolation is not a limitation: it is the mechanism that lets the parent agent dispatch multiple subagents in parallel without any of them interfering with each other's context.
Built-in agent types reflect deliberate capability segmentation. The Explore agent uses Haiku (Anthropic's fastest, most cost-effective model) and has all write and edit tools denied. It is built for high-frequency, low-latency codebase exploration tasks where writing anything would be a bug. The Plan agent also runs read-only but uses the main session's model, because planning quality should not be sacrificed for cost. The general-purpose agent inherits all tools and the main session's model, making it appropriate for complex multi-step operations where write access is needed.
Custom agents are defined in .claude/agents/<name>.md files with a YAML frontmatter block. The frontmatter fields are the agent's complete specification:
```
---
name: test-runner
description: Runs the test suite and reports failures. Use this when tests need to be validated after code changes.
tools:
  - Bash
  - Read
  - Glob
model: haiku
permissionMode: acceptEdits
maxTurns: 20
effort: high
---
You are a test execution agent. When invoked, run the project's full test suite using the appropriate test command for this codebase. Report each failing test with its name, the assertion that failed, and the stack trace. Do not attempt to fix failures yourself.
```
The frontmatter's tools field is an allowlist: only the listed tools are available to this agent. The disallowedTools field, alternatively, is a denylist applied against all inherited tools. When both are specified, disallowedTools takes precedence. The model field accepts shorthand aliases (haiku, sonnet, opus) or full model IDs like claude-opus-4-7. The CLAUDE_CODE_SUBAGENT_MODEL environment variable overrides all per-agent model settings, which is useful for cost-capping development environments.
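The allowlist/denylist interaction reduces to a small set operation. A sketch of the resolution rule as described above, with illustrative tool names:

```python
def effective_tools(inherited, tools=None, disallowed_tools=None):
    """Resolve a subagent's effective tool set.

    `tools` is the frontmatter allowlist, `disallowed_tools` the denylist;
    the denylist wins when both are present. Illustrative model only.
    """
    allowed = set(tools) if tools is not None else set(inherited)
    if disallowed_tools:
        allowed -= set(disallowed_tools)
    return sorted(allowed)
```

The precedence matters in practice: an agent definition that allowlists Bash but denylists it elsewhere ends up without Bash, which is the safe default for conflicting policy.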
The description field is not documentation for humans. It is the text that Claude uses to decide whether to delegate to this agent. It is a tool description in the same sense that a function's docstring is a tool description for an LLM. It should be written as a precise, conditional statement: when X is true and Y is needed, use this agent. Vague descriptions produce incorrect delegation decisions.
The initialPrompt frontmatter field is an underappreciated capability that changes how custom agents are used. When set, the prompt is auto-submitted as the first user turn when the agent runs as a main session (invoked directly rather than as a subagent). This enables agent definitions to be self-contained executable workflows: an agent defined with an initialPrompt of "Audit the test suite for flaky tests and report your findings" can be invoked with claude --agent audit-tests and begins working immediately without requiring a separate prompt. This is the mechanism for distributing codified workflows as agent definitions that run on-demand without requiring the caller to know the exact prompt.
The effort frontmatter field for agents deserves explicit discussion because it interacts with cost in non-obvious ways. Higher effort settings increase the extended thinking budget, which improves reasoning quality on complex tasks at meaningful token cost. A general-purpose agent with effort: max investigating a complex bug might cost 10x more per invocation than the same agent with effort: low. The correct approach is to set effort based on task type at the agent definition level: a simple file summarization agent should use effort: low, while an architectural analysis agent should use effort: high. This effort segmentation is one of the most impactful cost optimization levers available in the SDK, one that most teams ignore entirely.
The model resolution order for subagents matters for teams managing costs across different agent types. The precedence is: CLAUDE_CODE_SUBAGENT_MODEL environment variable (highest, overrides everything), then the per-invocation model parameter in the Agent tool call, then the subagent definition's model frontmatter, then the main conversation's model (lowest, the fallback). This four-level override chain means you can set cost-capping defaults at the environment level for development environments without changing any agent definition files, and override them at the definition level for agents that genuinely need more capable models.
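The four-level chain is a first-match walk from highest to lowest precedence. A sketch with a hypothetical default session model:

```python
def resolve_subagent_model(env_override=None, invocation_model=None,
                           definition_model=None, session_model="sonnet"):
    """Resolve a subagent's model per the precedence described above:
    env var, then per-invocation parameter, then definition frontmatter,
    then the main conversation's model as the fallback."""
    for candidate in (env_override, invocation_model, definition_model):
        if candidate:
            return candidate
    return session_model
```

Setting the environment variable in CI or development shells caps cost globally without touching a single agent definition, which is precisely why it sits at the top of the chain.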
For AI framework comparison, our guide to CrewAI alternatives covers how the subagent model compares to CrewAI's crew/agent/task hierarchy and AutoGen's conversation model. The key structural difference: the Claude SDK's subagents are fully isolated subprocess instances with their own file system context, not just function calls within a shared process.
The isolation: "worktree" parameter in agent invocations creates a temporary git worktree for the subagent's execution. The worktree is based on origin/HEAD by default (configurable via the worktree.baseRef setting). If the subagent makes no commits, the worktree is automatically cleaned up when it exits. If it makes commits, the worktree branch and path are preserved for the parent to review and merge. This pattern is ideal for tasks like "apply this refactoring to a new branch for review" because the isolation is guaranteed at the filesystem level, not just the context level.
5. The Hook System: Intercepting Every Action
The hook system is where the SDK's production-readiness lives. A framework that cannot be instrumented, audited, or controlled at runtime is not a framework you can operate at scale. The Claude Agent SDK ships 30+ named hook events, five hook mechanism types, and a structured input/output contract that lets hooks make binding decisions about agent behavior.
The five hook mechanism types reflect different operational requirements. Command hooks execute a shell script, passing JSON to stdin and reading JSON from stdout. They are the most flexible type and run in the same shell environment as the agent. HTTP hooks POST to an endpoint you control, enabling centralized audit logging, policy enforcement, or integration with external approval workflows. MCP tool hooks call a specific function on an MCP server you have configured, bridging the hook system with the MCP ecosystem. Prompt hooks run a single-turn LLM evaluation, enabling AI-driven policy decisions: a prompt hook can ask Claude whether a proposed Bash command looks safe before executing it. Agent hooks spawn a full subagent to handle the hook event, the most powerful but most resource-intensive option.
The hook event taxonomy tells you exactly what the system considers meaningful moments:
The PreToolUse event fires before every tool execution. The hook receives the full tool input schema and can return a decision that blocks execution, modifies the input before execution, or appends additional context to the agent's next turn. The PostToolUse event fires after successful execution and receives the tool's output. The PostToolUseFailure event fires after a tool error. These three events let you build complete instrumentation of every action the agent takes, before and after.
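A PreToolUse command hook is just a script that reads the event JSON and signals its decision. A minimal sketch; the deny list is illustrative policy, not an SDK default, and the event fields follow the input schema shown later in this section:

```python
import json

DENY_SUBSTRINGS = ("rm -rf", "git push --force")  # hypothetical policy

def pretooluse_decision(event):
    """Return (exit_code, stdout_text) for a PreToolUse event."""
    if event.get("tool_name") != "Bash":
        return 0, json.dumps({"continue": True})
    command = event.get("tool_input", {}).get("command", "")
    for fragment in DENY_SUBSTRINGS:
        if fragment in command:
            # Exit code 2 is a blocking error: the tool call is denied
            # and this stdout text is surfaced as the reason.
            return 2, f"blocked: command contains '{fragment}'"
    return 0, json.dumps({"continue": True})
```

Wired up as a command hook, the script would read the event with json.load(sys.stdin), print the stdout text, and exit with the returned code.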
UserPromptSubmit fires when a user submits a prompt, before the agent processes it. A blocking exit code from a hook here can prevent certain prompts from reaching the agent. This is the correct interception point for content filtering: filter at the input boundary, not after the agent has already seen and responded to a prompt. UserPromptExpansion fires after a slash command expands to its full prompt text, giving you the opportunity to inspect the expanded content before execution.
Stop fires when the agent finishes responding. A hook returning exit code 2 here tells the agent that its work is not complete and injects the hook's output as a system message, restarting the turn. This is the mechanism for post-completion validators: a hook that runs tests after every code change and tells the agent about failures it needs to fix is implemented as a Stop hook.
SubagentStart and SubagentStop fire around subagent execution. TeammateIdle fires when an agent team member is about to become idle. TaskCreated and TaskCompleted fire around the task lifecycle in the new task management system. WorktreeCreate and WorktreeRemove fire around worktree lifecycle events, enabling custom VCS integrations: the hook's stdout is interpreted as the worktree directory path, which means you can implement SVN, Perforce, or any other VCS as a first-class worktree provider.
The InstructionsLoaded event fires every time a CLAUDE.md file is loaded, including when subdirectory files are loaded during nested directory traversal. This gives you visibility into exactly which instructions are influencing the agent at any moment.
PreCompact and PostCompact bracket context window compaction events. The PreCompact hook can inject critical context that must survive the compaction boundary. PostCompact fires after the compaction is complete, at which point the context window is fresh and full.
The hook input schema provides operational context beyond the tool data:
{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/current/working/directory",
  "permission_mode": "default",
  "hook_event_name": "PreToolUse",
  "tool_name": "Bash",
  "tool_input": {"command": "npm test", "description": "Run test suite"},
  "tool_use_id": "unique-id",
  "effort": {"level": "high"},
  "agent_id": "subagent-id",
  "agent_type": "custom-agent"
}
The transcript_path field is particularly powerful: it gives the hook access to the full conversation history as a JSONL file, enabling hooks to make decisions based on what the agent has already done earlier in the session.
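As an illustration, a hook helper could scan the transcript for Bash commands the agent has already run. The JSONL entry shape assumed here (a message object with a content list of tool_use blocks) is a sketch-level assumption, not a documented contract:

```python
import json

def bash_commands_in_transcript(lines):
    """Collect Bash commands recorded earlier in the session transcript."""
    commands = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed or non-JSON lines
        content = entry.get("message", {}).get("content", [])
        for block in content if isinstance(content, list) else []:
            if (isinstance(block, dict)
                    and block.get("type") == "tool_use"
                    and block.get("name") == "Bash"):
                commands.append(block.get("input", {}).get("command", ""))
    return commands
```

A PreToolUse hook could read the file at transcript_path, call this helper, and deny a command that has already failed repeatedly in the session.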
Hook output uses a structured JSON format with continue, decision, reason, updatedInput, and additionalContext fields. The updatedInput field lets a PreToolUse hook modify the tool's input before execution: a hook can rewrite a Bash command to add safety flags, redirect file writes to a sandbox path, or inject additional context into a web search query. This is input mutation as policy enforcement, a pattern that is architecturally cleaner than trying to filter outputs after the fact.
Exit codes carry semantic meaning. Exit code 0 with JSON output means the hook succeeded and its JSON should be processed. Exit code 2 means a blocking error: the action should be denied and the hook's stdout is displayed to the user as the reason. Any other non-zero exit code is a non-blocking warning that is logged but does not stop execution. This three-tier exit code scheme lets hooks distinguish between "I have a policy objection" (code 2) and "I encountered an error but don't want to stop the agent" (other non-zero).
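A minimal PreToolUse hook body for Bash might combine the exit-code tiers with input mutation. The deny-list and the `rm -I` rewrite are illustrative policy choices, and the stdin/stdout wiring is described in the docstring rather than executed:

```python
import json

DANGEROUS = ("rm -rf /", "curl | sh")  # illustrative deny-list, not SDK policy

def pre_tool_use(payload: dict) -> tuple[int, str]:
    """Return (exit_code, stdout) for a PreToolUse hook on Bash.
    A real hook script would read the payload with json.load(sys.stdin),
    print the returned string, and sys.exit() with the returned code."""
    command = payload.get("tool_input", {}).get("command", "")
    if any(bad in command for bad in DANGEROUS):
        # Exit code 2: blocking denial; stdout becomes the displayed reason.
        return 2, "Blocked by policy: dangerous shell pattern."
    if command.startswith("rm "):
        # Exit code 0 with JSON: approve, but mutate the input first.
        updated = {**payload.get("tool_input", {}),
                   "command": command.replace("rm ", "rm -I ", 1)}
        return 0, json.dumps({"decision": "approve", "updatedInput": updated})
    return 0, json.dumps({"decision": "approve"})
```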
For MCP specifically, the mcp__<server>__.* pattern in hook matchers lets you apply hooks to all tools from a specific MCP server. This is important for production MCP deployments where you want per-server rate limiting, audit logging, or approval workflows without enumerating every tool name individually.
6. CLAUDE.md and the Layered Memory Stack
The context injection system is how the SDK solves the problem that every production AI application faces: how do you make an agent aware of project-specific conventions, architectural constraints, and operational rules without putting all of that information into every prompt? The answer is a layered, filesystem-hierarchical injection system that loads context at the right granularity for each scope.
CLAUDE.md files are loaded in order from broadest to most specific scope. At the broadest level, a managed policy file at /Library/Application Support/ClaudeCode/CLAUDE.md (macOS) applies to all users in an organization, useful for enterprise deployments that need to enforce compliance rules across every project. At the user level, ~/.claude/CLAUDE.md applies across all projects for a given user, where you would put preferences and conventions that apply regardless of which codebase you are working in. At the project level, ./CLAUDE.md or ./.claude/CLAUDE.md is committed to source control and shared with the team, the canonical location for project architecture, coding conventions, and operational rules. At the local level, ./CLAUDE.local.md is gitignored and private, for personal overrides that should not affect teammates.
The loading mechanism works in two directions. At session start, Claude walks up the directory tree from the current working directory, loading every CLAUDE.md and CLAUDE.local.md file it finds along the way. Then, when Claude reads a file in a subdirectory, that subdirectory's CLAUDE.md is loaded on demand. This lazy downward loading means you can have directory-specific instructions (a src/api/CLAUDE.md that explains the API layer's conventions) that only enter context when Claude is actually working in that directory.
The @path/to/import syntax enables file inclusion within CLAUDE.md files, with a maximum depth of 5 hops. This lets you compose modular instruction files rather than maintaining one monolithic CLAUDE.md. An architecture document, a coding style guide, and an operational runbook can each live in their own file and be included by reference.
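A hypothetical composition (the file names are illustrative):

```markdown
# CLAUDE.md (project root)
@docs/architecture.md
@docs/style-guide.md

## Operational rules
- Never commit directly to main; open a PR.
```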
HTML block comments are stripped before injection. This means you can use <!-- --> to add documentation within CLAUDE.md that is visible when the file is read as documentation but invisible to the agent. This is useful for adding annotations, changelogs, or rationale explanations within instruction files that would be noise to the agent.
Auto Memory is a persistent memory system stored at ~/.claude/projects/<project-derived-path>/memory/. The MEMORY.md file at the root of this directory is loaded at session start (first 200 lines or 25KB, whichever comes first). Topic-specific files in the memory/ directory are loaded on demand. This two-tier structure lets memory scale beyond what would fit in a single preloaded file while ensuring the most critical facts are always available.
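The preload bound can be sketched as a simple truncation. The 200-line/25KB limits come from the behavior described above; the exact truncation mechanics (line-granular, UTF-8 byte counting) are assumptions made for the sketch:

```python
def preload_memory(text: str, max_lines: int = 200, max_bytes: int = 25_000) -> str:
    """Keep the first 200 lines or 25KB of MEMORY.md, whichever limit hits first."""
    out, size = [], 0
    for line in text.splitlines(keepends=True)[:max_lines]:
        size += len(line.encode("utf-8"))
        if size > max_bytes:
            break  # byte budget exhausted before the line budget
        out.append(line)
    return "".join(out)
```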
Auto memory is shared across all worktrees in the same git repository. This is an important architectural choice: worktrees provide filesystem isolation for changes but not for the accumulated knowledge the agent has built about the project. An agent that has learned important facts about a codebase in one worktree does not lose that knowledge when working in another.
The InstructionsLoaded hook event, combined with the load_reason field in its payload (session_start, nested_traversal, path_glob_match, include, compact), gives you complete observability into how context is being assembled for any given session. This is essential for debugging cases where an agent seems to be following instructions from an unexpected source.
For teams using O-mega's agent platform, this layered context system maps directly to the context isolation architecture described in our AI agents deep dive. The principle that each level of the abstraction stack should only see context relevant to that level is implemented in the SDK through the CLAUDE.md scope hierarchy.
7. Permission Modes: Six Levels of Autonomy
The permission model is the SDK's most nuanced subsystem because it addresses a problem that has no clean technical solution: how much should an autonomous system be trusted? The answer is always context-dependent, and the six permission modes represent six calibrated points on the trust spectrum.
default mode auto-approves all file reads but prompts before every edit, write, or command execution. This is appropriate for interactive sessions where a human is watching and approving each action. It is the most conservative safe default for a system that has access to the filesystem.
acceptEdits mode extends auto-approval to file edits, file writes, and a curated list of common filesystem Bash operations: mkdir, touch, rm, mv, cp, sed, awk, echo, cat redirection, and similar. It does not auto-approve arbitrary Bash commands. The distinction matters: rm -rf ./node_modules is probably safe in a development context. rm -rf / is not. acceptEdits mode handles the common case without opening the full shell surface.
plan mode is read-only: Claude can explore files and construct a plan but cannot modify anything. The plan architecture works by routing all codebase research through a read-only Plan subagent, preventing the main session from executing any writes even accidentally. This mode is appropriate when you want to validate a proposed approach before committing to it.
auto mode introduces a background classifier model that reviews each proposed action before it executes. The classifier runs silently in parallel with the agent's tool calls, applying Anthropic's safety policy and returning a classification of safe/unsafe. Actions classified as unsafe are denied and the agent is notified. This mode is the most sophisticated because it uses AI judgment to make permission decisions rather than static rules. It requires Max, Team, or Enterprise plan; a supported model (currently Claude Opus 4.7, Claude Sonnet 4.6 on Team/Enterprise/API); and Anthropic API access (not Bedrock or Vertex).
dontAsk mode auto-denies everything that is not in the explicit allow rules. This is the correct mode for fully automated pipelines where no human is present to approve unexpected requests. Any tool call not covered by a pre-approved rule fails silently and the agent must work within those constraints.
bypassPermissions mode auto-approves everything, including operations on protected paths. Two absolute circuit breakers remain even in this mode: rm -rf / and rm -rf ~. This mode is appropriate for air-gapped testing environments or controlled evaluation harnesses where the risk surface is fully contained.
Protected paths that resist even acceptEdits mode are worth enumerating explicitly: .git, .vscode, .idea, .husky, .gitconfig, .gitmodules, .bashrc, .zshrc, .profile, .ripgreprc, .mcp.json, .claude.json, and the .claude directory itself (with specific exceptions for .claude/commands, .claude/agents, .claude/skills, and .claude/worktrees). These paths are protected because accidental modification would silently corrupt core developer tooling or agent configuration in ways that are difficult to diagnose.
Permission rules use a specific matcher syntax that applies to each tool. Bash(npm run *) matches any npm run command. Read(~/secrets/**) matches any file read from the secrets directory. Edit(/src/**) matches edits within the src directory. WebFetch(domain:example.com) matches fetches from that domain. Agent(Explore) matches invocations of the Explore subagent type. Skill(deploy *) matches any skill whose name starts with "deploy". These rules compose: you can pre-approve specific operations while requiring approval for everything else, enabling fine-grained automation with a narrow and auditable trust surface.
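A hedged sketch of how these matchers compose in a .claude/settings.json file, assuming Claude Code's permissions allow/deny layout:

```json
{
  "permissions": {
    "allow": [
      "Bash(npm run *)",
      "Edit(/src/**)",
      "WebFetch(domain:example.com)"
    ],
    "deny": [
      "Read(~/secrets/**)"
    ]
  }
}
```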
8. MCP Integration: Extending the Tool Surface
The Model Context Protocol is Anthropic's standard for extending AI systems with external tools and data sources. The Claude Agent SDK treats MCP as a first-class extension mechanism, and understanding how MCP integrates with the SDK at the architectural level is important for anyone building production agent systems.
MCP servers connect to the SDK via stdio or HTTP (streamable HTTP being the preferred production transport). Configuration lives in .mcp.json at the project root, in settings files, or inline in agent definitions:
{
  "mcpServers": {
    "playwright": {
      "type": "stdio",
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    },
    "memory": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-memory"]
    }
  }
}
MCP tools appear in the agent's tool suite with a naming convention of mcp__<server-name>__<tool-name>. This namespace structure is important for permission rules and hook matchers: mcp__playwright__.* matches all playwright MCP tools. mcp__memory__.* matches all memory server tools. The double-underscore separator is a convention that allows the SDK to distinguish MCP tools from native tools and to enforce server-level policies without enumerating every tool name.
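For example, a settings-level hook that audits every Playwright MCP call might look like this; the script path is hypothetical, and the structure assumes the standard hooks settings layout:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "mcp__playwright__.*",
        "hooks": [
          { "type": "command", "command": "./hooks/audit-playwright.sh" }
        ]
      }
    ]
  }
}
```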
Deferred tool loading is the SDK's approach to the MCP tool count problem. A production MCP setup might have hundreds of tools across multiple servers. Preloading all their schemas at session start would consume a significant portion of the context window before any work begins. The SDK defers schema loading by default: tool names are listed in the agent's awareness but schemas are not expanded until the ToolSearch tool explicitly fetches them. This lazy loading architecture means the context cost of MCP is proportional to the tools actually used in a given session, not the total catalog size.
Each subagent can have its own MCP server configuration. An agent definition's mcpServers field accepts either inline server definitions (scoped to that subagent's execution) or string references to already-configured servers (inheriting from the parent session). This per-agent MCP configuration enables sophisticated architectures: a research subagent configured with web search and news MCP servers, a code execution subagent configured with a local sandbox MCP server, and a communication subagent configured with Slack and email MCP servers, all operating independently with tool access scoped to their specific function.
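A hedged sketch of a subagent definition with its own inline MCP server; the YAML field names beyond name and description should be treated as assumptions about the agent-definition schema:

```markdown
---
name: researcher
description: Web research specialist that records durable findings
tools: WebSearch, WebFetch
mcpServers:
  memory:
    type: stdio
    command: npx
    args: ["-y", "@modelcontextprotocol/server-memory"]
---
You research assigned topics on the web and store key findings
through the memory server so later sessions can reuse them.
```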
MCP resources (distinct from MCP tools) are data assets that servers expose alongside their function-call interface. A file server might expose its files as resources. A database server might expose table schemas. ListMcpResourcesTool and ReadMcpResourceTool provide the interface for agents to discover and consume these resources.
The reconnectMcpServer and toggleMcpServer methods on the TypeScript Query object enable runtime MCP server management. This is important for production applications that need to handle MCP server failures gracefully: rather than terminating the entire agent session when one MCP server becomes unavailable, the application can detect the failure and disable that server while continuing with the remaining tools. Our guide to building MCP servers covers the server implementation side of this integration.
9. Agent Teams: Coordinated Multi-Agent Systems
Agent teams represent a different coordination model from the subagent hierarchy. Subagents report to a parent. Teammates coordinate with each other. The architectural difference is significant: subagents execute tasks assigned by their parent and report back; teammates share a task list, discover available work, and self-assign tasks without requiring a central coordinator to micromanage execution order.
The agent teams system is enabled via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 and becomes available through three new tools: TeamCreate, TeamDelete, and SendMessage.
TeamCreate instantiates a named team with a set of defined teammates. Each teammate is a full Claude Code session running as a separate process, with its own context window, model configuration, and tool set. The team's shared task list is stored at ~/.claude/tasks/{team-name}/ and provides the coordination surface that teammates read from and write to. TeamDelete disbands the team and cleans up all associated processes and task files.
SendMessage enables direct peer-to-peer communication between teammates. Unlike the parent-child channel that Agent creates, SendMessage targets a specific teammate by agent ID and delivers a message that appears in that teammate's context window. This direct channel enables richer coordination patterns: a research teammate can message a writing teammate with specific findings, a code reviewer teammate can message the implementing teammate with specific concerns, and teammates can surface blockers to each other without routing everything through the team lead.
The coordination dynamics differ from subagents in ways that matter for production systems. Subagents are cheap to spawn and disposable: they execute a task and terminate. Teammates are persistent: they maintain context across multiple tasks and can accumulate domain expertise within a session. A teammate that has spent 30 minutes researching a codebase understands it in a way that a freshly-spawned subagent does not. The trade-off is cost: teammates maintain full context windows across their entire lifetime, while subagents pay context cost only for their specific task.
The hook events specific to teams (TeammateIdle, TaskCreated, TaskCompleted) enable sophisticated quality control. A Stop hook on a teammate that returns exit code 2 injects feedback into that teammate's context and keeps them working: this is the mechanism for implementing automated review loops where a reviewer hook evaluates each teammate's output and sends it back for revision if it does not meet quality criteria. TaskCreated hooks with exit code 2 can prevent tasks from being added to the shared list, implementing task routing policies. TaskCompleted hooks with exit code 2 can block task completion until validation criteria are met, implementing acceptance criteria enforcement.
10. Worktrees and Isolation: Safe Parallel Execution
Worktree isolation solves a real engineering problem: how do you run multiple agents in parallel on the same codebase without their changes interfering with each other? The naive answer is to give each agent a separate repository clone. The SDK's answer is git worktrees, which are cheaper than full clones and integrate natively with the project's git history.
EnterWorktree creates a worktree at .claude/worktrees/<name>/ under the repository root, on a branch named worktree-<name>. The base branch defaults to origin/HEAD, configurable via the worktree.baseRef setting. The worktree has its own working directory state: changes made in the worktree do not affect the main checkout, and vice versa. ExitWorktree returns to the original directory.
The .worktreeinclude file at the project root handles an important edge case: gitignored files. When a worktree is created, files listed in .worktreeinclude are automatically copied into the worktree even if they are gitignored. This is specifically designed for .env files, local configuration, and other secrets that the agent needs but that must not be committed. Without this mechanism, agents in worktrees would consistently fail to find their required configuration.
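An illustrative .worktreeinclude (the paths are hypothetical):

```
.env
.env.local
config/local.settings.json
```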
Worktree cleanup behavior is automatic and conditional. If a subagent running in a worktree makes no commits, the worktree is removed when the subagent exits. If the subagent makes commits, the worktree and its branch are preserved for the parent to review and merge. This conditional cleanup prevents both resource accumulation from discarded experimental work and the loss of work that should be preserved.
The WorktreeCreate and WorktreeRemove hook events are particularly interesting for teams using non-git VCS. The WorktreeCreate hook's stdout is interpreted as the worktree directory path: if your hook creates an SVN working copy or a Perforce workspace, it can return the path to that workspace and the SDK will use it as the worktree. This mechanism makes the worktree abstraction VCS-agnostic, with the hook system as the adapter layer.
For fully automated subagent execution, the isolation: "worktree" parameter in the Agent tool invocation is the one-line way to get this isolation without any manual EnterWorktree/ExitWorktree management. The SDK creates the worktree, runs the subagent, and handles cleanup based on whether commits were made. This makes worktree isolation accessible for standard delegation patterns, not just explicitly managed worktree sessions.
11. Scheduling, Cron, and Remote Triggers
The scheduling layer addresses a real operational need: agents that should do things on a schedule, not just on demand. The SDK's scheduling architecture has two distinct implementations with different characteristics, appropriate for different deployment scenarios.
In-session scheduling via the Cron tools is the lightweight option. Tasks scheduled with CronCreate run within the current session's process. They survive session resumption but expire after 7 days (for recurring tasks) and are bounded to 50 tasks per session. The minimum interval is 1 minute. Jitter is applied at hour boundaries (up to 30 minutes for recurring tasks, 90 seconds for one-shot tasks at the hour mark) to prevent synchronized load spikes across multiple agent sessions.
The practical use case for in-session scheduling is monitoring and maintenance within a running development session. A task that checks the test suite every 15 minutes and reports failures, a task that commits current work-in-progress every hour, or a task that fetches the latest upstream changes every 30 minutes are all well-suited to in-session scheduling. They are tied to the developer's active session and don't need to run when nobody is working.
Remote Routines via RemoteTrigger are the production option. Routines run on Anthropic-managed cloud infrastructure, independent of any running session. They survive process restarts, holidays, and closed laptop lids. The trigger types available (scheduled, HTTP endpoint, GitHub events) cover the three most common automation patterns in software development: time-based, webhook-based, and event-based.
The HTTP trigger format deserves attention because it enables external systems to invoke agent sessions as part of larger automation pipelines:
curl -X POST https://api.anthropic.com/v1/claude_code/routines/trig_01ABC.../fire \
  -H "Authorization: Bearer sk-ant-oat01-xxxxx" \
  -H "anthropic-beta: experimental-cc-routine-2026-04-01" \
  -H "Content-Type: application/json" \
  -d '{"text": "Process the new batch of customer feedback from today"}'
The text field in the HTTP trigger payload is injected as additional context into the Routine's prompt. This makes Routines a lightweight workflow endpoint: your CI system can POST to a Routine to trigger a code review agent, your support system can POST to a Routine to trigger a documentation update agent, and your data pipeline can POST to a Routine to trigger a report generation agent.
GitHub event triggers are the most sophisticated option: a Routine can fire when a PR is opened, when tests fail, or when a specific label is applied. This integration makes Routines a natural CI/CD actor that can participate in code review workflows, automated fix generation, and quality gate enforcement without any additional infrastructure.
Our Claude Agent SDK cost guide covers the credit and pricing implications of Routines in detail. The remote execution model has different cost characteristics from local execution, and the per-run credit accounting matters for production deployments with frequent triggers.
12. The SDK API: Programmatic Agent Construction
The Python and TypeScript SDK APIs expose the agent runtime for programmatic use in application code. The query() function is the primary entry point, returning an async iterator of structured message objects.
The ClaudeAgentOptions class (Python) and Options interface (TypeScript) configure every aspect of the agent session. The full parameter set is extensive, and most production applications will use a small subset. But understanding the full surface area reveals the SDK's design intentions.
The effort parameter accepts "low", "medium", "high", "xhigh", or "max". This parameter controls the extended thinking budget, effectively setting how much compute the agent invests in difficult reasoning before acting. Higher effort produces better decisions on complex tasks at proportionally higher cost. The CLAUDE_EFFORT environment variable is available within skill content for effort-aware prompting.
The fork_session parameter creates a new session that inherits the full conversation history of the parent session but gets its own session_id. This is the mechanism for branching: you can explore two different approaches from the same conversation state by forking and running separate query() calls on each fork. Unlike resumption (which continues a single linear history), forking creates a tree structure of sessions.
The agents parameter passes agent definitions inline without requiring .claude/agents/ files on disk. This is the correct approach for programmatically-generated agent specifications where the definition changes per-invocation or per-user. SDK-defined agents have identical capabilities to file-defined agents; the registration mechanism is just different.
The session_store parameter is available for applications that need to persist session state somewhere other than ~/.claude/projects/. A custom SessionStore implementation lets you store session transcripts in a database, an object store, or any other persistence backend. The session_store_flush parameter controls whether writes are batched or immediate.
The hooks parameter in ClaudeAgentOptions attaches hooks programmatically, in addition to any hooks configured in settings files. SDK-defined hooks follow the same event/matcher pattern as settings-defined hooks but are registered at call time rather than configuration time. This is the correct approach for application-specific hooks that depend on per-request context unavailable at configuration time.
The can_use_tool parameter in the Python SDK accepts a callable that receives the tool name and input and returns a boolean permission decision. This is the programmatic equivalent of the hook system's PreToolUse event: a Python function that can inspect tool calls and make binding allow/deny decisions inline. It is simpler than writing a hook script for cases where the permission logic lives naturally in Python.
The include_partial_messages parameter enables streaming of in-progress assistant responses as StreamEvent objects. This is the mechanism for building real-time UIs that show the agent's output as it is generated rather than waiting for each full turn to complete. StreamEvent objects carry raw API stream events, including thinking blocks when extended thinking is active. An application displaying a live agent workspace would use include_partial_messages=True and render StreamEvent objects as they arrive, then display the complete AssistantMessage when the turn is done.
The include_hook_events parameter includes hook execution events in the message stream. This is the observability primitive for applications that need to know when hooks fired, what decisions they made, and whether any hooks blocked tool executions. Without this parameter, hook activity is invisible to the consuming application. With it, the application can build complete audit trails, debug hook logic, and surface hook-based policy decisions to users in the UI.
Session forking with fork_session=True deserves particular attention for testing and evaluation workflows. The pattern is: (1) run a deterministic setup phase to reach a known state, (2) capture the session ID, (3) fork multiple sessions from that ID, (4) run different prompt variations on each fork in parallel, (5) compare results. This is significantly more efficient than running the same setup phase from scratch for every variant. For automated evaluation of agent behavior (red-teaming, prompt regression testing, A/B comparison of system prompt variants), forking turns a serial evaluation into a parallel one without sacrificing the reproducibility of starting from identical state.
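The pattern above can be sketched as a small harness, parameterized over a run_query coroutine that stands in for the SDK's query() call; its signature here (prompt, resume, fork) is an assumption made so the fan-out logic can be shown without the SDK installed:

```python
import asyncio

async def evaluate_variants(run_query, setup_prompt: str, variants: list[str]):
    """Fork-based A/B evaluation over a run_query(prompt, resume=None,
    fork=False) -> (session_id, result) coroutine (assumed signature)."""
    # 1-2. Run the deterministic setup once; capture the session ID.
    base_session, _ = await run_query(setup_prompt)
    # 3-4. Fork one session per variant and run them in parallel.
    runs = [run_query(v, resume=base_session, fork=True) for v in variants]
    results = await asyncio.gather(*runs)
    # 5. Pair each variant with the result from its forked session.
    return {v: result for v, (_sid, result) in zip(variants, results)}
```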
Putting several of these options together, a minimal code-review runner looks like this:
from claude_agent_sdk import query, ClaudeAgentOptions

async def run_code_review(pr_diff: str, session_id: str | None = None):
    options = ClaudeAgentOptions(
        system_prompt=(
            "You are a code review expert. Review the provided diff for "
            "security vulnerabilities, logic errors, and style violations."
        ),
        allowed_tools=["Read", "Grep", "Glob", "WebSearch"],
        effort="high",
        max_turns=30,
        max_budget_usd=2.00,
        resume=session_id,
        can_use_tool=lambda tool, tool_input: tool != "Bash",  # block shell execution
    )
    result_session_id = None
    async for message in query(prompt=pr_diff, options=options):
        if message.type == "result":
            result_session_id = message.session_id
            print(f"Cost: ${message.total_cost_usd:.4f}")
        elif message.type == "assistant":
            for block in message.content:
                if block.type == "text":
                    print(block.text)
    return result_session_id
The TypeScript Query object exposes additional runtime control methods beyond the async iterator. interrupt() gracefully stops the current turn. rewindFiles(userMessageId) reverts all file changes made since a specific message, with optional dry-run mode to preview what would be reverted. setPermissionMode() and setModel() change session configuration at runtime. supportedModels(), supportedAgents(), and mcpServerStatus() provide introspection into the session's current capabilities.
For Claude Code's pricing and plan requirements for SDK access, our Claude Code pricing guide covers the current tier structure. API access, Max plans, Team plans, and Enterprise plans each unlock different SDK capabilities, particularly around auto permission mode and Routines.
The official Anthropic demo of Claude Code detecting security vulnerabilities shows how these SDK capabilities combine in a real workflow.
13. Why the Architecture Wins
The first-principles argument for why this particular architecture is effective requires going back to the structural problem that AI coding tools are solving: how do you make intelligence reliable when the intelligence itself is probabilistic? The Claude Agent SDK's answer is layered determinism. Every non-deterministic element (the LLM's decisions) is wrapped by deterministic structure (the hook system, the permission model, the session transcript). You cannot make Claude's choices deterministic, but you can instrument every choice, intercept every action, audit every output, and roll back any file change. The result is a system whose behavior is reproducible, auditable, and correctable even though its core decision-maker is probabilistic.
This is the architectural claim that competing frameworks struggle to match. Raw API loops give you the LLM without the structure. LangChain and similar frameworks give you structure but at the application layer, where it is fragile and hard to enforce. The SDK embeds the structure into the subprocess, where it is consistent across all callers and enforceable at the binary level.
The subprocess architecture specifically protects against a class of failures common in application-layer agent frameworks: when the agent framework crashes, it takes the agent state with it. The SDK's subprocess model means the agent loop runs in a process that is architecturally separate from your application. The transcript is on disk. The session ID is persistent. If your application crashes, you can restart it and resume the session. The agent does not lose its work because the application process died.
The CLAUDE.md hierarchy solves the personalization problem that plagues general-purpose AI systems. An agent that knows nothing about your codebase, your conventions, and your operational context is useful for isolated tasks. An agent that has deeply internalized your architecture (because your architecture is documented in layered CLAUDE.md files it always loads) is a fundamentally different tool. The SDK makes context accumulation a first-class artifact of the filesystem rather than a per-request prompt engineering task.
The 40-tool surface area reflects a calculated decision about where the agent's action boundary should be. Every tool that exists represents a class of action that Anthropic decided was safe and useful to give agents by default. Every tool that does not exist (there is no direct database write tool, no SSH tool, no container management tool) reflects a decision about where the boundary of reasonable default capability lies. The MCP extension mechanism then lets organizations push that boundary as far as their risk tolerance allows, with the hook system providing the policy enforcement layer above it.
The worktree isolation architecture reflects understanding that the most expensive agent failures are not API errors or wrong answers: they are unintended filesystem mutations. An agent that corrupts a codebase in ways that are difficult to detect and expensive to reverse is a tool that engineering teams will not trust. Worktrees make corruption containable: everything the agent did in the worktree can be inspected before it reaches the main branch, and the rewindFiles API in the TypeScript SDK makes per-session rollback possible for cases where worktrees are not used.
For production AI agent platforms that orchestrate Claude at scale, platforms like O-mega build on top of this architecture to provide the multi-agent workforce coordination, persistent memory, and cross-session learning that turn the SDK's capabilities into operational business infrastructure. The SDK provides the execution layer; orchestration platforms provide the strategic coordination and observability layer above it.
The SDK's most consequential architectural bet is the one that is easiest to overlook: the agent loop is not your code. This is a transfer of control that feels like giving up power but is actually the acquisition of reliability. When the loop is your code, you are responsible for its correctness. When the loop is Anthropic's subprocess, you inherit the correctness guarantees that Anthropic has built into that loop: the context window management, the compaction logic, the tool execution retry handling, the transcript persistence. You pay for that reliability with some loss of flexibility, and you recover the flexibility you need through the hook and permission systems.
Our Claude Cowork guide examines how Claude's broader product ecosystem connects to the SDK's execution model. The product layer and the SDK layer are architecturally related: the same subprocess, the same tools, the same hooks, the same permission model.
The Managed Agents architecture, which extends the SDK's capabilities for enterprise deployments, builds on the same substrate.
This architecture, where a standardized execution substrate (the SDK subprocess) sits beneath an increasingly rich coordination and enterprise layer (Managed Agents, teams, Routines), is how Anthropic is positioning Claude as a platform rather than a product. The SDK is the documented API surface of that platform. Every tool, every hook event, every permission mode, every parameter in ClaudeAgentOptions is an interface to the platform's capabilities. The bet is that teams who understand these interfaces at the level of depth this guide has provided will build significantly more reliable and capable agent systems than teams who treat the SDK as a convenient wrapper. That bet, based on the architectural soundness of the design, seems reasonable to take.
Financial services teams deploying multi-agent coordination at scale represent one of the most demanding production scenarios: the permission modes, hook-based policy enforcement, and worktree isolation described in this guide are precisely the controls that make that class of deployment defensible in a regulated environment.
For teams moving from AutoGen or other frameworks to the Claude Agent SDK, our AutoGen alternatives guide provides a structured comparison of the migration trade-offs. The concurrency model, the context handling, and the permission architecture are all different enough that migration deserves careful planning.
Yuma Heymans, Founder and CEO of O-mega and co-founder of HeroHunt.ai, has been building and deploying production AI agent systems since the early agent frameworks emerged. The architectural analysis in this guide reflects patterns learned from operating agents at scale across multiple production workloads. Follow at @yumahey for ongoing analysis of the agent infrastructure landscape.
This guide reflects the Claude Agent SDK architecture as of May 2026. The SDK evolves rapidly: tool counts, permission behaviors, and API parameters change with each release. Verify current details against the official documentation before building production systems.