The practical builder's guide to shipping software on Anthropic's most powerful model.
On June 9, 2026, Anthropic shipped Claude Fable 5, the first generally available model in a new tier it calls Mythos-class, sitting a full rung above Opus. Three days earlier the company had publicly warned that its frontier systems were getting dangerous enough to matter to national security, and then it released a safety-classified version of that same system to anyone with an API key - TechCrunch. For people who build software, the headline is simpler: the most capable coding and agent-building model ever sold to the public is now a single line of configuration away.
But here is the problem most builders hit immediately: there is no longer one way to build with Claude. There are at least six distinct surfaces, each with its own pricing, its own failure modes, and its own right answer. A founder who wires up a raw API call when they needed a managed agent will reinvent infrastructure for weeks. A team that reaches for the heaviest agent runtime when a single request would do will pay ten times too much and ship half as fast. The model is the easy part. Choosing the surface is the hard part.
This guide breaks down what Claude Fable 5 actually is, the six surfaces you can build apps on top of it, the real pricing of each, the production patterns that separate working software from expensive demos, and how it fails so you can design around it. It starts at the highest level (what changed and why it matters) and then goes deep into each layer, the economics, the competition, and where the whole field is heading. The audience is anyone building a product, whether you write the code yourself or describe what you want and let an AI workforce build it for you.
Contents
- What Claude Fable 5 actually is
- The six ways to build apps with Claude
- Surface one: the Claude API and the Messages endpoint
- Surface two: workflows and the agentic loop
- Surface three: the Claude Agent SDK
- Surface four: Claude Code in the terminal and CI
- Surface five: Claude Managed Agents
- Surface six: MCP, the layer every app now plugs into
- Claude on the clouds, and where it does not reach
- The economics: what it actually costs to build
- Production patterns that separate apps from demos
- How it fails: limits, refusals, and prompt injection
- The competitive landscape in mid-2026
- The market, the money, and where this is going
- Choosing your surface: a decision framework
The six build surfaces, scored
Before the deep dives, here is the whole landscape in one table. The six surfaces below are scored on what a builder actually cares about when picking where to start: how fast you get a working app, how much you can customize, how little infrastructure you have to run yourself, and how predictable the bill is. The table covers the surfaces you can compare as offerings, so cloud distribution takes the slot of the workflow pattern, which is a way of using the API rather than a separate product. The scores are a general accessibility ranking, not a verdict that the top row is always right. The correct surface depends entirely on your task, and half of this guide is about matching the two.
| # | Surface | What It Does | Time to First App (30%) | Control (25%) | Managed Infra (25%) | Cost Control (20%) | Final |
|---|---|---|---|---|---|---|---|
| 1 | Claude API | One stateless endpoint for every Claude capability | 8 - one POST /v1/messages, working in minutes | 10 - you own the loop, tools, and every byte | 4 - you host the loop and any tool execution | 10 - $10/$50 per MTok, metered per token, no runtime fee | 7.9 |
| 2 | Claude Code | Scriptable coding agent for terminal and CI | 9 - install, point at a repo, prompt | 6 - the agent drives, you steer | 7 - Claude Code runs the loop, you supply the machine | 6 - a 4-hour session is ~$25 cached vs ~$80 uncached | 7.15 |
| 3 | Managed Agents | Anthropic runs the loop and hosts the sandbox | 7 - create an agent, start a session, stream | 6 - config lives on the agent, Anthropic runs it | 10 - Anthropic runs the loop and hosts the container | 5 - $0.08 per session-hour plus tokens | 7.1 |
| 4 | MCP | Open standard wiring tools and data into any agent | 6 - it is the plumbing, not the app itself | 8 - open standard, 10,000+ servers, swap freely | 5 - you connect and run servers | 9 - no model premium, reused across every surface | 6.85 |
| 5 | Agent SDK | The Claude Code engine, as a library you embed | 7 - scaffolding ships, you wire the compute | 9 - the same agent loop as Claude Code, fully yours | 4 - you host compute and tool execution | 7 - token rates plus your own infrastructure | 6.75 |
| 6 | Claude on cloud | Claude via AWS, Vertex, and Microsoft Foundry | 6 - cloud identity and SDK setup first | 7 - Messages plus tools, no Managed Agents there | 6 - cloud-operated infra, your own agent loop | 7 - cloud billing, 100,000+ customers on Bedrock alone | 6.45 |
The four criteria and their weights: Time to First App (30%) captures how quickly a builder gets something running, the single biggest predictor of whether a project survives. Control (25%) is how much you can customize the loop, the tools, and the prompt. Managed Infra (25%) rewards surfaces where Anthropic runs more of the stack so you run less. Cost Control (20%) is how transparent and bounded the bill is. The final score is the weighted average, shown to two decimal places, and the rows are sorted from highest to lowest. Read it as a starting-point map, then read the sections to understand why a 6.45 surface like Bedrock can still be the only correct answer for a regulated enterprise.
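The weighting is simple enough to check by hand. The sketch below recomputes the Final column from the criterion scores in the table; the variable and function names are illustrative, and the weights are held as integer percentages only to keep the arithmetic exact.

```python
# Weights from the table header, as integer percentages (they sum to 100).
WEIGHTS = (30, 25, 25, 20)  # time to first app, control, managed infra, cost control

# Criterion scores copied from the table rows above, in the same order as WEIGHTS.
SURFACES = {
    "Claude API": (8, 10, 4, 10),
    "Claude Code": (9, 6, 7, 6),
    "Managed Agents": (7, 6, 10, 5),
    "MCP": (6, 8, 5, 9),
    "Agent SDK": (7, 9, 4, 7),
    "Claude on cloud": (6, 7, 6, 7),
}


def final_score(scores):
    """Weighted average of the four criterion scores."""
    return sum(w * s for w, s in zip(WEIGHTS, scores)) / 100


# Sort from highest to lowest, as the table does.
RANKED = sorted(
    ((name, final_score(scores)) for name, scores in SURFACES.items()),
    key=lambda row: row[1],
    reverse=True,
)

for name, score in RANKED:
    print(f"{name}: {score}")
```

Changing a weight here is a quick way to see how sensitive the ranking is to what you personally care about.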
1. What Claude Fable 5 actually is
The most important thing to understand about Claude Fable 5 is that it is not just a faster Opus. Anthropic created a new model tier for it, branded Mythos-class, positioned explicitly above the Opus line - Anthropic. Fable 5 is the public, safety-classified twin of a more capable internal model called Mythos 5, which is restricted to the company's defensive-cybersecurity program. That lineage is the reason Fable 5 ships with conservative safeguards and a price tag that puts it in its own bracket. For builders, the practical translation is that you now have access to a model whose ceiling on hard, long-horizon software tasks is meaningfully higher than anything previously sold to the public.
The shape of the model matters as much as the marketing. Fable 5 is available under the model ID claude-fable-5, with a 1 million token context window and 128K maximum output tokens per request - Anthropic docs. Pricing is $10 per million input tokens and $50 per million output tokens, which the company notes is less than half the price of the earlier Mythos Preview - Anthropic. The full 1M context comes at standard pricing with no long-context surcharge, so a 900K token request is billed at the same per-token rate as a 9K one. That single design choice quietly removes one of the oldest reasons to chunk and summarize before sending.
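As a rough illustration of flat per-token pricing, here is a small estimator built only from the $10 and $50 per-million rates quoted above. The function name is made up for this sketch, and it ignores caching and any other discounts.

```python
# Rates quoted in this guide for claude-fable-5, in USD per million tokens.
INPUT_USD_PER_MTOK = 10.0
OUTPUT_USD_PER_MTOK = 50.0


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost at flat per-token rates (no long-context surcharge)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# The per-token rate is identical at 9K and 900K input tokens; only the volume changes.
small = request_cost_usd(9_000, 0)
large = request_cost_usd(900_000, 0)
print(f"9K input: ${small:.2f}, 900K input: ${large:.2f}")
```

The large request costs exactly 100 times the small one, which is what "no surcharge" means in practice: sending the whole document is a volume decision, not a pricing-tier decision.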
The request surface is where a lot of builders will trip if they carry old habits forward. Fable 5 uses adaptive thinking only. The old pattern of setting a fixed thinking budget with budget_tokens is gone and returns a 400 error, as do the sampling parameters temperature, top_p, and top_k. On Fable 5 specifically, even an explicit thinking: {type: "disabled"} returns a 400, so the way to run without thinking is to omit the parameter entirely. The way you steer cost and depth now is the effort parameter, which accepts low, medium, high (the default), xhigh, and max. For coding and agentic work, xhigh is the recommended setting, and a minimum of high is sensible for anything intelligence-sensitive.
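Because these rejections surface only as a 400 at request time, one option is a local pre-flight check that runs in tests or CI. The helper below is a hypothetical sketch, not part of any SDK, and it encodes only the rules stated in the paragraph above.

```python
# Top-level sampling parameters this guide says Fable 5 rejects.
REJECTED_TOP_LEVEL = ("temperature", "top_p", "top_k")
# Effort levels this guide lists as accepted.
ALLOWED_EFFORT = ("low", "medium", "high", "xhigh", "max")


def preflight_problems(request: dict) -> list[str]:
    """Return human-readable problems found in a request dict; empty means none found."""
    problems = []
    for name in REJECTED_TOP_LEVEL:
        if name in request:
            problems.append(f"remove '{name}': sampling parameters are rejected")
    thinking = request.get("thinking")
    if thinking is not None:
        if "budget_tokens" in thinking:
            problems.append("remove 'budget_tokens': fixed thinking budgets are rejected")
        if thinking.get("type") == "disabled":
            problems.append("omit 'thinking' entirely instead of disabling it")
    effort = request.get("output_config", {}).get("effort")
    if effort is not None and effort not in ALLOWED_EFFORT:
        problems.append(f"unknown effort '{effort}'; expected one of {ALLOWED_EFFORT}")
    return problems


legacy_request = {
    "model": "claude-fable-5",
    "temperature": 0.2,
    "thinking": {"type": "enabled", "budget_tokens": 4000},
}
for problem in preflight_problems(legacy_request):
    print(problem)
```

A check like this is most useful during migration, when request-building code written for older models is still in the codebase.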
A minimal Fable 5 call in Python looks like this, and it is worth internalizing because every higher surface wraps some version of it:
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-fable-5",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[{"role": "user", "content": "Refactor this module for testability..."}],
)
The other surprise in the request surface is what comes back. By default, Fable 5 returns empty thinking blocks. The reasoning still happens and still streams as a content block, but the text is omitted unless you opt in with thinking: {type: "adaptive", display: "summarized"}. If you stream Claude's reasoning to users as a progress indicator, the default will look like a long silent pause before output begins, so the summarized setting is how you restore visible progress. This is a silent change with no error attached, which makes it exactly the kind of thing that ships to production looking broken.
Capability-wise, Fable 5 supports the full modern toolkit at launch: code execution, programmatic tool calling, context editing, server-side compaction, vision, and the memory tool, plus the beta Task Budgets feature for telling an agent how many tokens it has for a whole loop. On SWE-Bench Pro, the harder of the public coding benchmarks, Fable 5 leads the field. The vendor-reported numbers put it at 80.3%, ahead of Opus 4.8 at 69.2%, GPT-5.5 at 58.6%, and Gemini 3.1 Pro at 54.2% - The Decoder. Those are self-reported scaffold numbers rather than a standardized third-party run, so treat them as directional rather than gospel, a nuance we return to in the competitive section. For a deeper benchmark breakdown, our Fable 5 and Mythos 5 benchmark guide goes through the full leaderboard methodology.
It helps to understand what Mythos-class actually means structurally, because the name is doing real work rather than marketing. Fable 5 is the publicly releasable version of an internal model, Mythos 5 (model ID claude-mythos-5), which carries the same raw capability without the safety classifiers and is restricted to Anthropic's defensive-cybersecurity partners - CNBC. In other words, the tier above Opus is not a bigger Opus, it is a model Anthropic considered dangerous enough to gate, with a public twin shipped behind a wall of classifiers. For a builder, that lineage explains both the higher price and the conservative refusals: you are renting the safe edge of a system the company itself treats as a national-security-relevant asset, which is the through-line of the Project Glasswing program the whole Mythos line grew out of.
The single headline result Anthropic chose to lead with says a lot about who Fable 5 is for. Stripe pointed it at a 50-million-line Ruby codebase and finished a migration in one day that the company estimated would take a full team more than two months - Anthropic. That is not a chatbot benchmark, it is an engineering-org benchmark. It tells you the model is being positioned for the largest, most boring, most expensive kind of software work, the kind that previously justified hiring. Understanding why that matters for your own app is the rest of this guide.
2. The six ways to build apps with Claude
Here is the structural insight that the official documentation circles around but rarely states plainly: every Claude capability is a feature of a single endpoint, and every higher surface is a different amount of that endpoint's loop being run for you. Tools, structured outputs, code execution, and web search are not separate products. They are options on POST /v1/messages. The difference between calling the API yourself and using a managed agent is not what the model can do, it is who runs the loop that calls it, executes its tools, feeds the results back, and decides when it is done. Once you see the surfaces as points on a single axis from "you run everything" to "Anthropic runs everything," choosing between them stops being a branding question and becomes an engineering one.
That axis has six well-defined stops in mid-2026, and they form a ladder. At the bottom is the raw API, where you orchestrate every step. One rung up is the workflow pattern, where you still control the loop but follow a known shape. Above that sits the Agent SDK, which packages the loop, the tools, and the context management that power Claude Code into a library you embed on your own compute. Then comes Claude Code itself, the scriptable coding agent. Above that is Managed Agents, where Anthropic runs the loop and hosts the sandbox. And threading through all of them is MCP, the open standard that wires external tools and data into any of these surfaces. Cloud distribution through AWS, Vertex, and Microsoft Foundry is a seventh consideration that cuts across the whole ladder.
The reason this ladder matters is that altitude has a cost in both directions. Climb too high for the task and you pay for managed infrastructure and lose the fine control you needed. Stay too low and you rebuild scaffolding that Anthropic already wrote, tested, and maintains. The discipline Anthropic itself recommends is to start at the simplest tier that meets the need and escalate only when the task genuinely demands it - Anthropic Engineering. A single classification or extraction call should be a single API call, not an agent. An open-ended, multi-step, hard-to-specify job with recoverable errors is where the agent tiers earn their cost.
There is also a tier above the ladder that does not appear in Anthropic's documentation because it is not Anthropic's: the describe-it platforms. Tools in this category let a non-technical builder state what they want in plain language and have an AI workforce, running on the latest Claude models underneath, build and operate the whole product. O-mega sits here, generating a company's website, app, billing, content, and admin from one conversation rather than from code you write against an endpoint. It is the same engine seen from the other end of the telescope, and it belongs in the same mental map as the six surfaces even though you never touch a model ID. We cover the build-fast end of this spectrum in detail in our guide to building products with AI fast.
The practical way to use this section is as a routing table for the rest of the guide. If your app makes one model call per user action, sections 3 and 4 are your home. If you are building a long-running autonomous worker, sections 5 through 7 are where you live. If you are integrating Claude into existing tools and data, section 8 on MCP is unavoidable. And if you are in a regulated enterprise on a specific cloud, section 9 decides what is even available to you. The model is the same in every case. The surface is the decision.
3. Surface one: the Claude API and the Messages endpoint
The base layer is a single stateless HTTP endpoint, and almost everyone underestimates how much app you can build with nothing else. POST /v1/messages takes a model, a message list, and a token cap, and returns a response. Because it is stateless, you send the full conversation history every turn, which sounds inefficient until you realize it is also what makes the API trivial to scale horizontally: there is no session to lose, no server affinity, no state to corrupt. Classification, summarization, extraction, question answering, and content generation are all just shaped prompts against this one endpoint, and for a huge fraction of real products that is the entire integration.
What turns the endpoint from a chat box into an app platform is tool use. You describe functions as JSON schemas, Claude decides when to call them, and either you execute them and feed the results back, or you let the SDK's tool runner handle that loop for you. The model never runs your code. It emits a structured request, your harness runs the function, and the result goes back as a tool result message. This separation is the whole security model: because Claude only ever asks, your code decides what is actually allowed to happen. For deeper-leverage cases, server-side tools like code execution and web search run entirely on Anthropic's infrastructure, so you declare them and the model uses them without you executing anything client-side.
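To make the "Claude asks, your code decides" split concrete, here is a minimal sketch of one tool definition plus the dispatch step that turns a tool request into a tool result. The lookup_order tool, its data, and the HANDLERS table are all hypothetical, and the tool_use block is shown as a plain dict for simplicity (the SDK returns objects with the same fields).

```python
import json

# A tool definition: a name, a description, and a JSON Schema for the input.
LOOKUP_ORDER_TOOL = {
    "name": "lookup_order",
    "description": "Look up the shipping status of an order by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def lookup_order(order_id: str) -> dict:
    """Hypothetical local function; a real app would query its own database."""
    orders = {"A-100": {"status": "shipped"}}
    return orders.get(order_id, {"status": "unknown"})


# Allowlist of functions the model may trigger. Anything else is refused.
HANDLERS = {"lookup_order": lookup_order}


def run_tool(block: dict) -> dict:
    """Execute one tool_use block and build the tool_result block to send back."""
    handler = HANDLERS.get(block["name"])
    if handler is None:
        return {
            "type": "tool_result",
            "tool_use_id": block["id"],
            "content": f"Unknown tool: {block['name']}",
            "is_error": True,
        }
    output = handler(**block["input"])
    return {
        "type": "tool_result",
        "tool_use_id": block["id"],
        "content": json.dumps(output),
    }
```

The allowlist is where the security model lives: the model can name any tool it likes, but only functions registered in HANDLERS ever run.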
The third pillar of the base layer is structured outputs, which is what makes the API safe to put in front of downstream systems. Instead of hoping Claude returns clean JSON, you constrain the response to a schema with output_config.format, or in Python you use client.messages.parse() with a Pydantic model and get a validated object back. The old top-level output_format request parameter is deprecated across all models, so new raw requests should use output_config.format; the parse() helper still takes the Pydantic class through its own output_format argument, which is why that name appears in the example. The same mechanism applies to tools through strict: true, which guarantees the tool parameters match your schema. Here is the extraction pattern that replaces a hundred lines of brittle regex:
import anthropic
from pydantic import BaseModel

client = anthropic.Anthropic()

class Contact(BaseModel):
    name: str
    email: str
    plan: str

# parse() takes the Pydantic class and validates the response against it
response = client.messages.parse(
    model="claude-fable-5",
    max_tokens=2000,
    output_config={"effort": "low"},
    messages=[{"role": "user", "content": "Jane Doe (jane@co.com) wants Enterprise."}],
    output_format=Contact,
)
contact = response.parsed_output  # a validated Contact instance
A practical note on effort that saves real money at this layer: a single extraction or classification call does not need to think hard, so effort: "low" is correct and cheaper, while a genuine reasoning task wants high or xhigh. The effort level does not change the per-token rate, it changes how many tokens the request consumes, so the lever is real but indirect. The other base-layer discipline is streaming. For any request where max_tokens is large (above roughly 16,000), the SDK will refuse a non-streaming call because the connection would likely time out, so you stream and use get_final_message() to collect the whole response without handling individual events.
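A minimal sketch of that streaming pattern follows. It assumes the Python SDK's messages.stream() context manager and its get_final_message() helper; the wrapper function name and the 64,000-token default are illustrative choices, and the client is passed in so the function can be exercised with a stand-in.

```python
def collect_streamed_message(client, prompt: str, max_tokens: int = 64000):
    """Stream a large-output request and return the assembled final message.

    `client` is expected to be an anthropic.Anthropic() instance. Streaming keeps
    the connection alive during long generations, and get_final_message() collects
    the whole response so the caller never handles individual events.
    """
    with client.messages.stream(
        model="claude-fable-5",
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        return stream.get_final_message()
```

Usage is a single call, for example message = collect_streamed_message(anthropic.Anthropic(), "Write the full migration plan."), and the return value has the same shape as a non-streaming response.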
The reason to master this layer before reaching for anything heavier is that it is where cost, latency, and reliability are most controllable. You see exactly what you send and what you pay, there is no agent wandering off, and the failure modes are simple HTTP errors. Our walkthrough on building a Claude chatbot from scratch is essentially an extended tour of this surface, and most production Claude apps never need to leave it. When they do, it is usually because the task became genuinely open-ended, which is the line that separates a workflow from an agent.
4. Surface two: workflows and the agentic loop
A workflow is the step between a single call and a full agent, and it is the most underused pattern in the entire stack. The defining trait of a workflow is that you control the control flow. The path is known in advance: classify, then route, then summarize; or generate, then verify, then revise. Claude does the cognitive work at each node, but your code decides the sequence. This is enormously valuable because deterministic control flow is debuggable, testable, and cheap to reason about, while a model deciding its own trajectory is none of those things. Anthropic's own engineering guidance names five workflow shapes (chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer) precisely because most "agent" problems are actually workflow problems wearing a costume - Anthropic Engineering.
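Here is a small sketch of what "you control the control flow" looks like, combining the routing and chaining shapes. The ticket scenario, the prompts, and the handle_ticket name are invented for illustration; call_model stands in for any function that sends a prompt to Claude and returns text, so the sequence itself stays plain, testable code.

```python
from typing import Callable

# Any function that takes a prompt and returns the model's text reply.
ModelCall = Callable[[str], str]

KNOWN_ROUTES = ("billing", "bug")


def handle_ticket(ticket: str, call_model: ModelCall) -> dict:
    """Routing workflow: the model classifies, the code picks the branch."""
    label = call_model(
        "Classify this ticket as billing, bug, or other. Reply with one word.\n\n" + ticket
    ).strip().lower()
    if label not in KNOWN_ROUTES:
        label = "other"  # unexpected labels fall through to a safe default

    if label == "billing":
        reply = call_model("Draft a reply to this billing question:\n\n" + ticket)
    elif label == "bug":
        # Chaining: the first call's output becomes the second call's input.
        summary = call_model("Summarize the reproduction steps:\n\n" + ticket)
        reply = call_model("Draft a bug acknowledgement using this summary:\n\n" + summary)
    else:
        reply = call_model("Draft a short, polite general reply:\n\n" + ticket)
    return {"route": label, "reply": reply}
```

Because every branch is ordinary code, each node can be unit-tested with a stubbed call_model, which is exactly the debuggability a free-running agent gives up.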
The agentic loop is the next escalation, and it is where you hand the model the steering wheel. In a manual loop you call the API, check whether the response wants a tool, execute it, append the result, and call again, repeating until the model stops asking for tools. The SDK's tool runner automates exactly this so you do not write the loop by hand, but the manual version is worth knowing because it is where you insert the things that matter in production: human approval gates before destructive actions, custom logging, conditional execution, and budget caps. The structural skeleton is short:
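# Skeleton only: client, tools (a list of tool schemas), run_tool, and user_input are yours to define
messages = [{"role": "user", "content": user_input}]
while True:
    response = client.messages.create(
        model="claude-fable-5", max_tokens=16000,
        tools=tools, messages=messages,
    )
    # Continue only while the model is asking for tools; any other stop reason ends the loop
    if response.stop_reason != "tool_use":
        break
    messages.append({"role": "assistant", "content": response.content})
    # run_tool must return a tool_result block carrying the matching tool_use_id
    results = [run_tool(b) for b in response.content if b.type == "tool_use"]
    messages.append({"role": "user", "content": results})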
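The manual loop is where approval gates and caps get inserted, so here is one hedged way to do it. The run_agent_loop name, the approve callback, and the max_turns cap are all assumptions for this sketch; the cap is a simple local turn limit, not the Task Budgets beta.

```python
def run_agent_loop(client, tools, user_input, run_tool, approve, max_turns=10):
    """Manual agentic loop with an approval gate and a hard cap on turns.

    `client` is expected to be an anthropic.Anthropic() instance, `run_tool` maps a
    tool_use block to a tool_result dict, and `approve` returns True when a tool
    call is allowed to execute.
    """
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-fable-5", max_tokens=16000,
            tools=tools, messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response  # the model is done or was cut off; nothing to execute
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            if approve(block):
                results.append(run_tool(block))
            else:
                # Tell the model the call was refused so it can change course.
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": "Denied by operator.",
                    "is_error": True,
                })
        messages.append({"role": "user", "content": results})
    raise RuntimeError("agent loop reached max_turns without finishing")
```

A typical approve callback would return True for read-only tools and prompt a human for anything destructive, which keeps the gate in your code rather than in the prompt.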
The decision of whether to climb from a workflow to an agent should be made on four tests, not on enthusiasm. Anthropic frames them as complexity, value, viability, and cost of error, and they are worth applying literally before you build anything autonomous:
- Complexity - is the task genuinely multi-step and hard to specify up front?
- Value - does the outcome justify higher latency and token spend?
- Viability - is Claude actually good at this kind of task today?
- Cost of error - can mistakes be caught and recovered through tests, review, or rollback?
If any of those four is a "no," the honest answer is to stay at the workflow or single-call tier. An agent that wanders is only acceptable when the wandering is cheap to catch and cheap to undo. This is also where Fable 5's Task Budgets beta earns its place: you tell the model how many tokens it has for the entire loop, it sees a running countdown, and it self-moderates and wraps up gracefully instead of spiraling. That is distinct from max_tokens, which is a hard per-response ceiling the model never sees. For teams running genuinely long autonomous jobs, our guide to long-running coding agents and the companion piece on writing loops for AI coding agents go far deeper into keeping these loops stable over hours.
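Applied literally, the four tests reduce to a checklist where a single "no" vetoes the agent tier. The sketch below encodes that rule; the AgentFit and recommend_tier names are invented here, and the answers remain human judgment calls that the code only records.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentFit:
    """Answers to the four tests, each a yes/no judgment made by the team."""
    complexity: bool          # genuinely multi-step and hard to specify up front?
    value: bool               # outcome justifies higher latency and token spend?
    viability: bool           # is Claude actually good at this kind of task today?
    recoverable_errors: bool  # can mistakes be caught and undone?


def recommend_tier(fit: AgentFit) -> tuple[str, list[str]]:
    """Return the recommended tier plus the names of any failed tests."""
    failed = [name for name, passed in asdict(fit).items() if not passed]
    if failed:
        return "workflow or single call", failed
    return "agent", failed


tier, failed = recommend_tier(
    AgentFit(complexity=True, value=True, viability=True, recoverable_errors=False)
)
print(tier, failed)
```

Writing the failed tests down alongside the decision makes it easier to revisit the choice later, for example once rollback tooling exists.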
The reason workflows deserve more respect than they get is economic. A well-structured workflow with three cheap, scoped calls almost always beats one expensive agent that has to hold the whole problem in context and reason about its own trajectory. You pay less, you ship faster, and when something breaks you can point at the exact node. The agent tier is not a graduation you should aspire to. It is a tool you reach for only when the problem genuinely cannot be expressed as a known sequence of steps, which is rarer than the marketing implies.
5. Surface three: the Claude Agent SDK
The Claude Agent SDK, renamed from the Claude Code SDK in September 2025, is the most important surface that most builders have never heard of. It packages the exact agent loop, tool set, subagent system, and context-management machinery that power Claude Code, and exposes them as a library in Python and TypeScript so you can build your own long-running agents on compute you host - Claude Code docs. The pitch is that the hardest parts of an agent are not the model call, they are everything around it: gathering context efficiently, compacting history before it overflows, spawning subagents for parallel work, and verifying results. The Agent SDK is Anthropic handing you the version of those parts that it uses internally.
What you get out of the box is the difference between a weekend prototype and something that survives contact with real workloads. The loop follows a gather-context, take-action, verify rhythm, with agentic and semantic search for pulling in only the relevant context, automatic compaction so a multi-hour run does not blow the context window, and subagents that let a coordinator fan work out to specialists. Developer interest tracks the capability. One proxy for adoption, search demand for the Agent SDK, reportedly jumped from around 50 monthly searches in May 2025 to roughly 14,800 by April 2026, an enormous relative climb even if the absolute numbers are modest - Totalum. Treat that figure as illustrative rather than an official Anthropic metric, but the direction is unmistakable.
The trade-off you accept with the Agent SDK is that you host the compute and you execute the tools. Anthropic gives you the brain and the scaffolding, but the machine the agent runs on, the security boundary around its file access, and the operational burden of keeping it alive are yours. That is exactly the right deal for teams that need full control over where their agent runs and what it can touch, and exactly the wrong deal for teams that just want an agent to exist without operating it. A billing change worth flagging: subscription plans get a separate Agent SDK credit pool effective June 15, 2026, which changes the economics for anyone running the SDK heavily under a subscription rather than pure API billing - DevToolPicks.
The clearest signal of how seriously to take this surface is that it is the same engine Anthropic uses to write its own software. The company has said the majority of its new production code, on the order of 80%, is now authored by Claude through these tools - VentureBeat. When the maker of the model builds the model's tooling with the tooling, the tooling tends to be good. Our Agent SDK deep dive walks through the loop, the subagent patterns, and the context-management internals in far more detail than fits here, and it is the right next read if you have decided to host your own agents.
6. Surface four: Claude Code in the terminal and CI
Claude Code is the agent that broke into the mainstream, and its importance for app building is easy to miss because people think of it as a chat assistant in a terminal. It is actually a scriptable, headless coding agent that runs in your CLI and your CI pipeline, which means it is a building block, not just an IDE companion. You can invoke it non-interactively, point it at a repository, give it a task, and have it edit, test, and commit, all inside an automated pipeline. That headless mode is what turns it from a productivity tool into infrastructure: a CI job that fixes failing tests, a cron task that keeps dependencies current, a pipeline step that drafts a migration.
The adoption data is the strongest in the whole Claude ecosystem because it is tied to revenue and to named enterprise rollouts. Claude Code reportedly reached around $2.5 billion in annualized revenue by February 2026, having launched only in May 2025, and an estimated 4% of all public GitHub commits are now authored by it - SaaStr. Stripe deployed it to 1,370 engineers through a zero-config enterprise binary, and one team used it to migrate 10,000 lines from Scala to Java in 4 days, work estimated at roughly 10 engineer-weeks by hand - VentureBeat. With Fable 5 underneath, the ceiling on that kind of large-scale, mechanical-but-vast work rises again, which is exactly the Stripe 50-million-line migration story from section 1.
The economics of Claude Code are where builders get surprised, and they are worth understanding before you wire it into a pipeline that runs unattended. A long session can be expensive or cheap depending almost entirely on caching. One worked analysis puts a four-hour session at roughly $25 with prompt caching versus around $80 without it - CloudZero. That three-fold spread is not a rounding error, it is the difference between a sustainable automation and a budget incident. The reason is structural: Claude Code re-sends a large, stable context every turn, and caching that prefix is what makes long sessions affordable, a mechanism we unpack fully in section 10.
The honest limitation of Claude Code is that it is opinionated about being a coding agent, which is its strength and its boundary. It drives, you steer, and that is the right division of labor for code but the wrong one if you need an agent that does something other than write and run software. The control you give up is the price of the speed you gain, and for code that trade is usually correct. For the full pricing picture, plans, and how it compares to alternatives, our Claude Code pricing guide covers the tiers in depth, and our building software with AI guide places it in the wider tooling landscape.
7. Surface five: Claude Managed Agents
Managed Agents, launched in public beta on April 8, 2026, is the newest and most architecturally interesting tier, and the cleanest way to understand it is the phrase Anthropic uses: brain versus hands. With every surface below it, you run the agent loop, the container, or both. With Managed Agents, Anthropic runs the loop on its orchestration layer and hosts a per-session sandbox container where the agent's tools execute - Anthropic. The model thinks on Anthropic's side; the bash commands, file edits, and code run in a workspace Anthropic provisions. You create a persisted, versioned agent configuration once, then start sessions that reference it, and you drive each session by sending events and reading an event stream.
The architecture has a hard rule that trips up almost everyone on first contact: the agent is created once, while a session is created for every run. The model, system prompt, tools, MCP servers, and skills all live on the agent object, never on the session. The session only holds a pointer to the agent plus an environment and the conversation. If you find yourself calling agents.create() at the top of every request, you are accumulating orphaned agents and paying creation latency for nothing. The correct shape is to create the agent in a setup step, store its ID, and reference it on every session. Anthropic even recommends defining agents and environments as version-controlled YAML and applying them through the ant CLI, treating the agent as a checked-in config rather than a runtime call.
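One hedged way to enforce the create-once rule in application code is to persist the agent ID and create the agent only when no stored ID exists. In the sketch below, the file-based store and the get_or_create_agent_id name are assumptions, and create_agent is a stand-in for whatever call actually creates the agent and returns its ID.

```python
import json
from pathlib import Path
from typing import Callable


def get_or_create_agent_id(store: Path, create_agent: Callable[[], str]) -> str:
    """Return the stored agent ID, creating the agent only on first use.

    `create_agent` is a hypothetical callable wrapping the real creation call;
    it runs at most once per store file.
    """
    if store.exists():
        return json.loads(store.read_text())["agent_id"]
    agent_id = create_agent()
    store.write_text(json.dumps({"agent_id": agent_id}))
    return agent_id
```

Every request handler then calls get_or_create_agent_id(...) and starts a session against the returned ID, so agent creation never sits on the request path. A checked-in config applied by a setup step achieves the same thing with less runtime machinery.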
The pricing model is genuinely different from everything below it, and it is the main reason to think before you climb. Managed Agents bills standard token rates plus $0.08 per session-hour, metered to the millisecond but only while the session status is actually running, so idle and terminated time is free - Anthropic docs. Anthropic's own worked example: a one-hour Opus 4.8 session processing 50K input and 15K output tokens costs $0.705 total, dropping to $0.525 with caching. Run an agent around the clock and the runtime fee alone is roughly $58 per month before a single token, so this tier rewards bursty, event-driven work over always-on polling. The named early adopters tell the story of who it is for: Notion, Rakuten, and Asana, with Rakuten reportedly cutting task turnaround from 24 days to 5, a 79% reduction, after deploying agents across five business functions - MEXC.
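The two headline numbers in that paragraph can be reproduced with a few lines of arithmetic. The sketch assumes Opus 4.8 token rates of $5 input and $25 output per million tokens; this guide does not state those rates, but they are the pair that yields the quoted $0.705. The function name is illustrative, and caching is left out.

```python
SESSION_HOUR_USD = 0.08  # runtime fee quoted above, charged only while running


def session_cost_usd(running_hours, input_tokens, output_tokens,
                     input_usd_per_mtok, output_usd_per_mtok):
    """Runtime fee plus token charges for one Managed Agents session (no caching)."""
    runtime = running_hours * SESSION_HOUR_USD
    tokens = (input_tokens * input_usd_per_mtok
              + output_tokens * output_usd_per_mtok) / 1_000_000
    return runtime + tokens


# Worked example: one running hour, 50K input and 15K output tokens.
# ASSUMPTION: $5 / $25 per MTok for Opus 4.8 (not stated in this guide).
example = session_cost_usd(1, 50_000, 15_000, 5.0, 25.0)

# Always-on runtime fee for a 30-day month, before any tokens.
always_on = session_cost_usd(24 * 30, 0, 0, 5.0, 25.0)

print(f"one-hour example: ${example:.3f}, always-on runtime: ${always_on:.2f}/month")
```

The second figure is the one to watch when designing a polling agent: the fee scales with running time rather than with useful work, which is why event-driven sessions come out ahead.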
One detail of Managed Agents that quietly shapes your security posture is how credentials work, because the sandbox is a deliberate trust boundary. The agent's MCP servers declare only a type, a name, and a URL, with no authentication inline. The actual secrets live in vaults that you attach to a session, and Anthropic injects them into outbound requests through a proxy after the request leaves the container, so code running in the sandbox (including anything the agent writes) cannot read or exfiltrate a vaulted credential even under prompt injection. That design is the direct answer to the injection failure mode in section 12: the most dangerous thing an agent can leak is a key, and Managed Agents structurally prevents the key from ever entering the place an attacker would run code. If you need a non-MCP secret, the pattern is to keep it host-side and answer the agent's tool call yourself rather than ever placing it in a prompt, since prompts persist in the session's event history.
The reason Managed Agents matters even if you never use it is that it defines the top of the operable ladder. Above it, you stop touching infrastructure entirely and start describing outcomes, which is the describe-it tier from section 2. Below it, you trade managed convenience for control and cost. The right place to sit depends on whether running an agent loop and a sandbox is a core competency you want or an undifferentiated burden you would rather rent. Our dedicated Managed Agents guide covers the full session lifecycle, the event stream, outcomes, and the multi-agent coordinator pattern that this section only gestures at.
8. Surface six: MCP, the layer every app now plugs into
The Model Context Protocol is not a way to build an app so much as the thing every Claude app now uses to reach the rest of the world, and it has quietly become the most consequential standard in the ecosystem. MCP is an open protocol for exposing tools, data, and prompts to any AI agent in a uniform way, so that the same Slack integration, the same database connector, or the same internal service works across Claude, other models, and dozens of clients without bespoke glue. Anthropic introduced it in late 2024, and by early 2026 it had crossed roughly 97 million monthly SDK downloads with more than 10,000 public servers in the ecosystem - DigitalApplied. That is not a Claude feature anymore, it is industry plumbing.
The governance milestone is the part that tells you MCP is permanent. In December 2025, Anthropic donated MCP to the Linux Foundation's new Agentic AI Foundation, co-founded with Block and OpenAI, which means the standard is now neutral ground rather than one vendor's lever - DigitalApplied. When competitors agree to govern an integration standard together, that standard stops being a bet and becomes a foundation you can build on for years. The practical consequence for an app builder is that the tool you wire into Claude through MCP is portable: it survives a model switch, a client switch, and a vendor's strategy change.
The adoption pattern among serious companies shows how MCP gets used in practice, and it is instructive because it is not theoretical. Stripe runs a remote MCP server at mcp.stripe.com exposing around 25 tools behind OAuth, usable from Cursor, VS Code, Claude Code, and other clients, while Block built more than 60 internal MCP servers covering Git, Snowflake, Jira, and Google Workspace to power its Goose agent - SSNTPL. The lesson is that MCP scales from "expose your public API to agents" to "wire your entire internal toolchain into one agent surface," and both ends are real production deployments rather than demos.
For builders, the strategic point is that MCP decouples your integrations from your model choice, which is the single most valuable property in a market where the best model changes every few months. If your tools live behind MCP, swapping Fable 5 for a cheaper model on a sub-task, or running the same agent on a different client, costs you nothing in integration work. That is why the LLM tool gateway pattern has become a standard architectural layer, and our tool gateways guide covers it in depth. If you want to build one of these from scratch, our build your first MCP server guide is the hands-on path, and the original MCP launch explainer covers the protocol's origins.
9. Claude on the clouds, and where it does not reach
The cloud surface is not a different way to build so much as a different place to run, and it matters disproportionately for anyone inside a regulated enterprise. Claude is the only frontier model available across all three major clouds, on Amazon Bedrock, Google Vertex AI, and Microsoft Foundry, with more than 100,000 customers on Amazon Bedrock alone - innFactory. Fable 5 reached Bedrock at launch - Amazon. For a company whose data governance, billing, and identity all live in one cloud, this is the surface that makes Claude usable at all, because procurement and security teams will sign off on a Bedrock deployment in a way they never will on a direct API key.
The structural reason this surface exists is that the model and the distribution are separable, and Anthropic chose to be everywhere rather than to lock builders into one channel. There are actually two distinct flavors worth not confusing. The first-party API and a newer Anthropic-operated offering called Claude Platform on AWS give you same-day feature parity, including Managed Agents and server-side tools, with bare model IDs. Amazon Bedrock proper, by contrast, is partner-operated, uses provider-prefixed model IDs like anthropic.claude-fable-5, and importantly does not support Managed Agents or Anthropic's server-side tools. Google Vertex AI and Microsoft Foundry share that limitation.
That limitation is the single most important thing to internalize about the cloud surface, because it silently removes options you might have planned your architecture around. If your design depends on Managed Agents or on server-hosted code execution, Bedrock, Vertex AI, and Foundry cannot run it, and you fall back to the Claude API plus your own tool use on those clouds. The model is identical; the surrounding capabilities are not. Designing for a cloud deployment means designing within the subset of surfaces that cloud actually exposes, which usually means living at the API and workflow tiers and hosting your own agent loop.
The way to apply this in practice is to let your deployment constraint pick your surface, not the other way around. A startup with no compliance constraints should default to the first-party API for the full feature set. An enterprise with a hard requirement to keep everything inside AWS should plan around Bedrock's subset from day one and not discover the missing Managed Agents support in month three. The mistake is treating the cloud as a deployment detail to be decided late. It is a capability decision that should be made first, because it bounds everything else you can build.
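One way to make that decision early is to write the capability subset down and check a design against it before any code exists. The sketch below is illustrative only: the platform keys and feature labels are names invented for this example rather than API values, and the feature sets simply restate what this section says about each channel.

```python
# Features each channel exposes, as described in this section.
# Keys and feature labels are hypothetical names chosen for this sketch.
FEATURES = {
    "first_party_api": {"messages_api", "managed_agents", "server_side_tools"},
    "claude_platform_on_aws": {"messages_api", "managed_agents", "server_side_tools"},
    "amazon_bedrock": {"messages_api"},
    "google_vertex_ai": {"messages_api"},
    "microsoft_foundry": {"messages_api"},
}


def missing_features(platform: str, required: set) -> set:
    """Return the required features that the chosen platform does not expose."""
    return set(required) - FEATURES[platform]


# A design that assumes Managed Agents fails the check on Bedrock immediately,
# rather than in month three.
design_needs = {"messages_api", "managed_agents"}
print(missing_features("amazon_bedrock", design_needs))   # {'managed_agents'}
print(missing_features("first_party_api", design_needs))  # set()
```

The point of keeping this as data rather than prose is that the check can run in a design review or a CI step, so the constraint is visible before the architecture hardens around a surface the cloud does not offer.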
10. The economics: what it actually costs to build
The first principle of Claude pricing is that you are buying tokens, and the entire cost structure follows from that one fact rather than from headline model prices. The current lineup is a clean ladder: Fable 5 at $10 input and $50 output per million tokens, Opus 4.8 at $5 and $25, Sonnet 4.6 at $3 and $15, and Haiku 4.5 at $1 and $5 - Anthropic docs. The structural insight that most cost analyses miss is that output tokens are five times the price of input tokens across the board, which means the single biggest cost lever is not which model you pick, it is how much the model writes. A verbose agent on a cheap model can easily cost more than a terse one on an expensive model.
| Model | Input /1M | Output /1M | Cache read /1M | Batch in/out /1M |
|---|---|---|---|---|
| Fable 5 | $10.00 | $50.00 | $1.00 | $5.00 / $25.00 |
| Opus 4.8 | $5.00 | $25.00 | $0.50 | $2.50 / $12.50 |
| Sonnet 4.6 | $3.00 | $15.00 | $0.30 | $1.50 / $7.50 |
| Haiku 4.5 | $1.00 | $5.00 | $0.10 | $0.50 / $2.50 |
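The claim that output volume matters more than model choice is easy to check with the rates in the table. The following is a minimal sketch: the short model keys are labels for this example, not API model IDs, and the token counts are made-up illustrative workloads.

```python
# USD per million tokens, copied from the table above.
PRICES = {
    "fable-5": {"input": 10.00, "output": 50.00},
    "opus-4.8": {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "haiku-4.5": {"input": 1.00, "output": 5.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one uncached, non-batch request."""
    rates = PRICES[model]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# Hypothetical workloads: same 2,000-token prompt, very different output lengths.
verbose_cheap = request_cost("sonnet-4.6", 2_000, 8_000)  # about $0.126
terse_premium = request_cost("fable-5", 2_000, 1_000)     # about $0.070

print(f"verbose Sonnet 4.6: ${verbose_cheap:.3f}")
print(f"terse Fable 5:      ${terse_premium:.3f}")
```

In this example the verbose run on the cheaper model costs nearly twice as much as the terse run on the premium one, which is the output-token lever in action.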
The second principle is that prompt caching is not an optimization, it is the difference between viable and not. Cache reads cost 0.1 times the base input rate, while a five-minute cache write costs 1.25 times and a one-hour write costs 2 times - Anthropic docs. For any app that re-sends a stable prefix (a long system prompt, a tool list, a document under analysis), caching that prefix turns a recurring full-price charge into a recurring near-free one. This is the mechanism behind the Claude Code session that costs $25 cached versus $80 uncached. The catch is that caching is a strict prefix match, so a single changing byte (a timestamp in the system prompt, an unsorted JSON dump, a varying tool set) silently invalidates everything after it and you pay full price without realizing it.
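The multipliers above imply a break-even point that is worth computing once. This sketch uses only the three multipliers quoted in the text and assumes, for simplicity, that every request after the first arrives inside the cache lifetime and hits the cache; the prefix size and request count are illustrative.

```python
CACHE_READ = 0.10  # multiplier on the base input rate
WRITE_5M = 1.25    # five-minute cache write
WRITE_1H = 2.00    # one-hour cache write


def prefix_cost(prefix_tokens, requests, input_rate_per_m, write_multiplier=None):
    """USD cost of sending the same prefix `requests` times.

    write_multiplier=None means no caching. With caching, the first request
    pays the write multiplier and every later request pays the read multiplier
    (an assumption: no cache expiry and no invalidation in between).
    """
    base = prefix_tokens * input_rate_per_m / 1_000_000
    if write_multiplier is None:
        return requests * base
    if requests == 0:
        return 0.0
    return base * (write_multiplier + CACHE_READ * (requests - 1))


def break_even_requests(write_multiplier):
    """Smallest number of requests at which caching is cheaper than not caching."""
    n = 1
    while write_multiplier + CACHE_READ * (n - 1) >= n:
        n += 1
    return n


print(break_even_requests(WRITE_5M))  # 2
print(break_even_requests(WRITE_1H))  # 3

# A 20,000-token prefix sent 100 times at Fable 5's $10 input rate.
print(round(prefix_cost(20_000, 100, 10.0), 2))            # 20.0
print(round(prefix_cost(20_000, 100, 10.0, WRITE_5M), 2))  # 2.23
```

Under these assumptions the five-minute cache pays for itself on the second request and the one-hour cache on the third, which is why a stable prefix is almost always worth caching.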
The third principle is that effort governs spend without changing the rate. The effort levels (low, medium, high, xhigh, max) do not alter the per-token price, they alter how many tokens a request consumes across text, tool calls, and adaptive thinking. Third-party analysis estimates roughly a 2.7-times spread in output tokens between low and max on the same task - MindStudio. A worked example from Anthropic grounds the whole picture: handling 10,000 customer-support conversations on Haiku 4.5 at around 3,700 tokens each costs roughly $37 total. The same workload on Fable 5 would cost roughly ten times as much (about $370, since both its input and output rates are ten times Haiku's) for capability you almost certainly do not need on routine support. The discipline is to route each task to the cheapest model that clears the quality bar, which is exactly the multi-model architecture our model benchmarks and pricing guide lays out.
The fourth principle is the long arc, and it is the one that should shape your architecture decisions more than any single price. Inference prices are falling fast, with Epoch AI documenting a median 50-times-per-year decline, and as much as 200 times per year since January 2024 for some capability levels - Epoch AI. The structural implication is that today's premium model is next year's commodity, so betting your architecture on a specific model's price is a mistake. Bet instead on a design where models are swappable behind a tool layer, route aggressively to the cheapest model that works, and treat caching and effort as first-class budget controls. The teams that win on cost are not the ones that found the cheapest model, they are the ones whose architecture lets them keep switching to it.
11. Production patterns that separate apps from demos
The single highest-leverage decision in any Claude app is the design of the tool surface, and it is where amateurs and professionals diverge most visibly. The amateur instinct is to expose every internal function as a tool, producing a sprawling menu that confuses the model and wastes context. The professional pattern is few, high-impact, consolidated tools that return semantic, human-readable context rather than raw identifiers. Anthropic's own engineering team found that resolving UUIDs to readable names in tool responses significantly improves precision by reducing hallucinations, and that a set of Claude-optimized Slack tools reached around 90% accuracy versus roughly 75% for the human-written versions - Anthropic Engineering. You write tool descriptions like documentation for a new hire, you namespace related tools, and you cap response sizes (Claude Code defaults to a 25,000-token cap) so a single tool call cannot drown the context.
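A tool wrapper that follows this advice might look like the sketch below. Everything in it is hypothetical: the row shape, the field names, and the character budget are invented for illustration, and the character budget is only a rough stand-in for a token cap (it assumes about four characters per token, which varies by content).

```python
# Rough stand-in for a ~25,000-token cap, assuming ~4 characters per token.
MAX_RESPONSE_CHARS = 100_000


def shape_tool_response(rows, names_by_id, max_chars=MAX_RESPONSE_CHARS):
    """Turn raw rows into readable lines and stop before the budget is exceeded.

    rows: list of dicts with hypothetical keys "title" and "owner_id".
    names_by_id: mapping from opaque IDs to human-readable names.
    """
    lines = []
    used = 0
    for i, row in enumerate(rows):
        # Resolve the opaque ID to a name; fall back to the ID if unknown.
        owner = names_by_id.get(row["owner_id"], row["owner_id"])
        line = f'{row["title"]} (owner: {owner})'
        if used + len(line) + 1 > max_chars:
            # Tell the model what was left out so it can narrow the query.
            lines.append(f"[{len(rows) - i} more result(s) omitted; narrow the query]")
            break
        lines.append(line)
        used += len(line) + 1
    return "\n".join(lines)


rows = [
    {"title": "Fix login", "owner_id": "u-1"},
    {"title": "Ship v2", "owner_id": "u-2"},
]
print(shape_tool_response(rows, {"u-1": "Dana"}))
```

Truncating whole rows with an explicit note, rather than cutting the payload mid-record, keeps the response readable and gives the model a cue to refine its next call.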
The second production pattern is caching discipline, which follows directly from the economics in section 10 but is an engineering practice, not a billing setting. Because caching is a prefix match, the architecture rule is to keep stable content first and volatile content last. The system prompt must be frozen, with no interpolated dates or session IDs at the top; tools must be serialized deterministically; and dynamic context goes into the messages at the end, not into the prefix. The way you verify it is working is to read cache_read_input_tokens on the response, and if it is zero across repeated identical-prefix requests, a silent invalidator is at work. The common culprits are a datetime.now() in the system prompt, a json.dumps() without sorted keys, or a tool set that varies per user.
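A cheap way to catch a silent invalidator before it reaches the bill is to fingerprint the cacheable prefix locally and to watch the cache-read counter across requests. The sketch below is an assumption-laden illustration: the function names are invented, the tool dictionaries are simplified, and the usage records are plain dicts carrying the cache_read_input_tokens field named above.

```python
import hashlib
import json


def prefix_fingerprint(system_prompt: str, tools: list) -> str:
    """Hash a deterministic serialization of the cacheable prefix.

    Tools are ordered by name and keys are sorted, so two requests with the
    same logical prefix always produce the same fingerprint.
    """
    ordered_tools = sorted(tools, key=lambda tool: tool["name"])
    blob = json.dumps(
        {"system": system_prompt, "tools": ordered_tools},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def cache_looks_broken(usages: list) -> bool:
    """True if every request after the first reports zero cache reads."""
    later = usages[1:]
    return bool(later) and all(u.get("cache_read_input_tokens", 0) == 0 for u in later)


tools_a = [{"name": "search", "description": "Search docs"},
           {"name": "fetch", "description": "Fetch a URL"}]
tools_b = [{"description": "Fetch a URL", "name": "fetch"},
           {"description": "Search docs", "name": "search"}]

stable = "You are a support assistant."
# Same logical prefix, different ordering: fingerprints match.
print(prefix_fingerprint(stable, tools_a) == prefix_fingerprint(stable, tools_b))
# An interpolated timestamp changes the prefix on every request.
print(prefix_fingerprint(stable + " Now: 2026-06-09T10:00", tools_a)
      == prefix_fingerprint(stable, tools_a))
```

Logging the fingerprint next to each request makes the culprit obvious: if the fingerprint changes between requests that should share a prefix, something volatile has leaked into it.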
The third pattern is structured everything, because the boundary between a model and a downstream system is where reliability is won or lost. Constrain outputs with output_config.format when you need parseable data, use strict tools when you need guaranteed parameters, and always parse tool inputs with a real JSON parser rather than string-matching the serialized form, since Fable 5 and the 4.x models can escape Unicode and slashes differently than older models.
The fourth pattern is context management for anything long-running: server-side compaction summarizes earlier history when you approach the context limit, but you must append the full response content (including compaction blocks) back to your messages, or you silently lose the compaction state and the conversation corrupts.
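The parsing advice in the third pattern is easiest to see with two serializations of the same tool input. This is a small self-contained illustration using only the standard library; the payloads and the function names are made up for the example.

```python
import json

# Two valid JSON encodings of the same value. Escaping differs, meaning does not.
raw_a = r'{"path": "/tmp/caf\u00e9.txt"}'   # Unicode escaped, slashes plain
raw_b = r'{"path": "\/tmp\/café.txt"}'      # slashes escaped, Unicode literal


def wants_tmp_file_fragile(raw: str) -> bool:
    """String-matches the serialized form. Breaks when escaping changes."""
    return '"path": "/tmp/' in raw


def wants_tmp_file(raw: str) -> bool:
    """Parses first, then inspects the decoded value."""
    return json.loads(raw)["path"].startswith("/tmp/")


print(json.loads(raw_a) == json.loads(raw_b))                         # True
print(wants_tmp_file(raw_a), wants_tmp_file(raw_b))                   # True True
print(wants_tmp_file_fragile(raw_a), wants_tmp_file_fragile(raw_b))   # True False
```

The fragile check passes on one encoding and fails on the other even though both describe the same path, which is exactly the kind of breakage that appears after a model upgrade and nowhere in a unit test written against one sample payload.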
The fifth pattern, and the one most teams skip entirely, is evaluation as a build discipline rather than an afterthought. Anthropic's own tool-optimization result (the jump from roughly 75% to 90% accuracy on Slack tools) came from an eval loop, not from intuition: they measured how well Claude used each tool, rewrote the tools that scored badly, and measured again. The same loop applies to your whole app. Without a small, representative eval set you are tuning prompts and effort levels blind, and you will mistake a lucky demo for a working system. The cheapest version is a handful of real inputs with known-good outputs that you run on every change, and it is the difference between knowing your app works and hoping it does. This is doubly true for model routing, where an eval set is the only honest way to decide whether a sub-task can drop from Fable 5 to a cheaper model without quality loss.
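The cheapest version of that loop fits in a few lines. The sketch below is a minimal illustration, not a recommended grading scheme: it assumes exact-match grading, represents each model as a plain callable, and uses stub functions in place of real model calls. All names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Case = Tuple[str, str]  # (input, known-good output)


def pass_rate(run: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases where the output matches the known-good answer exactly."""
    passed = sum(1 for prompt, expected in cases if run(prompt).strip() == expected)
    return passed / len(cases)


def cheapest_passing(candidates, cases: List[Case], bar: float) -> Optional[str]:
    """Return the name of the cheapest candidate whose pass rate clears the bar.

    candidates: list of (name, cost_per_task, run) tuples.
    """
    for name, _cost, run in sorted(candidates, key=lambda c: c[1]):
        if pass_rate(run, cases) >= bar:
            return name
    return None


# Stubs standing in for real model calls.
cases = [("2+2", "4"), ("capital of France", "Paris")]
small = lambda prompt: {"2+2": "4"}.get(prompt, "unknown")
large = lambda prompt: {"2+2": "4", "capital of France": "Paris"}[prompt]

candidates = [("large", 10.0, large), ("small", 1.0, small)]
print(cheapest_passing(candidates, cases, bar=0.9))  # large
print(cheapest_passing(candidates, cases, bar=0.5))  # small
```

Real apps usually need a softer grader than exact match, but the structure stays the same: a fixed case set, a score per candidate, and a routing decision that falls out of the numbers instead of a hunch.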
The reason these patterns matter is that they are invisible until they are catastrophic. A sloppy tool surface looks fine in a demo and degrades quietly as the agent's job grows. A broken cache works perfectly and just costs ten times too much. Lost compaction state runs fine until the conversation gets long enough to break. None of these show up as errors; they show up as a product that is mysteriously expensive, slow, or unreliable in production while passing every test. The discipline of building for the failure mode you cannot see is what separates a Claude app that ships from one that demos, and it is the throughline of our insider guide to building AI agents.
12. How it fails: limits, refusals, and prompt injection
The most distinctive failure mode of Fable 5 is one you have to design around from day one: the model can refuse, and the refusal is a success-shaped response. Because Fable 5 is the safety-classified twin of a more dangerous internal model, it ships with conservative classifiers that decline requests touching cyber, bio-chemical, or model-distillation topics. When a classifier triggers, the API returns stop_reason: "refusal" with an HTTP 200, not an error, and the documented mitigation is an opt-in fallback to Opus 4.8 - Anthropic docs. Crucially, Anthropic tuned these safeguards conservatively for a fast and safe launch, so they sometimes catch harmless requests, triggering on average in under 5% of sessions - Anthropic. Early reporting documented exactly this overcaution on innocuous prompts - The Register. If your app does not handle a 200-with-refusal, it will look broken to a meaningful slice of users. This safety lineage traces back to Anthropic's defensive-cyber program, which our Project Glasswing guide covers in full.
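Handling the 200-with-refusal case means branching on stop_reason rather than on the HTTP status. The sketch below shows that branch under stated assumptions: responses are plain dicts with a stop_reason field and a list of content blocks, the fallback is modeled as a client-side retry passed in as a callable (the documented opt-in fallback may work differently), and the function names and user-facing message are invented for this example.

```python
def extract_text(response: dict) -> str:
    """Join the text blocks of a response (assumed shape: list of typed blocks)."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def handle_response(response: dict, retry_with_fallback) -> dict:
    """Treat a refusal as a normal branch, not an exception.

    retry_with_fallback: zero-argument callable that re-sends the request to a
    fallback model and returns its response dict (hypothetical helper).
    """
    if response.get("stop_reason") != "refusal":
        return {"status": "ok", "text": extract_text(response)}

    fallback = retry_with_fallback()
    if fallback.get("stop_reason") == "refusal":
        # Both attempts declined: show the user something deliberate.
        return {"status": "refused",
                "text": "This request could not be completed. Try rephrasing it."}
    return {"status": "ok_fallback", "text": extract_text(fallback)}


ok = {"stop_reason": "end_turn", "content": [{"type": "text", "text": "hi"}]}
refused = {"stop_reason": "refusal", "content": []}

print(handle_response(ok, lambda: refused))       # status: ok
print(handle_response(refused, lambda: ok))       # status: ok_fallback
print(handle_response(refused, lambda: refused))  # status: refused
```

Recording the status alongside the text also gives you the refusal rate for free, which is the number you need to tell whether overcautious classification is actually affecting your users.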
The second failure mode is the one that should keep security teams up at night: prompt injection through tools and skills. As agents gain the ability to read external content and execute actions, that external content becomes an attack surface. A Snyk study of the agent-skills supply chain found prompt injection in 36% of the skills it examined, with more than 1,400 malicious payloads - Snyk. Worse, the way models fail here is subtle. Trajectory-based safety research documented a pattern where the model's own reasoning identifies the attack and then proceeds anyway, a "rationalized abdication" that appeared in the large majority of unsafe traces studied - arXiv. The model knowing something is an attack does not mean it refuses to act on it.
The third failure cluster is operational rather than adversarial, and it is where most real money gets wasted. Three patterns dominate:
- Overthinking at high effort - an agent set to max on a task that needed medium burns tokens exploring options no one asked for.
- Cost blowups from broken caching - the silent prefix invalidation from section 11 quietly multiplies the bill.
- Over-broad permission scope - a coding agent with unrestricted filesystem or shell access can do real damage when it misreads a task.
These are not exotic; they are the default outcomes of not designing against them. The pattern across all three is that the failure is silent and the cost is real, which is the same shape as the production pitfalls in section 11.
The way to design for all of this is to treat the agent as a powerful but fallible employee, not as a deterministic function. You gate destructive actions behind confirmation, you scope filesystem and tool permissions to the minimum the task needs, you validate and sanitize anything that comes back from a tool before acting on it, and you handle refusals as a normal branch rather than an exception. The teams that get burned are the ones who assumed the model would behave like an API. The teams that ship are the ones who assumed it would occasionally behave like a confused, overconfident, occasionally-manipulated human, and built the guardrails accordingly.
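Two of those guardrails, path scoping and confirmation of destructive actions, can be sketched as a single authorization check that runs before any tool call executes. The tool names, the argument shape, and the confirm callback are all hypothetical, and the check requires Python 3.9 or later for Path.is_relative_to.

```python
from pathlib import Path

# Hypothetical tool names that should never run without a human saying yes.
DESTRUCTIVE = {"delete_file", "run_shell", "drop_table"}


def authorize(tool_name: str, args: dict, workspace: Path, confirm) -> bool:
    """Decide whether a tool call may run.

    - Any "path" argument must resolve inside the workspace directory.
    - Destructive tools additionally require confirm(tool_name, args) to be truthy.
    """
    if "path" in args:
        target = (workspace / args["path"]).resolve()
        if not target.is_relative_to(workspace.resolve()):
            return False  # path traversal or absolute path outside the workspace
    if tool_name in DESTRUCTIVE:
        return bool(confirm(tool_name, args))
    return True


workspace = Path("/tmp/agent-workspace")
deny = lambda name, args: False
allow = lambda name, args: True

print(authorize("read_file", {"path": "notes.txt"}, workspace, deny))      # True
print(authorize("read_file", {"path": "../secrets.env"}, workspace, deny)) # False
print(authorize("delete_file", {"path": "notes.txt"}, workspace, deny))    # False
print(authorize("delete_file", {"path": "notes.txt"}, workspace, allow))   # True
```

The check is deliberately outside the model: it does not matter whether the agent was confused or manipulated, because the decision never depends on the agent's own account of what it is doing.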
13. The competitive landscape in mid-2026
Building on Claude is a choice, and an honest guide has to put the alternatives on the table from first principles rather than from loyalty. The structural question is not "which model is best," it is "which model delivers the outcome you need at the cost you can afford," and the answer is genuinely different for different apps. As of June 2026 the frontier competitors a builder actually weighs are OpenAI's GPT-5.5, Google's Gemini 3.1 Pro, and the open-weight DeepSeek V4, with Meta having pivoted from open Llama to a proprietary model called Muse Spark.
GPT-5.5, released April 24, 2026, is the most direct competitor for agent building because OpenAI positioned it explicitly as an agent runtime rather than a chat model, trained end-to-end for multi-step tool use - MindStudio. It prices at $5 input and $30 output per million tokens with a context window above a million, which makes it cheaper per token than Fable 5 while scoring lower on the hardest coding benchmark. Gemini 3.1 Pro, in preview since February 2026, is the cost play at $2 input and $12 output with an industry-leading 2-million-token context window - DevTk. DeepSeek V4-Pro is the open-weight price disruptor under an MIT license, at roughly $0.44 input and $0.87 output under promotional rates with a 1-million-token context - Codersera.
The benchmark comparison demands a critical caveat that catches even careful builders, and getting it wrong makes any analysis worthless. There are two SWE-Bench numbers in circulation and they are different benchmarks. On the harder, vendor-reported SWE-Bench Pro, Fable 5 leads at 80.3% over GPT-5.5's 58.6% and Gemini's 54.2%. But many pages quote SWE-Bench Verified instead, an easier benchmark now considered partly contaminated, where GPT-5.5 scores around 88.7% and Gemini around 80.6% - Morph LLM. Those higher numbers are not contradictions and they are not evidence that GPT-5.5 beats Fable 5; they are a different test. Any comparison that mixes a Verified score for one model with a Pro score for another is simply wrong, and a surprising amount of published analysis does exactly that.
The first-principles takeaway is that the right model is a routing decision, not a loyalty decision. Fable 5 earns its premium on the hardest, highest-stakes work where being correct matters more than being cheap, which is precisely the migration-and-refactor territory the Stripe story illustrates. Gemini's enormous context and low price make it compelling for cheap, high-volume, long-document work. DeepSeek's open weights and rock-bottom price make it the default for cost-sensitive workloads where you control the deployment. The mature architecture routes each task to the cheapest model that clears its bar, which is why MCP and the tool-gateway pattern from section 8 matter so much. For the model-by-model detail, see our guides to GPT-5.5, Gemini 3.1 Pro, DeepSeek V4, and Meta's Muse Spark.
14. The market, the money, and where this is going
Step back from the surfaces and the structural picture is a market in steep, well-funded growth, which matters to a builder because it tells you the foundation you are building on is not going to be abandoned. Analyst estimates converge tightly: the broad AI agents market sits around $7.84 billion in 2025 and is forecast to reach roughly $52.62 billion by 2030 at a 46% compound annual growth rate - MarketsandMarkets. Grand View Research lands in the same neighborhood at $50.31 billion by 2030 - Grand View Research. When multiple firms independently forecast a near-7-times expansion in five years, the signal is strong even if any single number is soft.
The adoption picture is wide but shallow, and that gap is the most honest thing in the whole market and the most important for a builder to internalize. Gartner projects that 40% of enterprise applications will feature task-specific agents by the end of 2026, up from under 5% in 2025, and that a third of enterprise software will embed agentic AI by 2028 - Gartner. But the same firm warns that over 40% of agentic AI projects will be canceled by the end of 2027 - Gartner. Surveys back the gap: only around 31% of enterprises have an agent in production and roughly 29% report significant returns, despite near-universal experimentation. Developers feel it too, with the 2025 Stack Overflow survey showing 84% using or planning to use AI tools while trust in their accuracy fell to 33% - Stack Overflow.
The capability trajectory underneath this is the part that resolves the tension, and it is best seen on the benchmark that defined the era. SWE-Bench Verified went from Claude 3.5 Sonnet's 33.4% in June 2024 to 49.0% by October 2024, and Fable 5 now posts around 95% on that same set - Anthropic. The benchmark is now considered partly contaminated precisely because it got solved, which is its own kind of milestone. The money tells the same story: Anthropic's run-rate revenue reportedly climbed from $9 billion in December 2025 to $47 billion by mid-May 2026, the backdrop to its $65 billion Series H at a $965 billion valuation - Simon Willison. The full financing picture sits in our Anthropic valuation guide.
The first-principles conclusion is that the adoption gap and the capability curve point the same direction, not opposite ones. Projects are being canceled not because the models cannot do the work, but because teams climb the wrong rung of the ladder, build agents where workflows would do, skip the production patterns, and get surprised by cost and refusals. The 40% cancellation rate is a discipline problem, not a capability problem. As the models keep improving and the surfaces keep maturing, the builders who win will be the ones who matched the surface to the task, designed for the silent failures, and kept their architecture model-agnostic. That is the whole argument of this guide compressed into one sentence, and it is why the market is real even though most projects fail.
15. Choosing your surface: a decision framework
The decision is simpler than the six surfaces make it look, because it collapses to a few honest questions about your task. The first is whether the job is one model call or many. If a user action maps to a single classification, extraction, or generation, you want the API, and you are done at section 3. If it maps to a known sequence of steps, you want a workflow, and you should resist the urge to call it an agent. Most apps that think they need an agent need a workflow, and the workflow is cheaper, faster, and debuggable in ways the agent is not.
The second question, for genuinely open-ended work, is who you want running the infrastructure. If running an agent loop and a sandbox is a core competency you want to own, the Agent SDK gives you the engine on your own compute. If the work is mostly code, Claude Code is purpose-built and battle-tested at enterprise scale. If you would rather rent the whole loop and the container, Managed Agents runs it for you at the cost of a session-hour fee. None of these is more advanced than the others; they are different answers to a question about ownership, and the right one depends on whether infrastructure is your differentiator or your distraction.
The third question cuts across all of them and is easy to defer until it bites: are you locked to a cloud, and are you integrating external tools? If your data and compliance live in AWS, your surface is bounded by Bedrock's subset before you choose anything else, so decide that first. If you are wiring Claude into existing systems, MCP is not optional, and building your tool layer behind it is what keeps your model choice free as prices fall. And at the very top of the ladder, if you would rather describe the outcome than build the integration at all, a describe-it platform like O-mega generates and runs the whole product from a conversation, the same Claude engine seen from the far end. Our guide to building software with AI maps that full spectrum.
The closing principle is the one that should outlive any specific model: build for the surface, design for the failure, and keep your architecture model-agnostic. Fable 5 is the most capable public model today, and in a year it will be a commodity while something above it takes the crown. The builders who compound are the ones whose tool layer, caching discipline, and surface choices survive that turnover. This guide was written by Yuma Heymans (@yumahey), founder of the autonomous-company platform O-mega and co-founder of the AI recruitment company HeroHunt.ai, who spends his days wiring frontier Claude models into production software and has opinions about which rung of the ladder most teams climb too fast. Match the surface to the task, design around the silent failures, and the model will mostly take care of itself.
This guide reflects the Claude and AI model landscape as of June 2026. Model IDs, pricing, benchmarks, and product availability in this space change frequently, so verify current details against the official sources before building.