The insider guide to Anthropic's newest flagship model: benchmarks, features, pricing, safety architecture, and what it means for the AI landscape.
Anthropic just released Claude Opus 4.7, and it scored 64.3% on SWE-bench Pro, a 20% relative jump over its predecessor's 53.4%. That single number tells you what this release is really about. While the AI industry debates alignment frameworks and existential risk scenarios, Anthropic quietly shipped the most capable commercially available coding model on the planet, then told everyone it is still holding back something far more powerful.
The release arrived on April 16, 2026, alongside a new AI design tool, a restructured safety framework, and the looming shadow of Claude Mythos, the unreleased model that Anthropic says can autonomously discover zero-day vulnerabilities. Opus 4.7 is the product Anthropic wants you to use. Mythos is the product Anthropic is afraid to let you use. Understanding the gap between these two models is essential to understanding where AI is heading in the second half of 2026.
This guide breaks down exactly what changed in Opus 4.7, how it compares to GPT-5.4 and Gemini 3.1 Pro on every major benchmark, the new features developers need to know (task budgets, xhigh effort, high-resolution vision), the Mythos connection, real-world developer reactions, and what all of this means for teams building with Claude today.
Written by Yuma Heymans (@yumahey), founder of o-mega.ai and researcher focused on AI agent architectures and autonomous systems infrastructure.
Contents
- What Is Claude Opus 4.7
- The Benchmark Breakdown
- Computer Use and OSWorld
- New Features and Capabilities
- Vision: High-Resolution Image Support
- Task Budgets and Agentic Control
- The xhigh Effort Level
- The Tokenizer Change Nobody Is Talking About
- Breaking API Changes and Migration Pitfalls
- Safety, Mythos, and the Cyber Verification Program
- The AI Design Tool and Market Shockwaves
- Enterprise Adoption: Who Is Already Using It
- Developer Reactions and Early Adoption
- Limitations, Criticisms, and Known Weaknesses
- Pricing and Migration
- How Opus 4.7 Compares to GPT-5.4 and Gemini 3.1 Pro
- What This Means for AI Agent Platforms
- Future Outlook
1. What Is Claude Opus 4.7
Claude Opus 4.7 is Anthropic's most capable generally available model, released on April 16, 2026. It is an incremental upgrade to Claude Opus 4.6, which launched in February 2026 with a 1 million token context window and 128k max output tokens. The model ID is claude-opus-4-7, and it is available across the Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and all Claude consumer products - Anthropic.
The word "incremental" is doing significant work in that description. In absolute terms, the improvements are substantial. A 20% improvement on SWE-bench Pro is not a minor version bump. But Anthropic itself frames this release modestly, acknowledging that Opus 4.7 is "less broadly capable than our most powerful model, Claude Mythos Preview." That framing is deliberate. Anthropic is managing two narratives simultaneously: one about shipping a great product, and another about responsible restraint.
To understand what makes Opus 4.7 meaningful, you need to look at the specific areas where it improves. This is not a model that is uniformly better at everything. It is a model that is dramatically better at coding, meaningfully better at vision, and subtly better at agentic workflows, while maintaining essentially the same performance as its competitors on general reasoning benchmarks where the entire frontier has converged. The improvements are concentrated precisely where they matter most for the developer and enterprise market that Anthropic is targeting.
The model supports all existing Claude features: adaptive thinking, tool use, computer use, and the full suite of API capabilities. There are some breaking API changes that developers need to be aware of (covered in section 9), and important tokenizer changes that affect cost calculations (section 8).
2. The Benchmark Breakdown
The benchmark story for Claude Opus 4.7 splits into two categories: coding benchmarks where it leads decisively, and general reasoning benchmarks where the top three frontier models are essentially tied.
Coding Benchmarks
The headline number is SWE-bench Pro, the benchmark that measures autonomous resolution of real software engineering issues from popular open-source Python repositories. Opus 4.7 scores 64.3%, up from 53.4% for Opus 4.6. For context, GPT-5.4 scores 57.7% and Gemini 3.1 Pro scores 54.2% on the same benchmark. That is not a narrow lead. Opus 4.7 is 6.6 percentage points ahead of its nearest competitor, which in the compressed space of frontier model performance is a significant gap - The Next Web.
On SWE-bench Verified (the curated subset), Opus 4.7 scores 87.6%, up from 80.8% for Opus 4.6 and ahead of Gemini 3.1 Pro's 80.6%. The Verified subset is smaller and more controlled, so the higher absolute numbers are expected, but the magnitude of improvement is consistent.
CursorBench, which measures autonomous coding performance inside the Cursor AI code editor, tells a similar story. Opus 4.7 hits 70%, up from 58% on Opus 4.6. This benchmark matters because Cursor is where a huge number of professional developers actually write code. Performance on CursorBench translates more directly to daily developer experience than abstract benchmarks.
On Anthropic's internal 93-task coding benchmark, Opus 4.7 lifted resolution by 13% over Opus 4.6, including solving four tasks that neither Opus 4.6 nor Sonnet 4.6 could solve at all. Anthropic also reports that Opus 4.7 shows a 14% improvement in multi-step agentic reasoning with a third of the tool errors compared to its predecessor - VentureBeat.
The tool error reduction deserves emphasis. In agentic workflows, where the model is calling tools, reading results, and deciding what to do next, a single tool error can cascade into a failed task. Reducing tool errors by roughly two-thirds while improving overall task completion by 14% means the model is not just smarter but more reliable. Reliability is what separates a model you can supervise from a model you can trust to work autonomously.
General Reasoning Benchmarks
On GPQA Diamond (graduate-level reasoning), the frontier has converged to the point where differences are noise. Opus 4.7 scores 94.2%, GPT-5.4 Pro scores 94.4%, and Gemini 3.1 Pro scores 94.3%. These differences are within margin of error. No model has a meaningful advantage on pure reasoning anymore.
This convergence is itself significant. It means the competition between frontier models has shifted from "who reasons better" to "who executes better." Reasoning is table stakes. Execution, reliability, tool use, and real-world task completion are now the differentiators. That is exactly where Opus 4.7 concentrates its improvements.
On MCP-Atlas (multi-step tool use), Opus 4.7 scores 77.3% versus Gemini 3.1 Pro's 73.9%. On GDPval-AA (economically valuable knowledge work across finance and legal domains), Opus 4.7 also outperforms its predecessor, though exact competitor scores on this benchmark are less widely reported. The pattern is consistent: wherever the task requires executing multiple steps with tools rather than answering a single question, Opus 4.7 pulls ahead.
We previously explored how inference capabilities are reshaping entire software categories in our analysis of how LLM inference is eating software. Opus 4.7's benchmark profile fits that thesis perfectly. The model is not better at knowing things. It is better at doing things.
Finance and Domain-Specific Benchmarks
One benchmark that deserves more attention is the Finance Agent evaluation. Opus 4.7 scores 64.4%, up from 60.7% on Opus 4.6 and ahead of Gemini 3.1 Pro's 59.7%. This benchmark measures economically valuable knowledge work, including the ability to analyze financial data, draft Python scripts for scenario analysis, and navigate complex regulatory documents. For enterprises evaluating Claude for finance and legal workflows, this number matters more than SWE-bench.
Rakuten ran its own independent evaluation against production engineering tasks and found that Opus 4.7 resolved 3x more production tasks compared to Opus 4.6, with double-digit gains in both Code Quality and Test Quality scores. Independent validation from a major enterprise user carries more weight than self-reported benchmarks, and the Rakuten results confirm that the improvements translate to real-world codebases, not just benchmark suites - LLM Stats.
3. Computer Use and OSWorld
Computer use, where Claude autonomously controls a desktop environment by clicking, typing, and navigating GUI interfaces, receives a meaningful upgrade in Opus 4.7. The OSWorld-Verified benchmark, which tests autonomous interaction with real desktop software, climbed from 72.7% to 78.0%. That puts Opus 4.7 ahead of GPT-5.4's 75.0% and within 1.6 percentage points of Mythos Preview's 79.6% - Vellum AI.
The proximity to Mythos on computer use is notable. Anthropic held back Mythos primarily due to cybersecurity concerns, not computer use capabilities. This means Opus 4.7 gives you nearly Mythos-level computer use without the associated safety restrictions. For teams building desktop automation, robotic process automation replacements, or computer-use-based agents, this is the most capable publicly available model for the task.
The vision resolution upgrade from 1,568 pixels to 2,576 pixels directly feeds the computer use improvements. Screenshots of modern desktop applications are densely packed with small text, icons, and interactive elements. At the old resolution, Claude was effectively reading a blurred version of the screen. At the new resolution, it sees the interface at actual fidelity. This is why the visual navigation score (without tools) jumped from 57.7% to 79.5%: the model is simply seeing more of what is on screen.
Opus 4.7 is also the first Claude model to pass what Anthropic calls "implicit-need tests": tasks where the model must infer what tools or actions are required rather than being told explicitly. In previous models, if a task required opening a file browser but the instruction only said "find the report from last week," the model would sometimes get stuck because no explicit tool was named. Opus 4.7 infers the required action and executes it. This capability is foundational for real-world computer use, where users describe goals, not procedures - The Next Web.
The model also introduces multi-agent coordination for computer use: the ability to orchestrate parallel AI workstreams rather than processing tasks sequentially. For enterprise teams running Claude across code review, document analysis, and data processing simultaneously, this translates directly into throughput. Rather than queuing tasks and processing them one at a time, Opus 4.7 can manage concurrent workstreams with better role fidelity and coordination.
4. New Features and Capabilities
Opus 4.7 introduces several features that specifically target agentic and developer workflows. These are not cosmetic changes. They represent Anthropic's evolving understanding of how frontier models are actually used in production.
The most consequential new feature is task budgets, which we cover in depth in section 6. The second is the xhigh effort level, covered in section 7. But there are several other important capability improvements that deserve attention.
File-system-based memory is significantly improved. If you build agents that maintain scratchpads, notes files, or structured memory stores across conversation turns, Opus 4.7 is meaningfully better at writing to these files and leveraging them in subsequent tasks. This matters for any long-running agent architecture where context accumulates over time rather than being provided upfront. Previous Claude models would sometimes forget to update their scratchpad or fail to reference information they had previously stored. Opus 4.7 addresses this with better instruction following around persistent state.
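To make the pattern concrete, here is a minimal sketch of the kind of file-backed scratchpad a long-running agent might maintain between turns. The `Scratchpad` class and its JSON layout are illustrative inventions, not an Anthropic API; any persistent key-value file the agent can read and write plays the same role.

```python
import json
import pathlib


class Scratchpad:
    """Minimal file-backed memory store of the kind long-running agents keep.

    Illustrative only: the class name and JSON layout are assumptions,
    not part of any Claude SDK.
    """

    def __init__(self, path: str = "scratchpad.json"):
        self.path = pathlib.Path(path)

    def read_all(self) -> dict:
        # Return the full memory store, or an empty dict on first run.
        if self.path.exists():
            return json.loads(self.path.read_text())
        return {}

    def write(self, key: str, value: str) -> None:
        # Read-modify-write keeps prior entries intact across turns.
        data = self.read_all()
        data[key] = value
        self.path.write_text(json.dumps(data, indent=2))
```

The storage mechanism is trivial by design. What Opus 4.7 improves is the behavioral side: remembering to update a store like this after each step and consulting it before the next one, rather than the mechanics of persistence itself.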
Multi-step task performance shows a measurable improvement. Anthropic reports stronger performance on workflows that require calling multiple tools in sequence, handling unexpected results, and adapting plans when intermediate steps fail. In practical terms, when an agentic workflow hits a wall (wrong file path, unexpected API response, missing dependency), Opus 4.7 is better at self-correcting without spiraling into repetitive failure loops - DEV Community.
Progress updates during long agentic traces are another improvement. Opus 4.7 provides more regular status updates to the user throughout extended multi-step operations. This is a subtle but important change for developer experience. When a model is running a 15-minute autonomous coding session, the difference between silence and periodic progress updates is the difference between anxiety and confidence.
The /ultrareview command in Claude Code is new with this release and represents a genuinely novel approach to code review. Rather than performing a single-pass review, /ultrareview spawns multiple specialized agents: one focused on security, one on logic, one on performance, and one on style. Each agent reviews the code independently, and their findings are synthesized into a single report. This multi-agent approach catches issues that single-pass review consistently misses, particularly subtle logic errors and security patterns that require cross-file reasoning. For teams that have been building review workflows with Claude Code, this is a significant upgrade to the built-in tooling - FindSkill.ai.
For a deeper look at the Claude Code ecosystem and how these capabilities fit into the broader development workflow, we covered the full landscape in our Claude Desktop, Cowork and Code guide.
5. Vision: High-Resolution Image Support
Claude Opus 4.7 is the first Claude model with high-resolution image support, and the improvement is dramatic. Maximum image resolution increases to 2,576 pixels on the long edge / 3.75 megapixels, up from the previous limit of 1,568 pixels / 1.15 megapixels. That is more than three times the pixel capacity of prior Claude models - Claude API Docs.
The raw resolution numbers only tell part of the story. The qualitative improvements in what the model can actually perceive at these higher resolutions are substantial. On XBOW's visual-acuity benchmark, Opus 4.7 scores 98.5% compared to 54.5% for Opus 4.6. That is not an incremental improvement. It is a transformation.
Beyond raw resolution, Opus 4.7 improves on low-level perception tasks: pointing, measuring, counting, and similar spatial reasoning. It also delivers improved natural-image bounding-box localization and detection, making it better at identifying and locating specific elements within images. The model can now read chemical structures, complex technical diagrams, and dense charts that previous Claude models struggled with.
These vision improvements have direct implications for three major use cases. First, computer use: when Claude is controlling a computer interface, higher resolution means it can read smaller text, identify more UI elements, and navigate more complex interfaces. Second, document analysis: financial statements, engineering drawings, medical images, and legal documents are all higher-resolution workloads where Opus 4.6 was limited. Third, screenshot understanding: developer tools, dashboards, and monitoring interfaces are all densely packed with information that requires high-resolution perception.
There is a cost trade-off. High-resolution images on Opus 4.7 can use up to approximately 3x more image tokens than on prior models (4,784 vs 1,600 tokens per image). High-resolution support is automatic: there is no beta header or client-side opt-in required. If you are processing large volumes of images, you will want to account for this increase in your cost modeling.
To put the token increase in perspective: if your agent processes 100 screenshots per session (common in browser automation), the vision token cost per session rises from approximately 160,000 tokens to 478,400 tokens. At $5 per million input tokens, that is the difference between $0.80 and $2.39 per session in vision tokens alone. For high-volume browser automation platforms, this is a material cost increase that needs to be modeled alongside the quality improvements.
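The arithmetic above generalizes into a simple cost model. The per-image token counts and the $5-per-million input price are taken from the figures in this section; the helper function itself is just illustrative bookkeeping.

```python
def vision_cost_usd(screenshots: int, tokens_per_image: int,
                    input_price_per_mtok: float = 5.0) -> float:
    """Vision-token cost for one session at a given per-million-token price."""
    return screenshots * tokens_per_image * input_price_per_mtok / 1_000_000


# 100 screenshots per session, old vs. new per-image token counts
old_cost = vision_cost_usd(100, 1_600)   # 160,000 tokens -> $0.80
new_cost = vision_cost_usd(100, 4_784)   # 478,400 tokens -> $2.392
```

Plugging in your own session shapes (screenshots per run, runs per day) turns the abstract "up to 3x" figure into a concrete line item you can weigh against the accuracy gains.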
The vision upgrade also has implications for the document processing market specifically. Financial statements, legal contracts, and engineering blueprints are routinely scanned at high resolution. At the old 1.15 megapixel limit, dense text in these documents was often misread or missed entirely. At 3.75 megapixels, the model sees the document as a human would. For enterprises that process thousands of documents daily (insurance claims, mortgage applications, regulatory filings), the accuracy improvement can reduce manual review costs significantly. The question is whether the vision quality improvement justifies the 3x token cost increase for each specific document processing workflow.
The vision improvements are also directly relevant for the emerging AI design tool that Anthropic announced alongside Opus 4.7, which we cover in section 11.
6. Task Budgets and Agentic Control
Task budgets are the feature that matters most for production agentic systems. Anthropic is introducing them in public beta with Opus 4.7, and they address one of the biggest practical problems with autonomous AI agents: unpredictable cost.
A task budget gives Claude a rough estimate of how many tokens to target for a full agentic loop, including thinking, tool calls, tool results, and final output. The model sees a running countdown and uses it to prioritize work and finish the task gracefully as the budget is consumed. This is not a hard cutoff that abruptly terminates the model. It is a constraint that the model integrates into its planning - Claude API Docs.
The practical impact is significant. Before task budgets, developers had two options for controlling agentic cost: set a hard token limit (which would cut the model off mid-task) or let the model run without constraints (which could result in unexpected bills). Neither option was good. A hard cutoff produces incomplete work. No limit produces unpredictable cost.
Task budgets create a third option: the model plans its work within a budget. If it has 50,000 tokens budgeted, it will prioritize the highest-value steps, skip lower-priority exploration, and wrap up cleanly as the budget runs low. This is closer to how a human contractor works with a fixed budget: you do the most important work first and save enough time to deliver a complete result.
For teams building autonomous agent systems, this feature has immediate practical value. You can set task budgets based on the complexity of the request, the value of the task, or the customer's tier. A debugging session gets a smaller budget than a full feature implementation. A free-tier user gets a tighter budget than an enterprise customer. The model handles the prioritization within whatever budget you set.
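The tiering logic above can be sketched in a few lines. The budget numbers and task categories here are placeholders you would replace with your own cost modeling, and the beta API field that actually receives the budget is not shown; consult the Claude API docs for its name.

```python
# Illustrative tiers and multipliers -- assumptions, not Anthropic guidance.
TIER_BUDGETS = {"free": 20_000, "pro": 50_000, "enterprise": 150_000}
TASK_MULTIPLIERS = {"debugging": 0.5, "feature": 1.0, "refactor": 1.5}


def select_task_budget(tier: str, task_type: str) -> int:
    """Pick a token budget from customer tier and task complexity."""
    base = TIER_BUDGETS.get(tier, TIER_BUDGETS["free"])
    return int(base * TASK_MULTIPLIERS.get(task_type, 1.0))
```

A free-tier debugging request would get 10,000 tokens under this scheme, while an enterprise refactor would get 225,000; the model then handles prioritization within whatever number you pass.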
This aligns with the broader trend of AI agents becoming production-grade infrastructure rather than experimental toys. As we explored in our guide to the Anthropic ecosystem, Anthropic has been systematically building the features that enterprise teams need to deploy Claude agents at scale. Task budgets are perhaps the most directly business-relevant addition in this release.
7. The xhigh Effort Level
Previous Claude models supported four effort levels: low, medium, high, and max. Opus 4.7 adds a fifth: xhigh (extra high), which sits between high and max. This gives developers finer control over the trade-off between reasoning depth and token expenditure - Claude API Docs.
The xhigh level is specifically designed for coding and agentic use cases that require extended exploration, such as repeated tool calling, detailed search across large codebases, or complex multi-step debugging. Anthropic's internal data shows that while max effort yields the highest benchmark scores (approaching 75% on coding tasks), xhigh provides a "compelling sweet spot" between performance and cost.
The economics here matter. Max effort can consume significantly more tokens than xhigh because the model explores more reasoning branches, considers more alternatives, and verifies its work more thoroughly. For many production workloads, the difference in quality between xhigh and max does not justify the difference in cost. Having a named effort level between high and max lets developers find the right balance for their specific use case without having to experiment with custom prompting to achieve intermediate reasoning depth.
Anthropic's recommendation is to start with xhigh for advanced coding and complex agentic work, and reserve max for the hardest problems where you need every last percentage point of accuracy. For most intelligence-sensitive use cases, they recommend a minimum of high effort. Low and medium are appropriate for simple classification, extraction, and other tasks that do not require deep reasoning.
This effort level system reflects a maturation in how frontier model providers think about their products. Rather than offering a single model that does everything at maximum capability (and maximum cost), Anthropic is building a graduated system where developers can match intelligence level to task difficulty. This is analogous to how cloud computing evolved from "one instance size" to a spectrum of options optimized for different workloads.
The practical implementation looks like this: a routing layer in front of your Claude API calls classifies incoming requests by difficulty. Simple customer support queries get medium effort. Standard coding tasks get high. Complex multi-file refactoring gets xhigh. Critical production debugging gets max. Each tier costs proportionally more but delivers proportionally better results. The key insight is that most production workloads are a mix of easy and hard tasks, and the old model (everything at max) was wasteful for the easy ones while being necessary for the hard ones.
For agent platforms that process thousands of requests per day, the effort system is not just a cost optimization. It is an architecture pattern. By matching effort to task difficulty, you can offer more competitive pricing on simple tasks while still delivering frontier-quality results on complex ones. This is the kind of fine-grained control that separates production-grade AI infrastructure from prototype-stage experimentation. For developers building on Claude who want to understand the full effort system, the Claude API documentation on effort provides implementation details and benchmarks at each level.
8. The Tokenizer Change Nobody Is Talking About
Here is the detail that most coverage of Opus 4.7 misses, and it directly affects your bill: Opus 4.7 uses a new tokenizer that may produce up to 35% more tokens for the same text compared to Opus 4.6 - Finout.
The per-token price is unchanged at $5 per million input tokens and $25 per million output tokens. But if the same input text now maps to between 1.0x and 1.35x as many tokens depending on content type, your effective cost per request goes up even though the rate card did not. This is the difference between nominal pricing and effective pricing.
To understand why this matters, consider a typical API workflow. You send a prompt with a system message, conversation history, and a user query. Under the Opus 4.6 tokenizer, that prompt might be 10,000 tokens. Under the Opus 4.7 tokenizer, the exact same text could be 12,500 to 13,500 tokens. At $5 per million input tokens, that is the difference between $0.050 and $0.0625-$0.0675 per request. For individual requests, the difference is negligible. For production workloads processing millions of requests, the cumulative impact is significant.
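The effective-price shift can be modeled directly. The 1.35x worst case is taken from the figures above; actual inflation varies by content type, so you would measure your own ratio on a sample of real prompts.

```python
def effective_request_cost(old_tokens: int, inflation: float,
                           price_per_mtok: float = 5.0) -> float:
    """Input cost per request in USD after tokenizer inflation."""
    return old_tokens * inflation * price_per_mtok / 1_000_000


base = effective_request_cost(10_000, 1.00)   # Opus 4.6 tokenizer: $0.05
worst = effective_request_cost(10_000, 1.35)  # worst-case Opus 4.7: $0.0675
```

Multiply the per-request delta by daily request volume to see whether the tokenizer change is noise or a line item for your workload.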
Why did Anthropic change the tokenizer? Updated tokenizers typically improve how the model processes text by creating more semantically meaningful token boundaries. A better tokenizer can improve model quality even without changes to the underlying model weights. The trade-off is more tokens per input, but potentially better understanding of each token. Whether this trade-off is favorable depends on your specific workload and margin structure.
The practical advice is straightforward: re-benchmark your cost estimates before upgrading from Opus 4.6. Run a sample of your typical inputs through both tokenizers and measure the difference. If you are operating at tight margins, the tokenizer change could turn a profitable workflow unprofitable. If you have headroom, the quality improvements may more than justify the cost increase.
9. Breaking API Changes and Migration Pitfalls
While the general migration story from Opus 4.6 to 4.7 is smooth, there are several breaking changes that developers need to know about before switching production traffic. These are not minor deprecations. They will cause hard failures if your code relies on the old behavior.
Extended thinking budgets are removed. In Opus 4.6, you could set thinking.budget_tokens to control extended thinking behavior. In Opus 4.7, setting thinking parameters with specific budget tokens will return a 400 error. This is a hard break. If your API integration passes budget_tokens, your requests will fail immediately after upgrading. The replacement is the effort parameter system (low, medium, high, xhigh, max), which gives you control over reasoning depth without specifying exact token budgets for thinking - The New Stack.
Non-default sampling parameters are blocked. Setting temperature, top_p, or top_k to any non-default value will also return a 400 error on Opus 4.7. This is a significant change for teams that tuned these parameters to control output randomness. Anthropic's reasoning is that the effort parameter system provides a better control mechanism for reasoning depth, and that temperature/top_p tuning often produced unpredictable results with thinking-capable models. If your application relies on temperature=0 for deterministic output, you will need to restructure your approach.
Prompting compatibility issues are more subtle but equally important. Anthropic's own announcement notes that Opus 4.7's more literal instruction-following means prompts written for earlier models can produce unexpected results. Where Opus 4.6 would interpret vague instructions charitably, Opus 4.7 follows them precisely. If your prompt says "briefly summarize" but your system actually needs a detailed analysis, Opus 4.7 will give you the brief summary you literally asked for. This is technically correct behavior, but it can break workflows that relied on the model's previous tendency to exceed the stated scope.
The practical migration path is: audit your API calls for budget_tokens, temperature, top_p, and top_k parameters. Remove them or replace them with effort-level settings. Then run your prompt suite against Opus 4.7 in a staging environment and compare outputs to Opus 4.6 to catch any literal-instruction-following regressions. This is not a "swap the model ID and ship" upgrade for most production systems.
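The parameter audit described above can be automated. This sketch checks a request payload for the fields that now return a 400 on Opus 4.7 (`temperature`, `top_p`, `top_k`, and `thinking.budget_tokens`, per the changes listed in this section); the helper function itself is illustrative, not part of any SDK.

```python
REMOVED_SAMPLING_PARAMS = ("temperature", "top_p", "top_k")


def audit_request(payload: dict) -> list:
    """Return the Opus 4.6-era fields in a request that Opus 4.7 rejects."""
    issues = [p for p in REMOVED_SAMPLING_PARAMS if p in payload]
    thinking = payload.get("thinking")
    if isinstance(thinking, dict) and "budget_tokens" in thinking:
        issues.append("thinking.budget_tokens")
    return issues


legacy = {
    "model": "claude-opus-4-6",
    "temperature": 0.0,
    "thinking": {"budget_tokens": 8_192},
    "messages": [],
}
# audit_request(legacy) -> ["temperature", "thinking.budget_tokens"]
```

Running a check like this over your request-construction code (or over logged payloads) before flipping the model ID turns a hard production failure into a pre-migration to-do list.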
The literal instruction-following change is worth examining through a specific example. In Opus 4.6, if you prompted the model with "Write a brief summary of this document" and the document contained critical information, the model would often produce a detailed analysis despite the "brief" instruction, because it inferred that comprehensiveness was more valuable than brevity. Opus 4.7 will produce the brief summary you asked for. If you actually wanted a detailed analysis, you need to ask for a detailed analysis. This is objectively better behavior (the model does what you say), but it can break workflows that relied on the model over-delivering relative to the stated instruction.
For teams managing the migration, a useful pattern is to run both Opus 4.6 and 4.7 in parallel on the same inputs for a week, comparing outputs. Any case where 4.7 produces a noticeably different (especially shorter or more literal) response is a prompt that needs updating. This parallel evaluation period costs more but prevents production surprises. For teams already using Claude Code, the Inside Claude Code analysis we published earlier provides useful context for understanding how prompts flow through the system.
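One cheap signal during that parallel week is response-length shrinkage: a 4.7 answer that is much shorter than its 4.6 counterpart often means a prompt is now being read literally. The threshold below is an arbitrary starting point of my own, not an Anthropic recommendation.

```python
def flag_literal_regressions(pairs, shrink_ratio: float = 0.6) -> list:
    """Flag prompts whose Opus 4.7 response is much shorter than 4.6's.

    `pairs` is an iterable of (prompt, response_4_6, response_4_7) strings.
    A new response shorter than `shrink_ratio` of the old one is flagged
    for manual prompt review.
    """
    flagged = []
    for prompt, old_resp, new_resp in pairs:
        if old_resp and len(new_resp) < shrink_ratio * len(old_resp):
            flagged.append(prompt)
    return flagged
```

Length is a crude proxy, so flagged prompts go to a human for review rather than being auto-rewritten; the point is to shrink the review queue from every prompt to the suspicious ones.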
10. Safety, Mythos, and the Cyber Verification Program
The most interesting part of the Opus 4.7 release is not what Anthropic shipped. It is what they did not ship, and why.
Claude Mythos Preview is Anthropic's most capable model. It is more powerful than Opus 4.7 across every dimension. And Anthropic is not making it generally available because they believe it can autonomously discover and exploit zero-day software vulnerabilities at a scale that exceeds both human researchers and every automated tool in existence - CNBC.
That claim is extraordinary. Zero-day discovery has traditionally been the domain of elite security researchers and nation-state actors. A model that can do this autonomously does not just create cybersecurity risk. It changes the fundamental economics of vulnerability research. We covered the full implications of this in our Claude Mythos Preview guide.
Anthropic's strategy is to use Opus 4.7 as a testing ground for safety mechanisms that could eventually enable Mythos-class models to be deployed publicly. During Opus 4.7's training, Anthropic experimented with differential reduction of cybersecurity capabilities. The idea is to maintain the model's general intelligence while specifically weakening its ability to discover and exploit vulnerabilities. Opus 4.7 is released with automated safeguards that detect and block requests indicating prohibited or high-risk cybersecurity uses - Help Net Security.
This approach is part of Project Glasswing, Anthropic's broader initiative to understand the risks and benefits of AI models for cybersecurity. The project's thesis is that the same capabilities that make a model dangerous for offensive security also make it valuable for defensive security. The challenge is building safeguards that preserve the defensive value while blocking the offensive applications.
For security professionals who need Opus 4.7 for legitimate purposes (vulnerability research, penetration testing, red-teaming), Anthropic has created the Cyber Verification Program. This is a vetting process where verified security professionals get access to cybersecurity capabilities that are blocked for general users. What Anthropic learns from real-world deployment of these safeguards will directly inform whether and how Mythos-class models eventually become available.
We covered the full scope of Project Glasswing in our dedicated guide, which analyzes the strategic implications for both the AI industry and the cybersecurity ecosystem.
Safety Profile
Overall, Opus 4.7 shows a similar safety profile to Opus 4.6, with low rates of concerning behavior such as deception, sycophancy, and cooperation with misuse. On some measures, including honesty and resistance to malicious prompt injection attacks, Opus 4.7 is an improvement over its predecessor - Axios.
The decision to hold back Mythos while shipping Opus 4.7 tells you something important about where Anthropic thinks the risk boundary lies. They are comfortable releasing a model that is dramatically better at coding and agentic work. They are not comfortable releasing a model that can autonomously find zero-days. The line is drawn at capabilities that could cause harm at scale without human direction.
11. The AI Design Tool and Market Shockwaves
Alongside Opus 4.7, Anthropic announced a new AI-powered design tool that lets users build websites, presentations, and landing pages using natural language prompts. The tool targets both technical and non-technical users, positioning Anthropic directly against Figma, Adobe, Wix, and emerging startups like Gamma - The Information.
The design tool represents a strategic shift for Anthropic. Until now, the company has positioned itself as a model provider and developer platform. Building an end-user design tool signals a move toward becoming a full-stack AI studio that ships complete products, not just APIs. The internal signals point toward Anthropic repositioning from a language model provider toward a platform where Claude builds and deploys complete products - Decrypt.
The market reacted immediately and violently. When the design tool was leaked on April 15 (the day before the official Opus 4.7 launch), shares of Adobe, Figma, and Wix each dropped over 2% in a single trading session. GoDaddy fell 4.4% in the same session. Within minutes of the leak, prediction markets priced the April 16 launch contract at near-certainty, with all alternative date contracts collapsing to 0% - GuruFocus.
This was not the first time an Anthropic model release moved public markets. Previous Claude releases have contributed to selloffs in legal and financial analysis software stocks. The pattern is consistent: each new model generation that expands what AI can do autonomously triggers a repricing of the companies whose products deliver that value manually. The Nasdaq experienced its worst two-day tumble in months after a sequence of frontier model announcements earlier this year.
The design tool stock reaction reveals something important about how the market prices AI capability improvements. Investors do not differentiate between "Anthropic released a better coding model" and "Anthropic released a tool that competes with our portfolio company." A model improvement is abstract. A competing product is concrete. The design tool made Anthropic's capabilities tangible in a way that benchmark numbers never could.
For context on how AI-powered website creation is already reshaping the landscape, we covered the full competitive field in our best AI website makers 2026 guide. The addition of Anthropic's tool adds a new dimension to that market: a design tool backed by the same model that leads on coding benchmarks, meaning it can not only design but also implement what it designs.
This has implications for the broader ecosystem of AI agent platforms. If Anthropic starts shipping end-user products that compete with the tools its API customers build, it creates a tension between Anthropic as partner and Anthropic as competitor. This is the same dynamic that played out with AWS launching products that compete with its customers, and with OpenAI building ChatGPT plugins that overlap with third-party integrations. How Anthropic manages this tension will matter for any company building on the Claude API.
12. Enterprise Adoption: Who Is Already Using It
Enterprise adoption of Opus 4.7 is moving faster than typical model releases, partly because the model is a drop-in replacement for Opus 4.6 and partly because several large organizations were in Anthropic's early-access program.
TELUS, one of the world's largest telecom and healthcare service providers, adopted Claude as the core engine for its internal Fuel iX platform. The platform gives 57,000 employees direct access to advanced AI workflows. Developers at TELUS leverage Claude Code directly within VS Code and GitHub for real-time refactoring, making them one of the largest known Claude Code deployments in a single enterprise - Data Studios.
Bridgewater Associates, the world's largest hedge fund, uses Claude Opus to power an Investment Analyst Assistant deployed through Amazon Bedrock. The system drafts Python scripts, runs scenario analysis, and visualizes financial projections. It is designed to replicate junior analyst workflows supporting Bridgewater's AIA Labs research team. The system reduces time-to-insight by 50-70% on complex equity, FX, and fixed-income reports. With Opus 4.7's improved coding and finance agent benchmark scores, Bridgewater's use case gets a direct capability boost.
Box ran detailed internal evaluations comparing Opus 4.7 against its predecessor. Their results paint a compelling efficiency picture: 56% reduction in model calls, 50% reduction in tool calls, responses that were 24% faster, and 30% fewer AI Units consumed. These numbers suggest that Opus 4.7's improved reliability (fewer retries, fewer tool errors) translates directly into infrastructure cost savings that partially offset the tokenizer cost increase - 9to5Mac.
The Box data is particularly instructive because it measures what matters in production: not benchmark scores but operational efficiency. A 56% reduction in model calls means the model is solving problems in fewer attempts. A 50% reduction in tool calls means it is using tools more efficiently rather than trial-and-error probing. These are the metrics that determine whether a model upgrade pays for itself.
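The Box finding generalizes into simple arithmetic: total spend is calls × tokens × price, so a large enough drop in call volume can outweigh a per-call token increase. The sketch below uses Box's reported 56% call reduction together with the worst-case 35% tokenizer inflation discussed elsewhere in this guide; the baseline traffic numbers are purely illustrative, not Box or Anthropic figures.

```python
# Back-of-envelope model: does a reliability gain offset a per-token cost
# increase? All workload numbers here are illustrative assumptions.

def monthly_cost(calls, tokens_per_call, price_per_mtok):
    """Total monthly spend in dollars for a given volume of API traffic."""
    return calls * tokens_per_call * price_per_mtok / 1_000_000

# Baseline: an Opus 4.6-style workload (assumed volumes).
old = monthly_cost(calls=100_000, tokens_per_call=4_000, price_per_mtok=25.0)

# After upgrade: 56% fewer model calls (Box's reported figure), but assume
# the new tokenizer inflates tokens per call by the worst-case 35%.
new = monthly_cost(calls=100_000 * 0.44, tokens_per_call=4_000 * 1.35,
                   price_per_mtok=25.0)

savings = 1 - new / old
print(f"old=${old:,.0f}  new=${new:,.0f}  savings={savings:.0%}")
```

Under these assumptions the call reduction dominates: spend still falls by roughly 40% despite every individual request costing more.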
The enterprise deployment story is strengthened by the availability across all major cloud providers. Running Opus 4.7 through Microsoft Foundry means every inference request inherits Foundry's enterprise controls: Azure Active Directory for identity, private networking via VNet, and logging through Azure Monitor, without needing to build a separate governance layer. Amazon Bedrock and Google Cloud Vertex AI offer similar enterprise wrappers. Snowflake Cortex AI also announced same-day availability, allowing enterprises to run Opus 4.7 directly within their data warehouse environment - AWS.
For teams already using Claude in production, the upgrade path is straightforward: Opus 4.7 is a drop-in replacement across all deployment channels, with the API parameter caveats covered in section 9. The combination of improved quality, reduced operational cost (per Box's data), and zero-downtime migration makes this one of the smoother frontier model upgrades in recent memory.
13. Developer Reactions and Early Adoption
The early developer consensus on Opus 4.7 is cautiously positive. The coding improvements are real and immediately noticeable. The same-price upgrade path makes adoption frictionless. But the tokenizer change introduces cost uncertainty that tempers the enthusiasm.
Cursor announced Opus 4.7 support within minutes of launch and offered 50% off to drive adoption. When the leading AI code editor prioritizes integration and subsidizes usage, it is a strong signal that Cursor's team has already tested the model internally and believes it delivers meaningful improvements for its users.
Replit reported achieving the same quality at lower cost with Opus 4.7. This seems contradictory given the tokenizer change, but likely reflects that Opus 4.7's improved task completion rate means fewer retries and less wasted compute on failed attempts. If your baseline includes the cost of failure, a more reliable model can be cheaper even if each individual request is slightly more expensive - FindSkill.ai.
From real-world testing, developers report that Opus 4.7 produces noticeably cleaner code, with fewer wrapper functions and less over-engineering. One consistent theme across developer reactions is that the model is better at matching the style and conventions of existing codebases rather than imposing its own patterns. This is a subtle but important improvement for professional developers who need generated code to fit into established architectures.
The self-correction improvements are also consistently noted. When agentic workflows hit a wall (wrong file path, unexpected API response, missing dependency), Opus 4.7 is better at diagnosing the actual problem and trying a different approach rather than repeating the same failed operation. This is the kind of improvement that does not show up clearly in benchmarks but dramatically affects daily developer experience.
On Hacker News, the discussion thread for Opus 4.7 generated significant engagement, with the dominant themes being the Mythos revelation, the tokenizer cost implications, and practical coding comparisons against GPT-5.4 - Hacker News. The developer community is split on whether the Mythos narrative is a genuine safety concern or marketing positioning. Both interpretations may be partially correct.
Boris Cherny, a notable developer in the Claude Code community, shared his early impressions: Opus 4.7 "feels more intelligent, agentic, and precise than 4.6" but noted it took several days to learn how to work with it effectively and fully take advantage of its new capabilities. This is an important observation. The model's more literal instruction-following and expanded capabilities mean that workflows optimized for Opus 4.6 are not automatically optimal for Opus 4.7. There is a learning curve, and developers who put in the effort to understand the model's new strengths will extract significantly more value than those who treat it as a drop-in replacement.
Anthropic itself published a best practices guide for using Opus 4.7 with Claude Code, recommending that developers provide detailed plans rather than vague instructions, use the effort parameter to match task difficulty, and leverage the model's improved file-system memory for maintaining context across sessions - Claude Blog. The existence of a dedicated best-practices document alongside a model launch is unusual and signals that Anthropic recognizes the model behaves differently enough from its predecessor to warrant explicit guidance.
14. Limitations, Criticisms, and Known Weaknesses
No model release deserves uncritical coverage. Opus 4.7 has real limitations that matter for production deployment decisions, and the developer community has identified several issues in the first 24 hours.
The Mythos Gap Is Real
Anthropic's own positioning makes this clear: Opus 4.7 is not their best model. Mythos Preview outperforms it across every dimension, including computer use (79.6% vs 78.0% on OSWorld). For teams evaluating whether to build critical infrastructure on Opus 4.7, the knowledge that a substantially better model exists but is deliberately withheld introduces uncertainty. Will Mythos be released in six months? Will a Mythos-derived model replace Opus 4.7 before your integration is complete? These are strategic questions without clear answers - Axios.
Reduced Cybersecurity Capabilities
Anthropic deliberately reduced Opus 4.7's cyber capabilities during training. For the vast majority of users this is irrelevant. But for security teams doing legitimate vulnerability research, penetration testing, or red-teaming, the model may be less useful than competitors that have not applied similar restrictions. The Cyber Verification Program exists as a workaround, but adding a verification step to access standard security capabilities creates friction.
Safety Downgrade on Harm Reduction
Opus 4.7 has one notable safety regression: Anthropic acknowledges it is "modestly weaker" at avoiding overly detailed harm-reduction advice on controlled substances. This is a narrow issue, but it illustrates that model improvements in one dimension can come with regressions in others. Safety is not a monotonically improving curve.
Exploration Loops and Memory Loss
User-reported issues include the model being more prone to exploration loops in certain agentic scenarios, where it cycles through the same set of actions without making progress. Some users also report memory loss during long sessions, where the model fails to reference information from earlier in the conversation despite having it in context. These reports are anecdotal and may reflect specific prompt patterns rather than systematic issues, but they counter the narrative of universal improvement.
Several developers describe the Claude Code desktop app as feeling "unpolished" alongside this release, with some noting that the model's more literal instruction-following (a deliberate improvement for precision) can feel rigid when exploratory prompting is needed. The model does exactly what you ask, which is great when your instructions are precise and frustrating when they are intentionally vague - The New Stack.
Multilingual Performance
Opus 4.7 shows only incremental improvement over Opus 4.6 in multilingual performance, an area where Gemini 3.1 Pro continues to lead. If your primary use case involves non-English languages (particularly CJK languages, Arabic, or Indic languages), the choice between Opus 4.7 and Gemini 3.1 Pro may favor Gemini, especially given Gemini's pricing advantage.
The Effective Price Increase
We covered this in section 8, but it bears repeating in a limitations section: the tokenizer change means that despite "unchanged" pricing, your effective cost per request increases by up to 35%. Additionally, high-resolution images consume up to 3x more tokens (4,784 vs 1,600 per image). For teams processing large volumes of images or text, the combined effect can be substantial. Anthropic's framing of "same pricing" is technically accurate and practically misleading.
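The combined effect is easy to underestimate, so it is worth running the numbers. The sketch below uses only figures quoted in this section (4,784 vs 1,600 tokens per high-resolution image, up to 35% text-token inflation, $5 per million input tokens); the request shape itself is an assumed example.

```python
# Effective-cost estimate for a vision-heavy request, using the token
# figures quoted in this section. The 10k-token / 5-image request shape
# is an illustrative assumption.

INPUT_PRICE = 5.0 / 1_000_000  # dollars per input token

def request_cost(text_tokens, images, tokens_per_image):
    return (text_tokens + images * tokens_per_image) * INPUT_PRICE

# Opus 4.6: old tokenizer, low-resolution image encoding.
old = request_cost(text_tokens=10_000, images=5, tokens_per_image=1_600)

# Opus 4.7 worst case: 35% text inflation, high-resolution images.
new = request_cost(text_tokens=int(10_000 * 1.35), images=5,
                   tokens_per_image=4_784)

print(f"old=${old:.4f}  new=${new:.4f}  increase={new / old - 1:.0%}")
```

For this particular mix, the "same price" request roughly doubles in cost, which is why image-heavy workloads deserve a re-estimate before migrating.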
The honest assessment is this: Opus 4.7 is a significant improvement for coding, vision, and agentic reliability. It is not a universal improvement across all dimensions. The limitations are real, documented, and worth factoring into deployment decisions.
15. Pricing and Migration
Opus 4.7 maintains the same nominal pricing as Opus 4.6:
- Input tokens: $5 per million tokens
- Output tokens: $25 per million tokens
- Context window: 1 million tokens
- Max output: 128,000 tokens
These prices apply across the Claude API, with cloud provider pricing varying by platform (Amazon Bedrock, Vertex AI, and Microsoft Foundry each apply their own markup structures).
The critical caveat is the tokenizer change discussed in section 8. Same price per token, but potentially 35% more tokens per request. For budget-sensitive workloads, the effective cost increase may be meaningful.
Migration
The general migration path is straightforward: swap claude-opus-4-6 for claude-opus-4-7 in your API calls. If you are using Claude Managed Agents, no code changes are needed. But see section 9 for breaking changes around budget_tokens, temperature, top_p, and top_k parameters that will cause hard failures if not addressed.
Anthropic recommends testing your existing prompts on Opus 4.7 before switching production traffic. The model's more literal instruction-following means prompts optimized for Opus 4.6 may behave differently. Re-benchmark your cost estimates in light of the tokenizer change, and evaluate whether task budgets and xhigh effort could improve your existing workflows.
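The mechanical part of the migration can be captured in a small shim. This is a sketch over a plain request dict, not tied to any particular SDK; the parameter names it strips (budget_tokens, temperature, top_p, top_k) are the ones section 9 flags as causing hard failures on the new model.

```python
# Minimal migration shim: swap the model ID and drop the parameters that
# cause hard failures on Opus 4.7 (per section 9 of this guide). Operates
# on a plain request payload rather than a specific client library.

REMOVED_PARAMS = {"budget_tokens", "temperature", "top_p", "top_k"}

def migrate_request(params: dict) -> dict:
    """Return a copy of an Opus 4.6 request payload adjusted for Opus 4.7."""
    migrated = {k: v for k, v in params.items() if k not in REMOVED_PARAMS}
    if migrated.get("model") == "claude-opus-4-6":
        migrated["model"] = "claude-opus-4-7"
    return migrated

old_request = {
    "model": "claude-opus-4-6",
    "max_tokens": 4096,
    "temperature": 0.2,  # removed parameter; would hard-fail on 4.7
    "messages": [{"role": "user", "content": "Refactor this module."}],
}
new_request = migrate_request(old_request)
print(new_request["model"], "temperature" in new_request)
```

Running payloads through a shim like this during the transition period makes the audit step explicit instead of relying on every call site being updated by hand.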
For teams currently on Sonnet 4.6, the upgrade path is less straightforward. Opus 4.7 is significantly more capable but also significantly more expensive. The decision depends on whether your workload benefits from the specific improvements (coding, vision, agentic reliability) or whether Sonnet's capabilities are sufficient. For a broader analysis of Claude model pricing and alternatives, our Claude Code pricing guide covers the full range of options.
Auto mode in Claude Code is now available for Max plan subscribers, not just Teams, Enterprise, and API customers. Auto mode lets Claude Code work more autonomously, handling decisions about when to search files, run tests, and verify changes without constant permission prompts. Combined with Opus 4.7's improved reliability, this effectively gives Max plan users a fully autonomous coding assistant.
16. How Opus 4.7 Compares to GPT-5.4 and Gemini 3.1 Pro
The three-way competition between Anthropic, OpenAI, and Google at the frontier has produced a nuanced landscape where no single model dominates every dimension. Here is how they compare as of April 2026.
Where Opus 4.7 Leads
Coding is Opus 4.7's strongest advantage. The 64.3% SWE-bench Pro score puts it meaningfully ahead of GPT-5.4's 57.7% and Gemini 3.1 Pro's 54.2%. On SWE-bench Verified, the gap narrows but persists: 87.6% for Opus 4.7 versus 80.6% for Gemini 3.1 Pro. On CursorBench, Opus 4.7's 70% is notably higher than competitors.
Agentic tool use is the second area of advantage. The 77.3% on MCP-Atlas versus Gemini's 73.9% reflects Opus 4.7's more reliable execution of multi-step workflows with tool calls. The 66% reduction in tool errors compared to Opus 4.6 is a capability that competitors have not matched in published benchmarks.
Where Competitors Lead or Match
General reasoning is effectively tied. GPQA Diamond scores of 94.2% (Opus 4.7), 94.4% (GPT-5.4 Pro), and 94.3% (Gemini 3.1 Pro) are statistically indistinguishable. This means that for tasks that are primarily about reasoning (analysis, question answering, problem decomposition), the choice between models should be driven by other factors like pricing, latency, or ecosystem fit.
Pricing is where Gemini 3.1 Pro has a structural advantage. At $3.50 per million input tokens and $10.50 per million output tokens, Gemini is approximately 30-60% cheaper than Opus 4.7 for equivalent workloads. For teams that need frontier-class reasoning but not Opus-level coding capability, Gemini's pricing advantage is meaningful.
Output pricing favors GPT-5.4, which charges $15 per million output tokens compared to Opus 4.7's $25 per million. For output-heavy workloads (code generation, long-form content, documentation), GPT-5.4 can be 40% cheaper per output token.
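The pricing gaps are easiest to compare on a concrete request. The sketch below uses only the per-token prices quoted in this section (Opus 4.7 at $5/$25, Gemini 3.1 Pro at $3.50/$10.50 per million input/output tokens); GPT-5.4's input price is not stated here, so it is omitted. The request shape is an assumed example.

```python
# Per-request cost under the prices quoted in this section. The 20k-in /
# 4k-out request shape is an illustrative assumption.

def cost(in_tok, out_tok, in_price_per_mtok, out_price_per_mtok):
    return (in_tok * in_price_per_mtok + out_tok * out_price_per_mtok) / 1_000_000

opus = cost(20_000, 4_000, 5.0, 25.0)      # Opus 4.7
gemini = cost(20_000, 4_000, 3.50, 10.50)  # Gemini 3.1 Pro

print(f"opus=${opus:.3f}  gemini=${gemini:.3f}  "
      f"gemini saves {1 - gemini / opus:.0%}")
```

For this input-heavy mix the saving lands in the middle of the 30-60% range cited above; output-heavy workloads push toward the top of that range because the output-price gap is wider.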
The Decision Framework
The choice between these three models depends on your primary use case. If your workload is coding-heavy (autonomous code generation, agentic debugging, code review), Opus 4.7 has a clear advantage. If your workload is cost-sensitive and reasoning-sufficient (classification, extraction, analysis), Gemini 3.1 Pro's pricing makes it the rational choice. If your workload requires broad ecosystem integration (plugins, function calling across many services, ChatGPT compatibility), GPT-5.4 has the widest partner network.
For AI agent platforms specifically, the combination of coding strength, agentic reliability, and the new task budget feature makes Opus 4.7 the strongest foundation for building autonomous software agents. We covered the broader AI market dynamics in our AI market power consolidation analysis, which examines how the frontier model race is reshaping the entire industry structure.
17. What This Means for AI Agent Platforms
The Opus 4.7 release has specific implications for the growing ecosystem of AI agent platforms. The improvements in coding, agentic reliability, and tool use directly affect how these platforms can leverage Claude as their underlying model.
The Structural Shift
There is a first-principles argument about what Opus 4.7 represents. When a model becomes 14% better at multi-step task completion and produces 66% fewer tool errors, the category of tasks you can delegate to it expands non-linearly. Each improvement in reliability does not just make existing tasks work better. It makes previously impossible tasks feasible.
Consider the difference between a model that completes an 8-step workflow 70% of the time versus 84% of the time. At 70%, you supervise every run because failure is common. At 84%, you can batch runs and only review the ones that flag issues. At 95%, you can fire and forget for most tasks. The relationship between reliability and autonomy is exponential, not linear, and Opus 4.7 moves the needle along this curve in a meaningful way.
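The compounding is easy to make concrete. If the 8 steps succeed or fail independently, an n-step workflow succeeds with probability per_step ** n, so we can back out the per-step reliability implied by each workflow-level figure above (the independence assumption is a simplification):

```python
# Solve for the per-step success rate implied by a workflow-level success
# rate, assuming 8 independent steps (a simplifying assumption).

n = 8  # steps per workflow, matching the example in the text
for workflow_rate in (0.70, 0.84, 0.95):
    per_step = workflow_rate ** (1 / n)
    print(f"workflow {workflow_rate:.0%} implies per-step {per_step:.1%}")
```

The striking part is how compressed the per-step range is: moving end-to-end success from 70% to 84% corresponds to a per-step improvement of only about two percentage points, which is why small reliability gains at the model level reprice what can be delegated.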
This has implications for how agent platforms price and position their offerings. A platform built on a more reliable underlying model can offer higher-value automation with less human oversight. The task budget feature enables predictable pricing for agentic work, which has been one of the biggest barriers to enterprise adoption. When a customer asks "how much will this cost?" and the platform can give a confident answer, the sales conversation changes fundamentally.
Impact on Specific Categories
Coding agents benefit the most from Opus 4.7. The SWE-bench improvements translate directly to better autonomous code generation, debugging, and refactoring. Platforms like Cursor, Replit, and coding-focused agent systems can deliver more value per interaction. The /ultrareview feature in Claude Code shows how Anthropic itself is building higher-level agent capabilities on top of the model.
Browser automation agents benefit from the vision improvements. Higher resolution means better screenshot understanding, which means more reliable navigation of complex web interfaces. For platforms that offer browser-based automation (web scraping, form filling, testing), Opus 4.7's vision upgrade removes a significant capability limitation.
Knowledge work agents (research, analysis, document processing) benefit from the general reliability improvements and file-system memory. Agents that need to accumulate knowledge across a long research session can now maintain better continuity through their scratchpads and memory stores.
Platforms like o-mega.ai that operate cloud-based AI workforces, where multiple specialized agents coordinate on complex tasks, stand to gain from both the reliability improvements and the task budget feature. When you are orchestrating multiple agents in a workflow, the reliability of each individual agent compounds across the chain. A 14% improvement per agent translates to a much larger improvement in end-to-end workflow success rates. As we explored in our self-improving AI agents guide, the relationship between individual agent capability and system-level performance is multiplicative.
For a deep look at how self-improving agent architectures push beyond individual model capabilities, our self-improving software guide covers the technical patterns that make this work in production.
The Managed Agents Angle
Anthropic's Claude Managed Agents offering is evolving alongside the model improvements. Opus 4.7 has no breaking API changes for Managed Agents, which means existing deployments benefit immediately from the model upgrade. For a full analysis of the Managed Agents ecosystem, see our Claude Managed Agents guide.
The broader trajectory is clear: the combination of better models, better safety mechanisms, and better developer tools is making autonomous AI agents increasingly viable for production workloads. The question for agent platforms is no longer "can AI agents work?" but "what is the right architecture for deploying them at scale?"
The first-principles answer to that question comes from understanding what changes when model reliability crosses certain thresholds. Below 80% task success, agents need constant human supervision and are essentially assistants. Between 80-90%, agents can handle routine tasks with spot-checking. Above 90%, agents become autonomous workers that escalate only on genuine edge cases. Opus 4.7 pushes several key workloads into the 80-90% range for the first time, which means 2026 may be the year when AI agent platforms transition from "interesting experiment" to "essential infrastructure" for the enterprises that adopt them early.
For teams building in this space, the combination of Opus 4.7's reliability, task budgets for cost control, and the xhigh effort level for quality-cost optimization creates a more complete toolkit than any previous model generation. The Claude Cowork ecosystem adds another dimension for desktop-based agent workflows. We covered the full Cowork landscape in our Claude Cowork insider guide, which analyzes pricing, tactics, and how it compares to alternatives like Copilot Cowork.
18. Future Outlook
The Opus 4.7 release, combined with the Mythos revelation, gives us a clearer picture of where AI is heading for the rest of 2026.
The Capability-Safety Gap
The most important insight from this release is the growing gap between what frontier models can do and what their creators are willing to deploy. Mythos is more powerful than Opus 4.7 across every dimension, but its cybersecurity capabilities make it too dangerous to release broadly. This gap will likely widen with each model generation.
The implications are profound. We are entering an era where the most capable AI systems are not the ones available to the public. The best models exist behind safety barriers, with reduced-capability versions released for commercial use. This creates a two-tier AI landscape: the models that companies can use, and the models that exist but are too powerful to deploy.
This parallels other technology domains. Nuclear technology has civilian and military tiers. Encryption has consumer and government tiers. AI may be developing a similar structure, where the frontier of capability is classified or restricted while the commercial market operates on intentionally limited versions.
The question this raises for AI agent developers is foundational: if the best model is always behind a safety barrier, how do you build products that will eventually benefit from its capabilities? The answer may lie in building architectures that are model-agnostic at the inference layer. If your agent platform can swap between Opus 4.7, Mythos, and future models without architectural changes, you are positioned to benefit when capabilities are unlocked. If your architecture is tightly coupled to a specific model's behavior, each upgrade becomes a migration project rather than a capability expansion.
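One common way to achieve that decoupling is to route every inference call through a narrow interface and keep model-specific quirks inside adapters. The sketch below is illustrative; the names (ModelBackend, ClaudeAdapter, complete) are invented for the example and do not correspond to any real SDK.

```python
# Model-agnostic inference layer: agent logic depends only on a small
# interface, so swapping the underlying model is configuration, not a
# rewrite. All class and method names here are illustrative.

from typing import Protocol

class ModelBackend(Protocol):
    def complete(self, prompt: str, max_tokens: int) -> str: ...

class ClaudeAdapter:
    """Adapter for a Claude-family model, keyed by model ID."""
    def __init__(self, model_id: str):
        self.model_id = model_id

    def complete(self, prompt: str, max_tokens: int) -> str:
        # A real implementation would call the provider's API here.
        return f"[{self.model_id}] completion for: {prompt[:30]}"

def run_agent_step(backend: ModelBackend, task: str) -> str:
    # The agent never imports or names a specific model.
    return backend.complete(task, max_tokens=1024)

print(run_agent_step(ClaudeAdapter("claude-opus-4-7"), "Summarize the diff"))
```

With this shape, moving from Opus 4.7 to a future Mythos-class model means constructing a different adapter, while every agent workflow stays untouched.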
For teams navigating this uncertainty, our independent AI guide explores strategies for building AI systems that maintain operational independence from any single model provider.
The Platform Consolidation
Anthropic's move into design tools signals a broader trend: model providers becoming application providers. OpenAI did this with ChatGPT and its plugins. Google did this with Gemini integration across Workspace. Now Anthropic is doing it with design tools. The model API business is necessary but not sufficient. The real value capture happens at the application layer.
For independent AI agent platforms, this creates both threat and opportunity. The threat is that model providers build competing products with zero API costs. The opportunity is that model providers cannot specialize in every domain. Vertical agent platforms that deeply understand specific industries (finance, legal, healthcare, sales) can build moats that horizontal model providers cannot easily replicate.
As we analyzed in our coverage of the agent economy, the economic structure of AI agent deployment favors specialization. Generic agents compete on model capability, which commoditizes to zero margin. Specialized agents compete on domain knowledge, workflow understanding, and integration depth, which are defensible.
There is a structural analogy from the early cloud era that is instructive here. When AWS launched S3 and EC2, many observers predicted that AWS would eventually build every application category itself. Instead, what happened was that AWS built the infrastructure layer while thousands of SaaS companies built specialized applications on top of that infrastructure. The same dynamic is likely playing out in AI. Anthropic, OpenAI, and Google will build the infrastructure (models, APIs, basic tools) and some horizontal applications (design tools, coding assistants). But the vast majority of value creation will happen at the application layer, where companies combine these models with domain expertise, proprietary data, and customer relationships to deliver specific outcomes.
The design tool announcement does not change this structural dynamic. It confirms it. Anthropic is moving up the stack in categories where they have a natural advantage (design powered by their vision model) while leaving the broader ecosystem to specialists. The category to watch is not "will Anthropic compete with me?" but "does my specialization create enough value that my customers would not switch to a generic Anthropic tool?"
What to Watch
There are several developments to monitor in the months following this release.
First, watch for Mythos access expansion. The Cyber Verification Program is a first step toward controlled deployment. If the safeguards prove effective, Anthropic may expand access to Mythos-class capabilities for specific verified use cases. This would represent a new model for frontier AI deployment: capability-gated access rather than universal availability.
Second, watch for competitive responses. OpenAI and Google will likely answer Opus 4.7's coding benchmark lead within months. The coding benchmark race has become the primary competitive dimension, and no lab will cede that ground for long. Expect GPT-5.5 and Gemini 3.2 to target SWE-bench and related metrics directly.
Third, watch the design tool market. If Anthropic's tool gains traction, it validates the model that AI labs build applications, not just APIs. This changes the strategic calculus for every company building on Claude, GPT, or Gemini APIs.
Fourth, watch for the real-world cost impact of the tokenizer change. As developers migrate to Opus 4.7 and observe actual billing changes, the community will develop a clearer picture of the effective price increase. If it is closer to 35% for typical workloads, some developers may stay on Opus 4.6 for cost-sensitive applications, creating a split ecosystem.
Fifth, watch for the implicit-need capability to spread across the model family. If Anthropic can bring Opus 4.7's ability to infer required tools and actions down to Sonnet-class models, the cost of agentic AI drops dramatically. Most agent workloads do not need Opus-level reasoning for every step; they need Opus-level tool inference for the planning step and Sonnet-level execution for the individual tasks. A model family that offers implicit-need inference at the Sonnet price point would reshape the economics of the entire agent ecosystem.
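The economics of that planning/execution split can be sketched with the Opus prices from this guide. The Sonnet-class prices below ($1/$5 per million tokens) and the workflow shape are pure assumptions for illustration, not published figures.

```python
# Cost sketch of a planning/execution split: one Opus-class planning call,
# then cheaper execution calls. Opus pricing ($5/$25 per Mtok) is from this
# guide; the Sonnet-class pricing and workflow shape are assumptions.

def call_cost(in_tok, out_tok, in_price_per_mtok, out_price_per_mtok):
    return (in_tok * in_price_per_mtok + out_tok * out_price_per_mtok) / 1_000_000

OPUS = (5, 25)       # quoted in this guide
SONNET = (1, 5)      # assumed for illustration

# All-Opus workflow: 1 planning call + 10 execution calls.
all_opus = call_cost(20_000, 2_000, *OPUS) + 10 * call_cost(8_000, 1_500, *OPUS)

# Mixed workflow: Opus plans, a Sonnet-class model executes.
mixed = call_cost(20_000, 2_000, *OPUS) + 10 * call_cost(8_000, 1_500, *SONNET)

print(f"all-opus=${all_opus:.3f}  mixed=${mixed:.3f}  "
      f"saved {1 - mixed / all_opus:.0%}")
```

Under these assumptions the execution calls dominate spend, so pushing them down-tier cuts the workflow cost by roughly two-thirds, which is the economic argument for implicit-need inference reaching Sonnet-class models.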
For those tracking the broader competitive landscape, our coverage of algorithms and the AI attention economy examines how these model improvements feed into the larger ecosystem of AI-powered content and decision systems. And for teams evaluating how to build on top of these capabilities, our guide on how to build products with AI fast covers the practical development patterns.
Claude Opus 4.7 is a release that rewards careful reading. The headline improvements in coding and vision are real and significant. The new features (task budgets, xhigh effort) address genuine production pain points. The tokenizer change has cost implications that deserve attention. And the Mythos shadow raises questions about the future of frontier AI deployment that the entire industry will need to answer.
The model is available now across all Claude products and cloud providers. The API model ID is claude-opus-4-7. Pricing is $5/$25 per million input/output tokens (watch the tokenizer impact). Audit your API calls for removed parameters before migrating. Start with xhigh effort for coding workloads and test task budgets for agentic workflows. Read Anthropic's migration guide and best practices document before deploying to production.
For the complete picture of how Opus 4.7 fits into the broader Claude ecosystem, including Cowork, Managed Agents, and Claude Code, see our Anthropic ecosystem guide. For teams evaluating how agent platforms are evolving alongside these model improvements, our paperclip AI agent companies guide covers the broader competitive landscape of companies building on top of these foundations.
This guide reflects the AI model landscape as of April 17, 2026. Pricing, benchmarks, and features change frequently. Verify current details on the Anthropic documentation before making purchasing decisions.