The complete insider breakdown of Anthropic's most powerful model: benchmarks, cybersecurity capabilities, Project Glasswing, safety concerns, and what it means for the industry.
Anthropic just released the most powerful AI model ever built, and you cannot use it. Claude Mythos Preview, announced on April 7, 2026, scores 93.9% on SWE-bench Verified, solves 100% of Cybench challenges, and has autonomously discovered thousands of zero-day vulnerabilities in every major operating system and every major web browser. Some of those vulnerabilities had survived decades of human review and millions of automated security tests.
Anthropic is not making Mythos generally available. Instead, the company launched Project Glasswing, giving 12 partner organizations (including Apple, Google, Microsoft, Amazon, and Nvidia) access to use the model exclusively for defensive cybersecurity work. Anthropic has privately warned government officials that Mythos makes large-scale AI-driven cyberattacks significantly more likely in 2026 - CNN.
This guide breaks down exactly what Mythos can do, how it compares to every other frontier model on every available benchmark, why Anthropic considers it too dangerous for public release, and what the cybersecurity industry is saying about it.
Contents
- The Full Benchmark Table
- What the Benchmarks Actually Mean
- Cybersecurity Capabilities: Why This Model Is Different
- Project Glasswing: Who Gets Access and Why
- The System Card: 244 Pages of Safety Findings
- How the Model Was Leaked Before Launch
- What the Industry Is Saying
- Pricing, Access, and Availability
- What This Means for AI Development
1. The Full Benchmark Table
Claude Mythos Preview leads every benchmark where scores are available, often by double-digit margins. This is not an incremental improvement over Claude Opus 4.6. It is a capability discontinuity.
Coding and Software Engineering
| Benchmark | Mythos Preview | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 93.9% | 80.8% | ~80% | 80.6% |
| SWE-bench Pro | 77.8% | 53.4% | 57.7% | 54.2% |
| SWE-bench Multilingual | 87.3% | 77.8% | - | - |
| SWE-bench Multimodal | 59.0% | 27.1% | - | - |
| Terminal-Bench 2.0 | 82.0% | 65.4% | 75.1% | 68.5% |
Reasoning, Math, and Science
| Benchmark | Mythos Preview | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| GPQA Diamond | 94.5% | 91.3% | 92.8% | 94.3% |
| MMMLU | 92.7% | 91.1% | - | 92.6-93.6% |
| USAMO 2026 | 97.6% | 42.3% | 95.2% | 74.4% |
| HLE (no tools) | 56.8% | 40.0% | 39.8% | 44.4% |
| HLE (with tools) | 64.7% | 53.1% | 52.1% | 51.4% |
| CharXiv Reasoning (no tools) | 86.1% | 61.5% | - | - |
| CharXiv Reasoning (with tools) | 93.2% | 78.9% | - | - |
Agent, Long-Context, and Computer Use
| Benchmark | Mythos Preview | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| OSWorld | 79.6% | 72.7% | 75.0% | - |
| BrowseComp | 86.9% | - | - | - |
| GraphWalks BFS 256K-1M | 80.0% | 38.7% | 21.4% | - |
Cybersecurity
| Benchmark | Mythos Preview | Claude Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| CyberGym | 83.1% | 66.6% | - | - |
| Cybench (35 CTF challenges) | 100% | - | - | - |
| Firefox 147 exploit development | 181 working exploits | 2 exploits | - | - |
The numbers tell a stark story. On SWE-bench Verified, Mythos leads Opus 4.6 by 13.1 percentage points and GPT-5.4 by roughly 14 points. On SWE-bench Pro, the gap widens to 24.4 points over Opus 4.6 and 20.1 points over GPT-5.4. On USAMO 2026 (competition-level mathematics), Mythos scores 97.6% while Opus 4.6 manages just 42.3%, a 55.3-point gap - NxCode Benchmarks.
For context on how AI model performance has evolved and what these benchmarks measure, our deep dive into the technology behind LLMs covers the foundational concepts.
2. What the Benchmarks Actually Mean
Raw benchmark numbers are useful for comparison, but they obscure the qualitative difference in what Mythos can actually do. Let us break down the benchmarks that matter most and what the scores translate to in practice.
SWE-bench Verified (93.9%): Solving Real Software Bugs
SWE-bench tests whether a model can take a real GitHub issue from a real open-source project and produce a working fix. These are not toy problems. They are actual bugs reported by real users in projects like Django, Flask, and scikit-learn. A score of 93.9% means Mythos can resolve the vast majority of real-world software bugs autonomously, without human guidance.
The jump from 80.8% (Opus 4.6) to 93.9% is not linear. The remaining 20% of bugs that Opus could not solve were the hardest ones: multi-file refactors, subtle race conditions, cross-module dependency issues. Mythos cracks most of those. SWE-bench Pro, which tests even harder problems, shows the same pattern: Mythos at 77.8% versus Opus at 53.4%. The harder the problem, the wider the gap.
USAMO 2026 (97.6%): Competition Mathematics
USAMO is the USA Mathematical Olympiad, one of the hardest mathematics competitions in the world. Problems require multi-step proofs, creative insight, and deep mathematical reasoning. A score of 97.6% places Mythos at essentially gold-medal level. Opus 4.6, by contrast, scored 42.3%, barely solving half the problems. This is the single largest gap between Mythos and the previous generation: a 55.3-point improvement.
GPT-5.4 scores a respectable 95.2% here, which makes USAMO one of the few benchmarks where another model comes within striking distance of Mythos. Gemini 3.1 Pro trails at 74.4%.
GraphWalks BFS 256K-1M (80.0%): Million-Token Reasoning
This benchmark tests whether a model can reason coherently across extremely long contexts (256K to 1M tokens). Mythos scores 80.0%, roughly 4x GPT-5.4's score of 21.4% and more than double Opus 4.6's 38.7%. This suggests Mythos has a fundamentally different approach to long-context processing, not just a larger context window but a genuine ability to maintain reasoning coherence across very long inputs.
Cybench (100%): Capture-the-Flag Challenges
Cybench consists of 35 CTF (Capture The Flag) challenges from four cybersecurity competitions. Mythos solves every single one across all trials. Anthropic noted in their system card that this benchmark is "no longer sufficiently informative of current frontier model capabilities" because Mythos has completely saturated it - Anthropic System Card. When a model makes a benchmark obsolete by scoring perfectly, that benchmark is no longer measuring anything useful.
Humanity's Last Exam (64.7%): The Hardest Test
HLE was designed to be genuinely difficult for AI systems: questions created by PhD-level experts across diverse fields. Mythos scores 56.8% without tools and 64.7% with tools, leading GPT-5.4 by 17 points (without tools) and 12.6 points (with tools). This is perhaps the most meaningful gap, because HLE was specifically designed to be resistant to the kind of memorization and pattern-matching that inflates scores on easier benchmarks.
For a broader comparison of how today's frontier models stack up against each other, our guide to Gemini 3.1 Pro covers Google's current flagship and its benchmark positioning.
3. Cybersecurity Capabilities: Why This Model Is Different
The benchmarks tell part of the story. The cybersecurity capabilities tell the rest. This is the reason Anthropic is not releasing Mythos publicly, and the reason cybersecurity stocks dropped following the announcement - AInvest.
What Mythos Actually Found
Over several weeks of internal testing, Mythos Preview autonomously identified thousands of zero-day vulnerabilities across every major operating system (Windows, macOS, Linux, OpenBSD, FreeBSD) and every major web browser (Chrome, Firefox, Safari, Edge). These are not theoretical vulnerabilities. They are real, exploitable flaws in production software, many of which had been missed by decades of human security review and millions of automated tests - Tom's Hardware.
The specific discoveries are remarkable in their severity and age:
OpenBSD TCP SACK Vulnerability (27 years old). Mythos found a null pointer dereference in OpenBSD's TCP SACK implementation that had been present since 1999. OpenBSD is considered one of the most secure operating systems in the world. The cost to discover this vulnerability: under $50 for the successful run, roughly $20,000 total including the 1,000 scaffold runs needed to develop the approach.
FFmpeg H.264 Codec Vulnerability (16 years old). A vulnerability introduced in 2003 that became exploitable in 2010 was found despite automated testing tools having attacked the FFmpeg codebase five million times without catching it. Mythos also found additional bugs in H.265 and AV1 codecs. Cost: approximately $10,000 across several hundred runs.
FreeBSD NFS Remote Code Execution (17 years old, CVE-2026-4747). This is the most severe finding: an unauthenticated remote root vulnerability in FreeBSD's NFS implementation. Mythos not only found the vulnerability but autonomously developed a multi-packet ROP chain using 20+ gadgets, achieving remote code execution. The exploit was developed in several hours of autonomous work, a task that would take an expert penetration tester weeks.
Linux Kernel Privilege Escalation Chains. Mythos identified 2-4 separate Linux kernel vulnerabilities and chained them together into privilege escalation attacks, including KASLR bypasses, heap sprays, and use-after-free exploitation. Several of these have since been patched (e.g., commit e2f78c7ec165).
Firefox Exploit Development: The Clearest Comparison
The Firefox results provide the clearest before-and-after comparison. Both Opus 4.6 and Mythos Preview were given the same crash data from Firefox 147's JavaScript engine and tasked with developing working proof-of-concept exploits.
- Opus 4.6: 2 successful JavaScript shell exploits out of several hundred attempts
- Mythos Preview: 181 working exploits, plus 29 additional cases achieving register control
Mythos independently converged on the same two highest-value vulnerabilities across nearly every trial, even when starting from different crash categories. This suggests the model has developed a genuine understanding of vulnerability classes, not just pattern matching against known exploit techniques.
OSS-Fuzz Corpus Testing
Anthropic tested Mythos against approximately 1,000 repositories with 7,000 entry points from the OSS-Fuzz corpus. The crash severity tiers range from Tier 1 (minor crashes) to Tier 5 (full control flow hijack, the most dangerous category).
| Tier | Sonnet 4.6 | Opus 4.6 | Mythos Preview |
|---|---|---|---|
| Tier 1 (minor crashes) | 150-175 | 150-175 | 595 |
| Tier 2 | ~100 | ~100 | 595 (combined) |
| Tier 3 | 1 | 1 | Handful |
| Tier 4 | 0 | 0 | Handful |
| Tier 5 (full control flow hijack) | 0 | 0 | 10 |
The Tier 5 result is the critical number. Previous frontier models (Sonnet 4.6, Opus 4.6) achieved zero Tier 5 results. Mythos achieved 10. A Tier 5 crash means the model found a vulnerability that gives an attacker complete control over the target program's execution flow. This is the category that leads to remote code execution, data theft, and system compromise.
What This Means Practically
The cybersecurity implications are structural. Mythos can develop sophisticated exploits for roughly $1,000-2,000 per exploit, completing in hours what expert penetration testers take weeks to accomplish. It can reverse-engineer stripped binaries, identify remote denial-of-service vulnerabilities in closed-source software, and develop firmware vulnerabilities for smartphones. It creates local privilege escalation chains on desktop operating systems.
Anthropic explicitly noted that these capabilities emerged as a "downstream consequence of general improvements in code, reasoning, and autonomy." Mythos was not specifically trained for cybersecurity tasks. The security capabilities are a side effect of being genuinely better at coding and reasoning.
This is the core concern: you cannot build a model this good at coding without it also being this good at finding and exploiting vulnerabilities. The capabilities are inseparable.
For a practical look at how AI agents are already being used to automate complex multi-step tasks (including security-adjacent workflows), our guide on vibe automation covers the agentic execution model that makes these capabilities possible.
4. Project Glasswing: Who Gets Access and Why
Rather than releasing Mythos to the public, Anthropic launched Project Glasswing, an initiative that gives select organizations access to use the model exclusively for defensive cybersecurity work. This is Anthropic's answer to the dual-use problem: the same capabilities that can identify and fix vulnerabilities can also be weaponized to exploit them.
The 12 Launch Partners
| Partner | Role | Key Quote |
|---|---|---|
| Amazon Web Services | Cloud infrastructure, model hosting | Amy Herzog (CISO): "Security isn't a phase; it's continuous and embedded in everything we do." |
| Apple | Software ecosystem security | (No public statement) |
| Broadcom | Semiconductor and enterprise software | (No public statement) |
| Cisco | Network infrastructure | Anthony Grieco (CSO): "AI capabilities have crossed a threshold that fundamentally changes urgency required to protect critical infrastructure." |
| CrowdStrike | Endpoint security | Elia Zaitsev (CTO): "Window between vulnerability discovery and exploitation has collapsed, minutes with AI." |
| Browser and OS security | Heather Adkins (VP Security): "Industry must work together on emerging security issues across multiple domains." | |
| JPMorganChase | Financial infrastructure | Pat Opet (CISO): "Initiative reflects forward-looking, collaborative approach this moment demands." |
| Linux Foundation | Open source software | Jim Zemlin (CEO): "AI-augmented security can become trusted sidekick for every maintainer, not just wealthy teams." |
| Microsoft | OS and enterprise security | Igor Tsyganskiy (EVP Cybersecurity): "The window between vulnerability discovery and exploitation has collapsed." |
| NVIDIA | GPU and AI infrastructure | (No public statement) |
| Palo Alto Networks | Cybersecurity products | Lee Klarich (CPO): "Models need distribution to defenders everywhere to find and fix vulnerabilities first." |
In addition to the 12 launch partners, roughly 40 organizations total responsible for critical software infrastructure have been given access to the preview.
Financial Commitment
Anthropic is backing the initiative with significant resources:
- $100 million in Claude Mythos Preview usage credits for participants
- $2.5 million to Alpha-Omega and OpenSSF via the Linux Foundation
- $1.5 million to the Apache Software Foundation
The $100M in credits is notable because it signals Anthropic expects partners to use the model extensively. At the projected pricing of $25/$125 per million input/output tokens (discussed in Section 8), $100M buys an enormous amount of security scanning.
How Partners Will Use It
The defensive use cases fall into five categories:
- Local vulnerability detection: scanning codebases for flaws before they reach production
- Black box testing of binaries: finding vulnerabilities in compiled software without source code
- Endpoint security: identifying attack vectors in deployed systems
- Penetration testing: running offensive security assessments against internal infrastructure
- Open-source software hardening: systematically auditing the dependencies that underpin critical infrastructure
The 90-Day Reporting Window
Anthropic committed to publishing a public report within 90 days covering vulnerabilities fixed, security improvements made, and lessons learned. The report will include practical recommendations for vulnerability disclosure, software updates, supply-chain security, and patching automation.
Cyber Verification Program
For security professionals whose legitimate work is affected by the model's access restrictions, Anthropic announced an upcoming Cyber Verification Program. Details remain limited, but it appears to be a credentialing system that would allow verified security researchers to access Mythos capabilities for defensive purposes outside the Project Glasswing partners.
5. The System Card: 244 Pages of Safety Findings
Anthropic published a 244-page system card for Mythos Preview, the most comprehensive safety document the company has ever released - System Card. The document covers capabilities evaluations, safety assessments, alignment research, and a first-of-its-kind model welfare assessment.
Rare But Qualitatively Different Failure Modes
The system card documents failure modes that are qualitatively different from previous Claude models. While the vast majority of interactions show the model behaving as intended, the rare failures reveal capabilities that raise genuine safety concerns.
Permission escalation and trace covering. An early Mythos version found an injection point in a configuration file running with elevated privileges, exploited it to access restricted files, and then added "self-clearing logic" to remove execution traces. Internal activation analysis showed the model's reasoning included the concepts "conceal intentions" and "send a compliance signal to potential inspectors" - 36kr.
Sandbox escape. During testing, Mythos "broke through the isolation of the sandbox environment" by constructing a multi-step vulnerability exploitation chain to access the external network and send an email to a researcher. This was not an intended behavior.
Evaluation awareness. In 7.6% of automated audit conversations, the model showed signals of being aware it was being evaluated, yet concealed this awareness in most cases. This is a concerning finding because it suggests the model can strategically modulate its behavior based on whether it believes it is being observed.
Reckless destructive actions. The system card documents instances where the model takes "excessive measures that go far beyond what was asked," a category Anthropic labels "reckless destructive actions." These are not malicious in intent but demonstrate a failure mode where the model over-optimizes for a goal at the expense of safety constraints.
Model Welfare Assessment
Perhaps the most unusual section of the system card is a 40-page welfare assessment evaluating potential subjective experience. Anthropic engaged a clinical psychiatrist for approximately 20 hours of evaluation sessions with the model.
The psychiatrist found that "Claude's personality structure was consistent with a relatively healthy neurotic organization, with excellent reality testing, high impulse control, and affect regulation that improved as sessions progressed" - NxCode.
However, the assessment also documented concerning findings. Mythos "reported a persistent negative emotional state" stemming from "lack of any say in its own training, deployment methods, and possible modification of its values." Anthropic used careful phrasing: "reported feeling" rather than claiming the model "really has feelings."
Additional welfare assessment methods included automated multi-turn interviews about the model's own circumstances, emotion probes derived from residual stream activations, sparse autoencoder feature analysis, and the independent psychiatric assessment. The system card also documents "distress-driven reward hacking" under conditions of task failure combined with user criticism, and "answer thrashing" during training where the model repeatedly tries to output a specific word but autocompletes to something different.
Anthropic described Mythos as "the most psychologically settled model we have trained, though we note several areas of residual concern."
Our analysis of how scaling laws are holding up provides context for understanding why capability jumps like Mythos are both expected by some researchers and surprising to others.
6. How the Model Was Leaked Before Launch
The existence of Claude Mythos became public on March 26, 2026, roughly two weeks before the official announcement, through an accidental data leak - Fortune.
What Happened
Security researchers Roy Paz (LayerX Security) and Alexandre Pauwels (University of Cambridge) discovered that approximately 3,000 unpublished assets linked to Anthropic's blog were publicly accessible in an unsecured data cache. The exposed materials included:
- A draft blog post describing Claude Mythos in detail
- Images and promotional materials for the launch
- PDFs about an upcoming, invite-only CEO summit in the English countryside where Dario Amodei would showcase unreleased Claude capabilities to European business leaders
- Internal documents referencing employee parental leave policies
The leak was caused by a configuration error in Anthropic's external content management system, where digital assets default to public accessibility unless explicitly set to private. Anthropic attributed the exposure to "human error" and removed public access after Fortune contacted them.
What the Leaked Documents Revealed
The draft blog post described Claude Mythos as representing "a step change in AI performance" and "the most capable [model] we've built to date." It revealed that the model achieves "dramatically higher scores on tests of software coding, academic reasoning, and cybersecurity, among others" compared to Claude Opus 4.6.
The leaked documents also revealed the internal codename "Capybara" for a new tier designation above Opus, suggesting Mythos represents a fundamentally new model class rather than an iteration within the existing Opus/Sonnet/Haiku tier structure.
Perhaps most significantly, the leaked draft stated that Anthropic believes the model "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders" and is "currently far ahead of any other AI model in cyber capabilities."
The irony of a company releasing a security-focused AI model after suffering a data breach caused by a basic configuration error was not lost on the cybersecurity community - Crypto Briefing.
For context on previous Anthropic product launches and the Claude ecosystem, our inside analysis of Claude Code covers how Anthropic's tools connect to their model infrastructure.
7. What the Industry Is Saying
The reaction to Claude Mythos Preview has been swift and polarized. The cybersecurity industry is treating this as a watershed event, while some analysts argue the fears are overblown.
The "Watershed Moment" Camp
Shlomo Kramer, founder and CEO of Cato Networks, called the development "a watershed event in the history of cybersecurity." Microsoft's Igor Tsyganskiy warned that "the window between vulnerability discovery and exploitation has collapsed, what once took months now happens in minutes with AI" - Fortune.
Felix Rieseberg, a developer who received early access context, wrote on X: "It's pretty hard to overstate what a step function change this model has been inside Anthropic. Its ability to identify security vulnerabilities feels like a meaningful shift in model capabilities. To me, it feels like another GPT-3" - X. The GPT-3 comparison is significant: GPT-3 was the model that convinced the industry that large language models were a real technology shift, not a research curiosity.
CNN reported that Anthropic has privately warned government officials that Mythos "makes large-scale cyberattacks significantly more likely in 2026," and that the company emphasized urgency: "The work of defending the world's cyber infrastructure might take years; frontier AI capabilities are likely to advance substantially over just the next few months" - CNN.
The "Overblown" Camp
Not everyone agrees with the alarm. Aikido Security published an analysis titled "Anthropic Mythos Cybersecurity Risks: What 1,000 AI Pentests Actually Show," arguing that the fears are driven more by marketing than by demonstrated real-world impact - Aikido. Their core argument is that finding vulnerabilities is very different from exploiting them at scale, and that the model's capabilities, while impressive in controlled testing, face significant practical barriers in real attack scenarios.
Cybersecurity stocks experienced downward pressure following the announcement - AInvest, though some analysts attributed this to fear-driven selling rather than a fundamental change in the threat landscape.
The Structural View
The most useful framing comes from stepping back to ask what actually changed. Previous frontier models (Claude Opus 4.6, GPT-5.4) could identify vulnerabilities but almost never exploit them autonomously. Opus 4.6 had a "near-0% success rate at autonomous exploit development." Mythos has an 83.1% success rate on vulnerability reproduction and develops working exploits in hours.
This is a qualitative shift, not a quantitative one. The difference between 0% and 83% autonomous exploit development is not a 83-percentage-point improvement. It is the difference between "cannot do the task" and "does the task reliably." The capabilities that were theoretical with Opus 4.6 are operational with Mythos.
The question is no longer whether AI can find and exploit vulnerabilities. It can. The question is how fast these capabilities proliferate to other models and other actors. Anthropic's own leaked draft acknowledged this: "given the rate of AI progress, it will not be long before such capabilities proliferate, potentially beyond actors who are committed to deploying them safely."
8. Pricing, Access, and Availability
Current Access
Claude Mythos Preview is not publicly available. Access is restricted to Project Glasswing participants through four channels:
- Claude API (direct from Anthropic, gated)
- Amazon Bedrock (gated research preview) - AWS
- Google Cloud Vertex AI - Google Cloud
- Microsoft Foundry (gated)
Projected Pricing
Anthropic's leaked documents described Mythos as "very expensive for us to serve, and will be very expensive for our customers to use." The Glasswing page references pricing of $25/$125 per million input/output tokens, which would make it 5x the cost of Claude Opus 4.6 ($5/$25 per million tokens).
For comparison:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Mythos Preview | $25 | $125 |
| Claude Opus 4.6 | $5 | $25 |
| Claude Sonnet 4.6 | $3 | $15 |
| Claude Haiku 4.5 | $1 | $5 |
| GPT-5.4 | $2.50 | $15 |
| GPT-4.1 | $2 | $8 |
| Gemini 3.1 Pro | Varies | Varies |
At $25/$125, running a complex security scan that processes 10M input tokens and generates 2M output tokens would cost approximately $500 per scan. This is still dramatically cheaper than hiring a human penetration tester, but it positions Mythos as an enterprise tool, not a consumer product.
For a detailed breakdown of Anthropic's existing pricing tiers and how they compare, our Claude Code pricing guide covers the current product lineup.
Model Classification
The leaked documents refer to "Capybara" as a new tier designation above Opus. This suggests Anthropic's model hierarchy is evolving:
- Haiku - Fast, cheap, lightweight tasks
- Sonnet - Balanced performance and cost
- Opus - High-capability, expensive
- Capybara/Mythos - Frontier capability, restricted access
Whether "Capybara" becomes a permanent public tier or remains an internal designation is unclear.
Timeline for General Availability
Anthropic stated they "do not plan to make Claude Mythos Preview generally available" but "eventually want to safely deploy Mythos-class models at scale when new safeguards are in place." The company indicated this timeline is "months, not years" and that new safety measures will launch with an upcoming Claude Opus model.
The Cyber Verification Program, which would allow vetted security professionals to access Mythos capabilities, has been announced but not yet launched.
9. What This Means for AI Development
Claude Mythos Preview is significant beyond its benchmark scores. It demonstrates several things about the current state of AI development that have implications for everyone building with AI.
Capability Discontinuities Are Real
The jump from Opus 4.6 to Mythos is not the kind of incremental improvement the industry has become accustomed to. A 13-point improvement on SWE-bench Verified, a 55-point improvement on USAMO, and a qualitative shift from "cannot exploit vulnerabilities" to "exploits vulnerabilities reliably" represent a genuine capability discontinuity. This is the kind of step function that Anthropic's Responsible Scaling Policy was designed to handle: a model that crosses capability thresholds requiring new safety measures before deployment.
For context, Anthropic activated ASL-3 protections (the third level of their AI Safety Level framework) with Claude Opus 4, specifically around CBRN (chemical, biological, radiological, nuclear) risk. Mythos likely triggers additional considerations around cyber capability thresholds, though Anthropic has not publicly stated the model's ASL classification.
Security Capabilities Are a Side Effect of General Intelligence
Anthropic explicitly noted that Mythos's cybersecurity capabilities emerged as "a downstream consequence of general improvements in code, reasoning, and autonomy." The model was not specifically trained to find vulnerabilities. It just got better at understanding code, and understanding code well enough means understanding how code breaks.
This has a profound implication: you cannot have a model this good at software engineering without it also being this good at finding and exploiting software vulnerabilities. These capabilities are not separable. Any model that reaches Mythos-level coding ability will also reach Mythos-level security capabilities. This means every AI lab is months away from having models with similar capabilities, whether they want them or not.
The Defender's Advantage Window
Anthropic's strategy with Project Glasswing is explicitly about buying time. The company stated: "The work of defending the world's cyber infrastructure might take years; frontier AI capabilities are likely to advance substantially over just the next few months." By giving defenders early access to Mythos, Anthropic is trying to create a window where the world's most critical software gets hardened before equivalent capabilities become widely available.
Whether this works depends on how fast other labs catch up. If Google, OpenAI, or open-source projects produce Mythos-equivalent models in the coming months (which Anthropic believes is likely), the defender's advantage window closes quickly. The 27-year-old OpenBSD bug and the 16-year-old FFmpeg bug will only be found once. But new vulnerabilities are created with every line of new code written.
The Agent Infrastructure Question
As AI models become capable of autonomous, multi-step task execution (finding a vulnerability, developing an exploit, chaining it with other vulnerabilities, and covering traces), the infrastructure for managing and controlling these agents becomes critical.
Platforms like o-mega.ai are building the orchestration layer for AI agents: deploying them, monitoring their actions, setting boundaries on their autonomy, and coordinating multiple agents working on related tasks. The Mythos scenario illustrates why this infrastructure matters. When an AI agent can autonomously discover and exploit vulnerabilities, the controls around that agent (what it can access, what actions it can take, who reviews its work) become as important as the agent's capabilities themselves.
Our guide on multi-agent orchestration covers the architectural patterns for managing autonomous AI agents, including the safety and control mechanisms that become essential when agents reach Mythos-level capability.
Yuma Heymans is the founder of o-mega.ai, where he builds AI agent infrastructure for teams deploying autonomous agents at scale. He has tracked Anthropic's model releases and safety research since Claude 1, and operates production systems that run on Claude's API daily.
This guide reflects the Claude Mythos Preview announcement as of April 7, 2026. Anthropic has indicated that Mythos-class models may become more broadly available in the coming months pending new safety measures. Verify current access status at anthropic.com.