On April 7, 2026, Anthropic announced something it had never done before: it built an AI model so capable that it decided not to release it to the public.
The model is called Claude Mythos. In the weeks before the announcement, it had autonomously found thousands of zero-day vulnerabilities across every major operating system and every major web browser — including a 27-year-old flaw in OpenBSD that had survived decades of human and automated scrutiny, and a 16-year-old bug in FFmpeg that had withstood more than five million automated tests without detection.
Anthropic's response was Project Glasswing: a $100 million initiative to put Mythos to work for defence — partnering with AWS, Apple, Microsoft, Google, Cisco, NVIDIA, JPMorganChase, and 40 other organisations to harden critical software before similar capabilities reach threat actors. The company stated plainly that it believes those capabilities will proliferate to less restrained actors within months, not years.
The announcement triggered an extraordinary response across the cybersecurity community. Emergency meetings. Government briefings. Banking executives privately warned that Mythos makes large-scale cyberattacks significantly more likely in 2026. The New York Times called it a "terrifying warning sign."
And in the middle of all this noise, most enterprise security teams are drawing exactly the wrong conclusions.
Large language models like Claude are the most significant development in cybersecurity in a decade — for attackers and defenders simultaneously. But the prevailing understanding of what LLMs actually do, what they cannot do, and what they mean specifically for email security is riddled with myths that lead to genuinely dangerous decisions about where to invest, what to trust, and what to ignore.
This post addresses the five most consequential of those myths — using Claude Mythos and Project Glasswing as the lens through which to understand where AI in cybersecurity actually stands in April 2026.
Claude Mythos did not change what AI is capable of. It revealed, publicly and undeniably, what AI has already become. The myths addressed in this post exist because most organisations were not paying close enough attention.
The dominant narrative following the Claude Mythos announcement has been threat-focused. AI finds zero-days. AI generates exploits. AI enables attacks at machine speed. Governments convene emergency meetings. The framing is almost universally adversarial.
This framing is accurate but dangerously incomplete. The same capabilities that make Claude Mythos alarming as an offensive tool are precisely what Anthropic is deploying for defence through Project Glasswing.
Anthropic's own description of the initiative is striking in its clarity: "Although the risks from AI-augmented cyberattacks are serious, there is reason for optimism: the same capabilities that make AI models dangerous in the wrong hands make them invaluable for finding and fixing flaws in important software."
This dual-use reality is not a contradiction. It is the defining characteristic of every powerful security technology. Penetration testing tools are used by attackers and defenders. Vulnerability scanners work for both sides. The question has never been whether a technology can be misused — it always can. The question is whether the defensive applications of the technology are being adopted as aggressively as the offensive ones.
The answer, in most enterprise security operations, is no. Security teams are reading about Claude Mythos as a threat and updating their threat models. Very few are simultaneously asking: how do we put this capability to work for detection and response?
For email security specifically, this asymmetry is most visible at the detection layer. Attackers are already using LLMs to generate hyper-personalised phishing emails that defeat grammatical analysis, personalisation detection, and baseline anomaly models. A CBS News report from the week of the Glasswing announcement quoted a university CIO directly on this: AI is being used to "script those dialogues, those conversations, those phishing emails, to specific people — and really customise them to make them a lot more difficult to detect and identify."
The myth that LLMs are primarily a threat is dangerous because it leads security teams to focus entirely on defending against AI-generated attacks while ignoring the defensive capability those same models provide. The asymmetry only compounds over time: attackers who adopted LLMs early are now operating with Claude Mythos-class capabilities in underground markets while defenders are still reading threat advisories.
The question is not whether Claude Mythos is dangerous. It clearly is. The question is whether your defensive capabilities are adopting LLM-native reasoning at the same rate that offensive actors already have.
Project Glasswing exists precisely because Anthropic recognised this dynamic. The 52 organisations in the consortium are not there to be warned about AI. They are there to use it — to find and fix vulnerabilities in their foundational systems before threat actors reach equivalent capability. The defensive posture is not awareness. It is deployment.
The Claude Mythos coverage has focused almost entirely on vulnerability discovery and exploit generation. The benchmark results — 93.9 percent on SWE-bench Verified, 73 percent success on expert-level Capture the Flag challenges, zero-days in every major OS and browser — are genuinely extraordinary, and they understandably dominate the coverage.
But this focus on code-level capabilities has created a blind spot. The most immediate, most widespread, and most financially damaging application of AI in cybersecurity attacks is not zero-day exploit generation. It is AI-generated phishing and Business Email Compromise.
Consider the baseline numbers. The FBI's 2024 Internet Crime Report documents that AI-assisted BEC rose 37 percent in a single year, contributing to over $2.7 billion in reported losses. IBM's 2025 Cost of a Data Breach Report found phishing is the most common initial breach vector, responsible for 16 percent of incidents at an average cost of $4.8 million. These are not emerging trends — they are the dominant attack methodology in enterprise environments today.
Now consider what LLMs add to that baseline. Claude and other frontier models can generate phishing emails that are grammatically flawless, contextually accurate, personalised to the recipient's role and organisation, and indistinguishable from legitimate business communications. The 37 percent rise the FBI documented predates models like Mythos entirely. The emails are getting harder to question, not easier. That is the point.
The myth that email security is somehow separate from the AI capability conversation is partly a product of how the Claude Mythos coverage has been framed. Vulnerability discovery is dramatic and quantifiable — a 27-year-old bug in OpenBSD found overnight is a compelling story. An AI model generating 50,000 personalised phishing emails targeting finance controllers at mid-market companies is less dramatic but produces orders of magnitude more financial damage.
The connection is not hypothetical. Anthropic's own threat intelligence reporting from November 2025 documented the first verified instance of a cyberattack predominantly executed by AI agents: a Chinese state-sponsored group used autonomous AI to compromise approximately 30 global targets, with AI managing between 80 and 90 percent of tactical operations independently. That attack included email-based social engineering components executed by AI without human direction.
For enterprise security teams, the implication is straightforward: the AI capability race that Project Glasswing is responding to at the infrastructure layer has already been running for two years at the email layer. The attackers who will eventually weaponise Mythos-class vulnerability discovery are the same actors who have already been deploying LLMs for email-based initial access. They are not waiting for general availability.
Zero-day exploit generation makes headlines. AI-generated phishing makes money. Both are the same LLM capability deployed against different attack surfaces — and email is the surface attackers reached first.
One of the most persistent myths in enterprise security circles is that AI safety measures — the content filters, the refusal training, the red-teaming and alignment work that companies like Anthropic invest heavily in — provide meaningful protection against misuse. If Claude won't help generate malware, the thinking goes, then Claude cannot be weaponised.
Let’s take that logic apart, because it fails at multiple levels simultaneously.
Start with the most basic problem: Claude’s guardrails only apply to Claude. Jailbroken LLMs — models stripped of ethical training and sold specifically for offensive purposes — have existed since 2023. WormGPT, FraudGPT, and their successors are available on underground markets for as little as $50 per month. These models have no content policies, no refusal training, and no alignment constraints. They are designed specifically for phishing email generation, malware composition, and social engineering at scale.
The existence of these tools means that Claude's guardrails protect against casual misuse by people who ask Claude directly to help them attack organisations. They provide no protection whatsoever against the professional criminal ecosystem that has already built its own tools.
There is a deeper problem, though, and Claude Mythos itself is the proof. Anthropic did not decide to restrict Mythos because its safety training was insufficient. It restricted it because the underlying capabilities — the code reasoning, the vulnerability analysis, the exploit chaining — are dangerous regardless of whether the model is willing to use them for harm. Mythos reportedly found working exploits for critical vulnerabilities even when operated by non-security engineers at Anthropic who were running it for legitimate defensive purposes.
The UK AI Security Institute (AISI) evaluation of Mythos Preview found that the model successfully solved 73 percent of expert-level CTF challenges — including challenges that no prior LLM had been able to complete. The capability gap is not bridged by adding a layer of content filtering. It is a function of the underlying model's reasoning and coding ability.
And then there is the admission buried in the Glasswing announcement that most coverage glossed over: the Mythos capabilities were never deliberately built. In Anthropic's own words: "We did not explicitly train Mythos Preview to have these capabilities. Rather, they emerged as a downstream consequence of general improvements in code, reasoning, and autonomy."
This is a significant admission. It means that as models become more capable at legitimate tasks — coding, reasoning, problem-solving — they also become more capable at security-relevant tasks whether or not that was the design intent. Guardrails trained against specific harmful use cases cannot anticipate capabilities that emerge from general improvements in reasoning.
For enterprise security teams, the practical implication is this: the assumption that AI safety measures create a meaningful barrier between AI capabilities and threat actors is not supported by the evidence. The threat actors who are actively deploying AI against enterprise environments are not using Claude. They are using purpose-built offensive tools, jailbroken models, and increasingly, custom-trained models that have never had safety training to begin with.
Claude's content policies protect against unsophisticated misuse. They do not protect against professional threat actors who have been building their own tooling for two years. The gap between what public models refuse to do and what underground models actively do is the actual threat surface.
This myth is the most consequential for email security specifically — and the most thoroughly demolished by the Claude Mythos benchmarks.
The argument runs like this: large language models are sophisticated pattern-matching and text generation systems. They can write convincingly. They can summarise. They can generate code when given enough examples. But genuine reasoning — the kind of contextual, multi-step inference that a trained security analyst applies when evaluating whether an email is a social engineering attempt — is beyond them. Humans understand intent. Models recognise patterns.
Claude Mythos demolished this argument not through benchmark scores alone, but through what those scores represent in practice.
On SWE-bench Verified — a benchmark that tests a model's ability to solve real-world software engineering tasks from actual GitHub repositories — Mythos scored 93.9 percent. This is not a test of text generation or pattern recognition. It is a test of whether a model can read a description of a software problem, reason about the codebase, form a hypothesis about the root cause, and implement a correct solution. That is reasoning.
On USAMO 2026 — a proof-based mathematical olympiad evaluation requiring formal logical reasoning — Mythos scored 97.6 percent, compared to 42.3 percent for Claude Opus 4.6. Olympiad mathematics is not a text generation task. It requires constructing valid logical arguments from first principles. That is reasoning.
Most directly, Mythos found a 27-year-old vulnerability in OpenBSD by chaining together multiple individually insignificant signals into a coherent exploit. Nicholas Carlini, Anthropic's security research lead, described the capability this way: "It has the ability to chain together vulnerabilities. So what this means is you find two vulnerabilities, either of which doesn't really get you very much independently. But this model is able to create exploits out of three, four, or sometimes five vulnerabilities that in sequence give you some kind of very sophisticated end outcome."
Chaining together weak signals into a coherent threat assessment is precisely what a skilled email security analyst does when evaluating a suspicious message. Is the sender domain slightly off? Is the request unusual for this sender-recipient relationship? Does the urgency seem engineered? Does the action requested bypass normal approval workflows? None of these signals is individually conclusive. Together, they constitute a threat verdict.
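To make that structure concrete, here is a deliberately simplified sketch of why signals that are individually inconclusive become conclusive in combination. The signal names, suspicion values, and threshold below are invented for illustration; a real LLM-native engine reasons over the message itself rather than a fixed feature list.

```python
# Toy illustration of weak-signal chaining. The signal names, values, and
# threshold are invented for this example, not production detection logic.
from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    suspicion: float  # 0.0 (benign) to 1.0 (conclusive); each is weak alone

signals = [
    Signal("sender domain one character off from a known partner", 0.30),
    Signal("request atypical for this sender-recipient relationship", 0.35),
    Signal("urgency language engineered to short-circuit review", 0.30),
    Signal("requested action bypasses the normal approval workflow", 0.40),
]

VERDICT_THRESHOLD = 0.6

# No single signal is conclusive on its own...
assert all(s.suspicion < VERDICT_THRESHOLD for s in signals)

# ...but the probability that ALL of them are innocent collapses quickly.
p_all_innocent = 1.0
for s in signals:
    p_all_innocent *= (1.0 - s.suspicion)

combined_suspicion = 1.0 - p_all_innocent  # about 0.81 for the values above
print(f"combined suspicion: {combined_suspicion:.2f}")
if combined_suspicion >= VERDICT_THRESHOLD:
    print("verdict: likely social engineering -> escalate")
```

The point of the toy model is the shape of the logic, not the numbers: no single signal justifies a verdict, but the conjunction does, which is exactly the chaining Carlini describes at the exploit level.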
This is the analytical structure that LLM-native intent detection applies to email — and the Claude Mythos benchmarks confirm that frontier LLMs perform this kind of multi-signal reasoning at a level that exceeds all but the most skilled human analysts.
The architecture distinction matters here. There is a fundamental difference between email security systems that use AI as a supplementary signal within a signature or behavioural anomaly framework, and systems that use LLMs as the primary reasoning engine to evaluate email intent. The former adds an AI layer on top of an architecture that was designed for a different threat model. The latter reasons about what an email is trying to accomplish — the same question a skilled analyst asks — using the same class of model that Mythos has shown is capable of expert-level reasoning at scale.
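The difference is easier to see in code. The sketch below is ours, not any vendor's implementation: the Gen 2 function treats an AI score as one weighted feature among several, while the Gen 3 function hands the whole message to an LLM (here via the Anthropic Python SDK) and asks the analyst's question directly. The helper names, weights, prompt, and model string are illustrative assumptions.

```python
# Sketch of the two architectures described above. Helper names, weights, the
# prompt, and the model string are illustrative assumptions only.
import anthropic  # pip install anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def gen2_score(email_text: str, ai_feature: float) -> float:
    """Gen 1/2 shape: an AI output is one weighted feature inside a
    signature/anomaly framework. The message text itself never reaches
    a reasoning step."""
    signature_hit = 0.0   # clean domain, no known-bad hash or URL
    anomaly_score = 0.1   # first-contact sender: no baseline to deviate from
    return 0.5 * signature_hit + 0.3 * anomaly_score + 0.2 * ai_feature

def gen3_verdict(email_text: str) -> str:
    """Gen 3 shape: the LLM is the primary reasoning engine, asked the
    analyst's question directly."""
    response = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder; use whatever model you deploy
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": (
                "You are an email security analyst. Explain what this email is "
                "trying to make the recipient do, whether that action is "
                "consistent with legitimate business communication, and end "
                "with one line: VERDICT: benign|suspicious|malicious.\n\n"
                + email_text
            ),
        }],
    )
    return response.content[0].text
```

Note where a payload-free, first-contact phishing email lands in each: in the first function it contributes almost nothing to any term; in the second, its intent is the entire input to the verdict.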
This is the architectural distinction that defines Gen 3 email security — and it is grounded in exactly the capabilities that Claude Mythos has now demonstrated publicly. LLMs do not just generate text. They reason. The question for enterprise security teams is whether their email security architecture is using that reasoning capability or not.
A model that finds 27-year-old zero-days by chaining five weak signals into a working exploit is the same class of reasoning engine that detects a phishing email by recognising that the sender, the request, the urgency, and the action together constitute a social engineering attempt. The capability is not different — only the application is.
There is a particular form of institutional caution that treats every emerging technology as not-yet-ready until it has been thoroughly documented, evaluated by vendors, incorporated into compliance frameworks, and approved by committee. For most enterprise technologies, this caution is appropriate. For AI in cybersecurity, in April 2026, it represents a dangerous miscalibration between the speed of the threat and the speed of the defence.
The production-readiness argument rests on several assumptions, all of which have been overtaken by events.
Start with the most obvious crack: AI cyberattacks are not theoretical. The November 2025 Anthropic threat intelligence report cited earlier documented a Chinese state-sponsored group managing between 80 and 90 percent of tactical operations through AI, with human operators providing only high-level direction. That attack preceded Claude Mythos. It was conducted with the capabilities available to threat actors six months ago.
The next argument is usually that diffusion is gradual — that organisations have time to prepare even if the tools exist. That window has already closed. The Security Magazine analysis of Glasswing quotes the BeyondTrust Security Team: they have "already observed AI-assisted tooling compress the exploitation window for critical vulnerabilities to minutes, not weeks." According to reporting on CrowdStrike’s 2026 Global Threat Report, 67 percent of vulnerabilities exploited by China-nexus groups provided immediate system access at the moment of exploitation — zero time between discovery and breach. The diffusion has already happened.
Some security teams take comfort in the fact that Mythos is restricted — if the most capable model sits behind a $100 million consortium, the exposure must be bounded. That reasoning misunderstands what is already in circulation. Claude Mythos is not generally available. Glasswing partners pay $25 per million input tokens and $125 per million output tokens for restricted access. And yet the security community consensus, clearly articulated in the Security Magazine expert roundup, is that "the adversary already has AI working for them" and that Glasswing "should signal to leadership urgency, not reassurance." Open-weight models at current capability levels — freely available, no safety training — represent the production threat environment. Mythos Preview represents where that environment will be in months.
The last version of this argument applies specifically to email: that LLM-native detection is still maturing and should be evaluated before deployment. That ship sailed some time ago. LLM-native email security that analyses message intent, contextual sender-recipient relationships, and social engineering patterns in real time is not a research prototype. It is deployed, operational, and producing verdicts on production email traffic. The organisations that adopted this architecture before April 2026 have been defending against AI-generated phishing with AI-native detection. The organisations still evaluating it are defending against those same attacks with Gen 1 or Gen 2 architectures that were not designed for them.
The practical consequence of the "not yet ready" myth is a one-way ratchet: threat actors who adopted AI early are extending their lead over defenders who are still in evaluation mode. There is no catch-up mechanism built into a "wait and see" posture. Every month of delay is a month during which AI-generated phishing campaigns run against email security architectures that were designed for a different class of attack.
Project Glasswing was created because Anthropic recognised that the window between a capability becoming available to responsible actors and the same capability becoming available to irresponsible actors is shrinking, not growing. The organisation's explicit goal is to use that window for defensive hardening before it closes. Enterprise email security teams face exactly the same window — and exactly the same urgency.
The production-readiness question was answered in September 2025 when AI orchestrated its first documented cyberattack. The question for enterprise security teams in April 2026 is not whether AI-powered threats are production-ready. It is whether AI-powered defences are being deployed at the same pace.
The five myths addressed in this post share a common thread: they each create a reason to delay adopting AI-native detection while threat actors who have not been waiting continue to advance. The cumulative effect is a detection gap that widens with every month those myths remain unchallenged.
Claude Mythos and Project Glasswing have made three things permanently clear. First, frontier AI capability is dual-use: the same reasoning that finds zero-days for Glasswing partners will find them for threat actors. Second, LLMs genuinely reason, chaining weak signals into confident conclusions at expert level, rather than merely matching patterns. Third, the diffusion has already happened: the adversary has AI working for them now, and awareness is not a defensive posture. Deployment is.
The architecture question this creates for enterprise security teams is straightforward, if uncomfortable: does your email security system reason about intent, or does it match patterns? The answer determines whether it can detect AI-generated attacks that produce no pattern to match — the class of attack that is now the operational standard for sophisticated threat actors.
This is the distinction that defines the difference between email security architectures that are built for the threat environment of 2019 and those built for the threat environment of 2026. It is also the distinction that will determine which organisations join the growing list of AI-assisted breach victims and which do not.
Claude Mythos is a watershed moment — not because it created new risks, but because it made existing ones undeniable. A 27-year-old vulnerability in OpenBSD. A 16-year-old bug in FFmpeg. Thousands of zero-days across every major operating system and browser. Found autonomously, overnight, by a model that Anthropic explicitly did not design to have these capabilities.
The myths examined in this post — that LLMs are primarily threats, that email security is somehow separate from AI capability advances, that guardrails provide meaningful protection, that LLMs cannot reason, that production deployment can wait — are all understandable responses to a technology that has advanced faster than most organisations' ability to evaluate it.
But understandable is not the same as safe. The adversary already has AI working for them. The question is whether the defence does too.
At StrongestLayer, we built our detection architecture on LLM-native intent reasoning because we understood — before Mythos made it impossible to ignore — that the only system capable of detecting AI-generated attacks that produce no pattern is a system that reasons about what those attacks are trying to accomplish. Not what they contain. What they intend.
Project Glasswing is the right response to Mythos. The right response to AI-generated phishing is the same architecture, applied to the email layer — where the attacks are already happening, at scale, today.
Anthropic’s most capable model to date, announced April 7, 2026 — and deliberately kept from public release. That last part is the real story. In the weeks before the announcement, Mythos found thousands of zero-day vulnerabilities across every major OS and browser autonomously, including a 27-year-old flaw in OpenBSD that had survived decades of human and automated scrutiny, and a 16-year-old bug in FFmpeg that outlasted five million automated test runs without detection. Anthropic concluded the capability was too dangerous to release widely. So instead they launched Project Glasswing — a restricted consortium of 52 organisations using Mythos to harden critical software before equivalent capability reaches threat actors. For cybersecurity, what Mythos confirms is that frontier AI has crossed the threshold where it surpasses the best human security researchers at finding and chaining vulnerabilities. That changes the calculus for defenders and attackers in equal measure.
A $100 million defensive cybersecurity initiative Anthropic launched on April 7, 2026, built around restricted access to Claude Mythos Preview. Fifty-two organisations — AWS, Apple, Microsoft, Google, Cisco, NVIDIA, JPMorganChase, the Linux Foundation, and others — are using Mythos to find and fix vulnerabilities in the critical software infrastructure the internet runs on, before similar capability reaches threat actors. Anthropic put $100 million in usage credits and $4 million in direct donations to open-source security organisations behind it. The reasoning is simple and urgent: the window between a powerful capability being available to responsible actors and the same capability reaching irresponsible ones is shrinking. That window needs to be used for hardening, not for waiting to see what happens.
More directly than most coverage suggests. The benchmarks everyone focuses on — zero-days, exploit chaining, CTF performance — all demonstrate the same underlying capability: multi-step reasoning under uncertainty, combining weak signals into confident conclusions. That is exactly what email security needs. A phishing email from a first-contact sender with a clean domain, personalised content, and no malicious payload produces zero signal for signature or anomaly detection. But it has intent. LLM-native detection reasons about that intent the way Mythos reasons about a codebase — not by matching known patterns, but by understanding what the communication is trying to accomplish. The capability is the same. Only the application differs.
Not directly — Claude’s safety training is specifically built to refuse those requests. But that answer misses the actual question. Sophisticated threat actors do not ask Claude to write their phishing emails. They use WormGPT, FraudGPT, and a growing list of purpose-built offensive LLMs with zero safety training, available on underground markets. These tools exist because demand for AI-generated social engineering is real, growing, and profitable. Claude’s refusal policies protect against someone naively asking the wrong question. They say nothing about what is already running in production against your employees’ inboxes right now.
It means the detection engine asks a different question. Signature systems ask: does this email match something we’ve seen before? Behavioural systems ask: does this sender behave differently than usual? Both questions are useless against a first-contact attacker using a clean domain and an AI-written message that has never been sent before. Intent reasoning asks: what is this email actually trying to make the recipient do, and is that consistent with why people normally send emails like this? That question works regardless of whether the sender is known, whether the domain is flagged, or whether the payload matches any prior campaign. It is the question a skilled analyst asks. LLM-native detection asks it at machine speed, across every message, without fatigue.
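As a sketch of what that question looks like when posed to a model at machine speed, here is one possible prompt-and-parse loop. The JSON schema and field names are invented for illustration, not a description of any production system.

```python
# One way to pose the intent question so the answer is machine-readable.
# The schema and field names are invented for illustration.
import json

INTENT_PROMPT = """Analyse the email below. Answer in JSON only:
{{
  "requested_action": "what the email asks the recipient to do",
  "claimed_context": "who the sender claims to be, and why they are asking",
  "consistency": "is the request consistent with why people normally send emails like this?",
  "verdict": "benign | suspicious | malicious",
  "reasoning": "the chain of weak signals behind the verdict"
}}

EMAIL:
{email}
"""

def build_prompt(email_text: str) -> str:
    """Fill the template; send the result to whatever LLM backend you run."""
    return INTENT_PROMPT.format(email=email_text)

def parse_verdict(model_output: str) -> dict:
    """Parse the model's JSON answer, failing closed: anything the parser
    cannot read is treated as suspicious rather than silently dropped."""
    try:
        return json.loads(model_output)
    except json.JSONDecodeError:
        return {"verdict": "suspicious", "reasoning": "unparseable model output"}
```

The design choice worth noting is the structured verdict: a free-text answer is useful to an analyst, but a parseable one can gate quarantine, banner, or delivery decisions on every message without a human in the loop.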
Yes — and the ‘still maturing’ framing was already outdated before April 2026. The first documented AI-orchestrated cyberattack happened in September 2025. AI-generated phishing is driving billions in annual BEC losses right now, today. The question of production-readiness was settled by the threat environment, not by any vendor announcement. The only question that actually matters for your organisation is not whether LLM-native email security is ready for production. It is whether what you are currently running can detect an AI-crafted attack that has no malicious payload, no known sender history, and no pattern to match against anything in your existing rules. If the answer is no, the readiness problem is yours, not the technology’s.
Anthropic: Project Glasswing — Securing Critical Software for the AI Era (April 7, 2026)
SecurityWeek: Anthropic Unveils Claude Mythos — A Cybersecurity Breakthrough (April 2026)
UK AI Security Institute: Our Evaluation of Claude Mythos Preview's Cyber Capabilities (April 2026)
CBS News: Anthropic's Mythos AI Can Spot Weaknesses in Almost Every Computer on Earth (April 2026)
FBI Internet Crime Complaint Center: 2024 Internet Crime Report