Anthropic Mythos NSA | Why the Pentagon's Blacklist Failed

A 72.4% exploit generation success rate, a 27-year-old zero-day, and a defense intelligence agency running the model its own department blacklisted. This is not a governance contradiction. It is a new category of problem.

What Everyone Missed

The surface story running across Reuters, TechCrunch, and The Verge today frames the Anthropic Mythos situation as government hypocrisy: the Pentagon blacklisted Anthropic as a supply-chain risk while the NSA quietly onboarded the same company’s most capable, and most dangerous, model. That framing is not wrong. It is just shallow.

The real story is structural. Mythos is the first frontier model to cross what John Costello, a cybersecurity expert cited in Tech Insider coverage, calls the Authority Assumption Gap: systems that execute actions under assumed authority, without explicit human authorization at each step. That is not a policy question. It is an architectural one, and it has immediate implications for every agentic pipeline your team is currently building or evaluating.
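One way to picture closing that gap is a wrapper that refuses to run any agent step until an approver signs off on that specific step. The sketch below is a minimal illustration only, not something described by Costello or Anthropic: `Action`, `GatedExecutor`, and the allowlist approver are hypothetical names, and a real pipeline would likely route the approval to a human reviewer rather than a static set.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Action:
    """One step an agent wants to take (hypothetical structure)."""
    name: str
    run: Callable[[], str]


@dataclass
class GatedExecutor:
    """Runs an action only if the approver explicitly authorizes that action."""
    approve: Callable[[Action], bool]
    log: List[str] = field(default_factory=list)

    def execute(self, action: Action) -> Optional[str]:
        # Authorization is checked per step, never assumed from earlier steps.
        if not self.approve(action):
            self.log.append(f"DENIED {action.name}")
            return None
        self.log.append(f"APPROVED {action.name}")
        return action.run()


# Illustrative approver: a static allowlist standing in for a human reviewer.
allowlist = {"read_config"}
executor = GatedExecutor(approve=lambda a: a.name in allowlist)

executor.execute(Action("read_config", lambda: "ok"))
executor.execute(Action("delete_logs", lambda: "deleted"))
print(executor.log)
```

The design choice worth noting is that denial is the default: an action that the approver does not recognize never runs, and both outcomes are logged.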

Three things the major outlets omitted: the specific technical thresholds that triggered emergency regulatory reviews globally; how Project Glasswing’s gated access model actually functions for the 40 approved defenders; and what this precedent means for enterprise teams that are not in that club but are deploying frontier models in code-gen or security workflows right now.

What Actually Happened | The 72-Hour Timeline

Anthropic announced Project Glasswing and Claude Mythos Preview on April 7, 2026. The announcement confirmed Mythos had autonomously discovered thousands of zero-days across major operating systems and browsers. Anthropic committed $100 million in usage credits and $4 million in open-source donations to a select group of defenders.

Within 72 hours, regulators in the U.S., U.K., and EU opened emergency banking-risk assessments. By April 13, Reuters reported expert warnings that Mythos-assisted attacks could have dire consequences for banks. This Monday morning, Reuters confirmed regulators are actively monitoring the model. Hours later, Axios confirmed via two independent sources that the NSA is already running Mythos on its own networks.

The collision point: the Pentagon’s supply-chain blacklist of Anthropic, which a federal judge temporarily stayed on March 26, was then upheld after Anthropic lost its appeal on April 8. Anthropic is currently suing the Department of Defense. Its CEO Dario Amodei met with White House officials this month in what was described as a “productive starting point.” Meanwhile, Gigazine reports that almost every federal agency outside DoD wants access, and OMB is drafting a guardrail-attached “revised version” for wider federal use.

72.4%: Mythos exploit generation success rate

~0%: Opus 4.6 exploit generation rate

~40: Organizations with current Mythos access

27 yrs: Age of oldest zero-day found (OpenBSD)

The Capability Leap: Why This Is Different

The numbers deserve attention. The Register’s April 7 deep-dive reported Mythos generates working exploits at a 72.4% success rate. Claude Opus 4.6, Anthropic’s prior flagship, sits at approximately 0%. That is not an incremental improvement. It is a category change.

On the CyberGym vulnerability reproduction benchmark, Mythos scores 83.1% versus Opus 4.6’s 66.6%. On SWE-bench Verified, the standard software engineering benchmark, Mythos reaches 93.9% versus Opus 4.6’s 80.8%. On Terminal-Bench 2.0, which evaluates autonomous multi-step command execution: 82.0% versus 65.4%.

Benchmark	Mythos Preview	Opus 4.6	Delta
Exploit Generation Success	72.4%	~0%	+72.4 pts
CyberGym (vuln reproduction)	83.1%	66.6%	+16.5 pts
SWE-bench Verified	93.9%	80.8%	+13.1 pts
SWE-bench Pro	77.8%	53.4%	+24.4 pts
Terminal-Bench 2.0	82.0%	65.4%	+16.6 pts
GPQA Diamond	94.6%	91.3%	+3.3 pts

Critically, this capability is not the product of cybersecurity-specific training. As Pixee’s April 8 briefing noted, the exploit generation ability emerged from general reasoning. The implication for safety researchers: you cannot contain this by restricting cybersecurity training data. The capability is a property of reasoning depth, not domain specialization.

“The window between vulnerability discovery and exploitation has collapsed; what once took months now happens in minutes with AI.”

Elia Zaitsev, CTO, CrowdStrike

Project Glasswing: How Gated Access Actually Works

No major outlet has explained the technical access model in detail. Here is what the Anthropic Glasswing documentation actually specifies. Mythos Preview is available via the standard Claude API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry. Approved organizations access it like any other API endpoint, not through a separate classified system.

Token pricing is $25 per million input tokens and $125 per million output tokens. That output price is roughly 5x the cost of Opus 4.6. Approved use cases include local vulnerability detection, black-box binary testing, endpoint security analysis, and penetration testing workflows. The model is not available for general release.
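To make the pricing concrete, here is a small cost estimator using the two per-token rates quoted above. The workload figures in the example (2 million input tokens, 400,000 output tokens) are invented for illustration and do not come from Anthropic or any Glasswing participant.

```python
# Rates quoted in this article for Mythos Preview (USD per million tokens).
INPUT_USD_PER_MTOK = 25.0
OUTPUT_USD_PER_MTOK = 125.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated API cost in USD for one workload."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Hypothetical scan: 2M tokens of source code in, 400K tokens of findings out.
print(estimate_cost_usd(2_000_000, 400_000))  # 100.0
```

In this made-up workload the output side costs as much as the input side despite being one fifth the volume, which is why output-heavy workflows are where the bill grows fastest.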

What is absent from Glasswing’s public documentation: audit logging requirements, output controls on generated exploit code, and any specified legal liability if an approved organization’s access is breached. Anthropic has stated it plans safeguards for an upcoming Opus model with Mythos-class capabilities, but Mythos Preview ships with minimal publicly documented output restrictions. For enterprise compliance teams, this is a gap. There is no published framework for how CTOs at approved organizations are expected to handle the chain-of-custody for model outputs that contain working exploit code.

The 40 current access holders include AWS, Google, Microsoft, NVIDIA, Cisco, and CrowdStrike among 12 publicly named organizations. The remaining 28 are undisclosed. NSA is now confirmed as one of them via the Axios reporting, though it does not appear in Anthropic’s published list.

What’s Public	What’s Not Documented
API access via Bedrock, Vertex, Foundry	Audit logging requirements
$25 input / $125 output per million tokens	Output controls on exploit code
$100M in usage credits committed	Legal liability if access is breached
12 publicly named organizations	28 unnamed access holders
Approved use cases listed	Chain-of-custody requirements for outputs

Why Banks Are the Specific Concern

Regulators are not reacting to the idea of AI-assisted hacking. They are reacting to a specific capability profile. Bank of England Governor Andrew Bailey stated that the institution is examining the development carefully, warning of the potential for a wave of AI-assisted cybercrime. Channel NewsAsia confirmed that regulators are actively monitoring for banking-system risks.

The specific threat profile is not about new attack techniques. It is about the age of vulnerabilities that Mythos finds. Banking infrastructure runs on decades-old codebases. Mythos discovered a 27-year-old OpenBSD TCP SACK denial-of-service flaw and a 16-year-old FFmpeg bug, both surviving five million automated tests undetected, per the Glasswing announcement. Legacy systems are not patched against vulnerabilities that were not known to exist.

Beyond detection, Mythos can chain multiple vulnerabilities for privilege escalation. The Glasswing documentation demonstrates a Linux kernel exploit path from unprivileged user to root. Security analysts writing on LinkedIn have flagged this as the core banking exposure: Mythos does not just find the newest vulnerabilities, it surfaces the oldest, most embedded ones, precisely the category that legacy banking infrastructure has not been patched against.

The NSA Paradox: Not Hypocrisy, a New Category

The easy read on the NSA situation is contradiction. The Pentagon labeled Anthropic a supply-chain risk. The NSA, an intelligence agency inside that same department, ran the company’s model on its own networks. That is not incoherence. It is the first live instance of a new governance problem with no established framework.

Some frontier models will be simultaneously too dangerous to deploy publicly and too essential to forgo for defensive purposes. That is not a tension that existing procurement rules, security certifications, or vendor risk frameworks were built to handle. The DoD blacklist was designed for traditional supply-chain risks: hardware backdoors, data exfiltration, foreign ownership influence. A model that generates working exploits at a 72.4% success rate is a different category of risk, and also a different category of necessity.

“AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure. The old ways of hardening systems are no longer sufficient.”

Anthony Grieco, SVP & Chief Security & Trust Officer, Cisco

OMB drafting a “revised version” of Mythos with guardrails is the administrative response to this problem. It is also an acknowledgment that the Pentagon’s blanket blacklist is not sustainable when the model in question is the best available tool for the exact mission the blacklisting agency is supposed to perform.

The Anthropic lawsuit against DoD and Dario Amodei’s White House meeting this month are the corporate side of the same negotiation. Both sides are working toward a regime that does not exist yet. For private-sector teams watching this, the relevant signal is: the federal government will eventually produce a formal framework for dual-use frontier AI access. Whatever that framework looks like will become the template for enterprise procurement policies in regulated industries.

Strategic Implications: Who This Reshapes

The AI red-teaming services market sits at $2.26 billion in 2026 and is projected to reach $6.17 billion by 2030, a 28.5% compound annual growth rate. The AI cybersecurity market overall is at $25.53 billion, projected at $50.83 billion by 2031. Mythos accelerates both curves.
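The growth rate can be checked directly from the endpoints. The sketch below applies the standard compound annual growth rate formula to the red-teaming figures. For the broader AI cybersecurity market the article gives no base year, so the 2026 start used in the second call is an assumption and its result should be read accordingly.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_value / start_value) ** (1 / years) - 1


# AI red-teaming services: $2.26B (2026) to $6.17B (2030), four compounding years.
red_team = cagr(2.26, 6.17, 2030 - 2026)
print(f"AI red-teaming CAGR: {red_team:.1%}")  # 28.5%

# AI cybersecurity overall: $25.53B to $50.83B by 2031.
# ASSUMPTION: the $25.53B figure is a 2026 value; the source does not say.
cyber = cagr(25.53, 50.83, 2031 - 2026)
print(f"AI cybersecurity CAGR (assuming a 2026 base): {cyber:.1%}")
```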

Glasswing’s named partners (AWS, Google, Microsoft, NVIDIA, Cisco, and CrowdStrike among them) gain first-mover positions in what Rapid7 frames as an AI-augmented security category. Their access to Mythos at the model level gives them a structural advantage in building the monitoring, audit, and remediation layers that every enterprise running frontier AI will need.

Legacy cybersecurity vendors selling incremental AI-assisted tooling face a harder problem. As one security analyst on LinkedIn noted, Mythos does not improve the existing model of human analysts using AI to accelerate manual processes. It finds and exploits vulnerabilities at a pace that makes the underlying business model for incremental tooling obsolete. The value shifts to whoever owns the detection and containment layer for Mythos-class outputs.

For banks and critical infrastructure, the short-term requirement is straightforward: every system that Mythos could plausibly target needs a patch prioritization audit weighted toward oldest-vulnerability exposure, not just recent CVEs. Global Banking and Finance reports that multiple major institutions have already initiated urgent patching reviews.
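What an age-weighted audit might look like in practice can be sketched as a simple scoring function. Everything here is illustrative: the `Component` fields, the weights, and the sample inventory are assumptions made for the example, not a published methodology, and a real program would tune them against its own asset data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Component:
    """One system in the inventory (hypothetical fields)."""
    name: str
    code_last_audited: date  # last time the code was seriously reviewed
    internet_facing: bool
    open_cves: int           # known, unpatched CVEs


def priority_score(c: Component, today: date, age_weight: float = 2.0) -> float:
    """Higher score means audit sooner. Age counts for more than known CVE count."""
    age_years = (today - c.code_last_audited).days / 365.25
    exposure = 1.5 if c.internet_facing else 1.0
    return exposure * (age_weight * age_years + c.open_cves)


def rank(components: List[Component], today: date) -> List[Component]:
    """Sort the inventory from highest to lowest audit priority."""
    return sorted(components, key=lambda c: priority_score(c, today), reverse=True)


# Invented inventory for illustration.
today = date(2026, 4, 20)
inventory = [
    Component("web-portal", date(2025, 7, 1), internet_facing=True, open_cves=3),
    Component("legacy-ledger", date(2004, 1, 1), internet_facing=False, open_cves=0),
]
for c in rank(inventory, today):
    print(c.name, round(priority_score(c, today), 1))
```

A CVE-only ranking would put the web portal first. With age weighted in, the untouched ledger code rises to the top even though it has no known CVEs, which is the reweighting the paragraph above argues for.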

Reality Check: What Is Confirmed vs. What Is Projection

Some of the coverage around Mythos is running ahead of the evidence. Here is what the primary sources actually support.

Confirmed: Mythos has a 72.4% exploit generation success rate, per The Register’s benchmarking coverage. The NSA is using Mythos, per two sources to Axios. Anthropic found thousands of zero-days across major platforms, per the official Glasswing release. Regulators are monitoring for banking-system risks, per Reuters.

Unverified: The estimate that open-source models could match Mythos’s bug-finding capabilities within six months comes from unnamed analysts cited in Insider Finance reporting. It is plausible given the trajectory of open-source capability curves, but it is a projection, not a confirmed timeline. The claim that Mythos can destabilize banking systems as a practical near-term scenario also runs ahead of what has been demonstrated — Anthropic has not disclosed a successful end-to-end attack on a real banking system. The BBC noted that some cybersecurity specialists question the severity of concerns given Mythos has not yet undergone extensive independent industry testing.

What to watch: Anthropic’s promised public vulnerability disclosure timeline (90 days), the OMB guardrail framework, the DoD lawsuit outcome, and whether any open-source model replicates the 72.4% exploit generation figure on a reproducible benchmark.

Frequently Asked Questions

Is the NSA really using Anthropic Mythos?

Yes. Axios confirmed via two independent sources that the NSA has Mythos access and is running it on its own networks for vulnerability detection. The NSA is one of approximately 40 organizations in the Project Glasswing program, though it does not appear among the 12 publicly named partners.

Why is the Pentagon blacklisting Anthropic?

The Pentagon added Anthropic to its supply-chain risk list in February 2026 under traditional vendor security criteria. A federal judge temporarily stayed the designation on March 26, but Anthropic lost its appeal on April 8. Anthropic is currently suing the DoD. The blacklist was not specifically designed for the dual-use AI risk profile Mythos represents; it uses frameworks built for hardware and data security risks.

What banking risks does Mythos actually pose?

The specific concern is Mythos’s ability to find old vulnerabilities. Its discoveries include a 27-year-old OpenBSD flaw and a 16-year-old FFmpeg bug, both undetected by five million automated tests. Banking infrastructure relies on legacy codebases that have not been patched against vulnerabilities that were never known to exist. Mythos can also chain multiple vulnerabilities for privilege escalation, which could in principle support end-to-end autonomous attacks, though none has been demonstrated against a real banking system. Regulators confirmed active monitoring; Bank of England Governor Andrew Bailey issued a public warning.

Who has access to Project Glasswing?

Approximately 40 organizations total, 12 publicly named: AWS, Google, Microsoft, NVIDIA, Cisco, CrowdStrike, and others. The remaining 28 are undisclosed. NSA is now confirmed via reporting. Access is provided via the Claude API, Amazon Bedrock, Google Vertex AI, and Microsoft Foundry, at $25 per million input tokens and $125 per million output tokens. Anthropic committed $100 million in usage credits across the program.

Can Mythos hack banking systems autonomously today?

Mythos can generate working exploits for discovered vulnerabilities at 72.4% success rate and chain vulnerabilities for privilege escalation. Whether this translates to a practical end-to-end attack on a real banking system is not confirmed. Some cybersecurity specialists, as the BBC noted, question the severity of concerns pending independent industry testing. The regulatory response treats it as a credible threat requiring immediate evaluation, not a demonstrated live attack.

Will open-source models match Mythos in 6 months?

This is an analyst projection cited in Insider Finance, not a confirmed timeline. It is plausible given recent open-source capability trajectories, but no open-source model has currently demonstrated a comparable exploit generation success rate on a reproducible benchmark. If accurate, it substantially changes the risk calculus: defenders lose the advantage of capability scarcity.

What should teams outside the Glasswing program do now?

Most teams cannot evaluate their exposure using Mythos-class tools because they do not have access. That is itself the risk. Immediate steps: audit your oldest-vintage dependencies and unpatched systems, not just recent CVEs; add “dual-use AI output” as a vendor risk category in your security assessments; brief leadership on the dual-use AI exposure class before your next board cycle; and evaluate whether you qualify for Glasswing access if you operate critical software infrastructure.

Where This Ends Up

Mythos is not an anomaly. It is a preview of the governance problem that will define the next three to five years of frontier AI deployment: models that are simultaneously the best available tool for defensive work and the most serious offensive risk. The blacklist-versus-operational-necessity tension the NSA and DoD are navigating will repeat for every sector that deploys frontier models in security-sensitive contexts. Banking, critical infrastructure, healthcare, and defense procurement will all need updated frameworks. None currently exist.

The six-to-twelve-month window matters most. Anthropic plans to publish its vulnerability disclosure reports within 90 days. OMB is finalizing its guardrail framework. The DoD lawsuit proceeds. Open-source capability curves continue climbing. Whatever governance structure crystallizes in this window will define the template, not just for Mythos, but for every subsequent model that crosses the autonomous exploit-generation threshold. Teams that build compliance and risk posture now, rather than waiting for the framework to arrive, will be ahead of the next regulatory sprint.

For software engineers and ML engineers: Treat Mythos-class output controls (sandboxing, provenance tracking, and output filtering on generated code) as mandatory components of any agentic or code-gen pipeline, regardless of whether your team has access to Mythos itself. The output controls will be required; building them after the fact is more expensive than building them now.
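As a rough picture of what provenance tracking and output filtering could mean at the code level, the sketch below hashes each prompt and output and flags generated code that matches a few risky patterns. The pattern list, the field names, and the `example-model` identifier are all assumptions for illustration. A regex check like this is a triage signal at best and is not a substitute for sandboxed execution or human review.

```python
import hashlib
import json
import re
from datetime import datetime, timezone

# Illustrative patterns only; a real filter would be far broader and context-aware.
SUSPICIOUS_PATTERNS = [
    re.compile(r"\bos\.system\s*\("),
    re.compile(r"\bsubprocess\.\w+\s*\("),
    re.compile(r"\bsocket\.socket\s*\("),
]


def provenance_record(model_id: str, prompt: str, output: str) -> dict:
    """Build an audit record tying a generated output to its model and prompt."""
    flags = [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(output)]
    return {
        "model_id": model_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
        # Hashes let auditors verify content later without storing it in the log.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
        "flags": flags,
        "needs_review": bool(flags),
    }


# Hypothetical usage: generated code that shells out gets routed to review.
record = provenance_record(
    "example-model",
    "write a cleanup script",
    "import os\nos.system('rm -rf /tmp/cache')",
)
print(json.dumps(record, indent=2))
```

Storing hashes rather than raw text keeps the audit log safe to retain even when the output itself must be quarantined, which speaks to the chain-of-custody gap noted earlier in this article.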

For CTOs and CISOs: Reweight your vulnerability patch prioritization toward oldest-vintage exposure, not just recent CVEs. Add “dual-use AI” as a formal category in vendor risk scoring. Brief your board on Glasswing access eligibility if you operate critical software infrastructure. Budget for AI red-teaming as a standing operational expense — not an optional line item.

For founders and investors: The defensive AI tooling category (monitoring layers, audit infrastructure, and red-teaming services) is moving from optional to mandatory across regulated industries. The $2.26 billion AI red-teaming market figure is a floor, not a ceiling, if open-source models do match Mythos capabilities within six months.

Disclaimer: This article synthesizes publicly available reporting and primary source documentation. NeuralWired does not have independent access to Claude Mythos Preview, Project Glasswing, or any classified government documentation regarding NSA usage. Benchmark figures are drawn from Anthropic’s official Glasswing announcement and third-party coverage. Market projections are sourced from Research and Markets and MarketsandMarkets and carry inherent forecast uncertainty. Nothing in this article constitutes legal, financial, or security advice.
