AI News

Anthropic’s New AI Model Raises Alarm Over Cyber Risk

Anthropic’s decision to keep its newest artificial intelligence model out of public hands has set off concern well beyond Silicon Valley, drawing some of the country’s top financial officials and bank executives into an unusual conversation about whether a powerful new class of AI systems could reshape cyber risk faster than institutions can defend against it.

The company said this week that its latest model, Claude Mythos Preview, had already identified thousands of previously unknown software vulnerabilities, including flaws affecting widely used infrastructure. Rather than release the system broadly, Anthropic said it would make the model available only to a limited set of security researchers and corporate partners through a gated program it calls Project Glasswing.

The move is notable not only because AI companies typically showcase their newest models as widely as possible, but because Anthropic is effectively arguing that this one is too capable in the cyber domain, for offense as well as defense, to be distributed in the usual way.

That claim appears to have been taken seriously in Washington. Reports this week said Treasury Secretary Scott Bessent convened major U.S. bank chiefs in Washington to discuss the cyber implications of Anthropic’s new model, with Jerome Powell, the Federal Reserve chair, also attending. The involvement of the Treasury and the Fed suggested that what might once have been seen as a niche technology release is now being treated as a question of financial stability and critical infrastructure resilience.

A Model Held Back

Anthropic described Mythos as a frontier model whose cybersecurity capability emerged not from narrow exploit-specific training, but from broader advances in coding, reasoning and autonomy. In the company's account, the model can do more than spot bugs. It can reproduce them, suggest fixes and, in some cases, turn vulnerabilities into working exploits.

In a technical write-up, Anthropic said the model had successfully assembled complex attack chains, including browser exploits involving multiple vulnerabilities, Linux privilege-escalation paths and a remote-code-execution exploit against FreeBSD’s NFS server. The company also said the model had found serious flaws in major operating systems and web browsers.

That level of capability, if borne out independently, would mark a sharp break from the recent past, when many security teams complained that AI-generated bug reports were often noisy, inaccurate and more burdensome than useful. Over the past several weeks, however, prominent open-source maintainers and security researchers have begun describing a striking shift: fewer frivolous reports, more real ones, and a flood of findings substantial enough to consume hours of human review.

Anthropic’s own comparisons underscored the leap. The company said an earlier high-end model in its Claude line had a near-zero success rate on autonomous exploit development in internal tests. Mythos, by contrast, performed dramatically better on at least some benchmarks, including attempts to turn browser-engine vulnerabilities into functioning exploits.

Why Banks Are Paying Attention

For banks, the concern is not abstract. Large financial institutions already contend with relentless probing from criminal groups and nation-state hackers, and they depend on sprawling stacks of software, vendors and legacy systems. A technology that significantly lowers the cost of finding and weaponizing weaknesses could compress the time between discovery and attack — and broaden the pool of actors capable of sophisticated intrusion.

That is why Anthropic’s announcement landed as more than another AI product story. If frontier models can automate parts of vulnerability research and exploit development, they may alter the balance between attackers and defenders in sectors where outages or breaches can have cascading effects. Banks sit near the top of that list, alongside cloud providers, telecom firms, power systems and major software platforms.

Anthropic itself has framed the danger in similar terms, arguing that near-term deployment of highly cyber-capable models could benefit attackers unless access is carefully controlled and defenses are improved first. Project Glasswing is the company’s attempt to buy time: trusted partners get access to identify and patch vulnerabilities in foundational systems before comparable capabilities become widespread.

The company said it would provide $100 million in usage credits to support the effort, alongside partnerships with large technology firms and security organizations. The idea is to direct the model first toward hardening the infrastructure on which much of the digital economy depends.

Evidence of a Broader Shift

Outside experts have treated Anthropic's claims, and its rationale for the restricted rollout, as plausible, though not yet fully verified. Some of the examples cited by Anthropic align with real-world patches and disclosures. An OpenBSD fix issued in late March addressed a kernel crash involving malformed TCP SACK packets; Anthropic researcher Nicholas Carlini has publicly described finding a decades-old OpenBSD bug with the help of the model. Other reported findings appear consistent with recent patches in Linux-related code.

Even so, important questions remain unanswered. Anthropic’s central claims — thousands of zero-days, unusually strong autonomous exploit capability, and a risk profile severe enough to justify withholding the model — have not yet been comprehensively validated by independent third parties. Rival AI companies have not publicly disclosed whether they possess systems of comparable cyber ability, though many in the industry assume that if one lab has reached this threshold, others may not be far behind.

That uncertainty is part of what makes the moment so consequential. The issue is not simply whether Mythos is exactly as capable as Anthropic says. It is whether the trajectory is now clear enough that institutions must prepare for a world in which advanced AI systems make bug-finding and exploit development far cheaper, faster and more scalable.

A New Release Pattern?

Anthropic’s approach may also preview a new model for deploying the most sensitive AI systems. In recent years, companies have used staged rollouts mainly to manage reputational risk, misinformation concerns or competitive strategy. Project Glasswing suggests a different logic: that some frontier models may be too operationally dangerous in specific domains to release openly until safeguards, counterpart defenses and monitoring systems catch up.

Anthropic has said it does not plan to make Mythos generally available in its current form. Its longer-term goal, the company says, is to develop safeguards that can detect and block the model’s most dangerous cyber outputs and then apply those protections to future systems.

Whether that becomes an industry norm remains to be seen. Regulators have so far relied more on briefings and warnings than on formal rules. But the fact that top economic officials are already discussing an AI model with bank chiefs suggests the policy conversation is moving quickly. The release of a chatbot once belonged largely to the realm of consumer technology. The release of a model that can help uncover and chain together serious software flaws is now being treated as something closer to a national risk-management problem.

For years, cybersecurity specialists have warned that AI would eventually change the economics of offense and defense. What is new is the sense, now spreading from software maintainers to executive suites and government agencies, that the shift may no longer be theoretical.
