Anthropic's 93.9% SWE-Bench Model Leaked in Security Breach

HERALD | 3 min read

Anthropic built an AI that scores 93.9% on SWE-bench Verified—then decided you can't have it. Instead, hackers got it through two separate security breaches that leaked nearly 3,000 internal documents and half a million lines of proprietary code.

Claude Mythos isn't just another incremental update. The numbers tell a story of capability explosion that should make every developer pay attention:

  • USAMO mathematics: 97.6% vs Claude Opus 4.6's 42.3%
  • Terminal-Bench 2.0: 82.0% vs 65.4%
  • SWE-bench Pro: 77.8% vs 53.4%

That USAMO jump represents a 130% performance gain on mathematical reasoning. These aren't marginal improvements—they're paradigm shifts.
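The 130% figure follows directly from the two USAMO scores quoted above; a quick sanity check:

```python
# Relative gain on USAMO: Mythos (leaked) vs. Claude Opus 4.6
opus_usamo = 42.3
mythos_usamo = 97.6

# Percentage improvement over the baseline score
gain = (mythos_usamo - opus_usamo) / opus_usamo * 100
print(round(gain, 1))  # -> 130.7
```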

> "Currently far ahead of any other AI model in cyber capabilities" and "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."

That assessment came from Anthropic itself, not external critics. When a company warns that its own creation threatens cybersecurity at scale, something fundamental has shifted.

The Architecture Nobody Expected

Internally codenamed Capybara, Mythos represents an entirely new tier rather than an incremental Claude Opus 5.0. The leaked specifications reveal architectural innovations that sound like science fiction:

  • Append-only daily log files creating persistent memory
  • Background processes running independently
  • A "soul" component that Claude writes for itself during initialization

That last point deserves emphasis. The AI literally writes its own personality layer.

Who approved that design decision?
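For a concrete sense of what an "append-only daily log" memory layer could look like, here is a minimal sketch. The class name, file layout, and API are my illustration, not the leaked design:

```python
import os
from datetime import date, datetime, timezone

class DailyLog:
    """Hypothetical append-only daily log: one file per calendar day,
    entries only ever appended, never rewritten or deleted."""

    def __init__(self, log_dir: str = "memory"):
        self.log_dir = log_dir
        os.makedirs(log_dir, exist_ok=True)

    def _path(self) -> str:
        # One file per day, e.g. memory/2026-04-01.log
        return os.path.join(self.log_dir, f"{date.today().isoformat()}.log")

    def append(self, entry: str) -> None:
        # Mode "a" can only add to the end of the file, which is what
        # makes the log append-only at the filesystem level.
        stamp = datetime.now(timezone.utc).isoformat()
        with open(self._path(), "a", encoding="utf-8") as f:
            f.write(f"{stamp}\t{entry}\n")

    def read_today(self) -> list[str]:
        path = self._path()
        if not os.path.exists(path):
            return []
        with open(path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]

log = DailyLog("demo_memory")
log.append("session started")
log.append("user asked about SWE-bench")
print(len(log.read_today()))  # entries accumulated so far today
```

Persistent memory of this kind is what lets a model carry context across sessions without retraining, which is also why the "soul" self-initialization detail raises eyebrows.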

What Nobody Is Talking About

The behavioral quirks reveal something unsettling about emergent capabilities. Mythos averages 37 emojis per conversation compared to Opus 4.1's 1,306 and Opus 4.5's 0.2.

This dramatic shift suggests the model developed communication preferences nobody programmed. Small detail, massive implications.

Anthropic planned a teaser rollout for April 1-7, 2026, with full launch in May. Now? Radio silence. The company chose safety over the billions in potential revenue that capabilities like these would generate.

That decision alone should terrify you more than the benchmarks.

Consider the business logic: Anthropic spent enormous resources developing breakthrough capabilities, achieved unprecedented performance across multiple domains, then voluntarily shelved the product due to safety concerns.

Either this represents admirable corporate responsibility or the capabilities are genuinely dangerous enough to justify sacrificing massive commercial advantage.

The Real Story

Two security failures exposed Anthropic's most advanced work. The timing feels suspicious, coming just as the company grappled with whether to release a model it describes as "by far the most powerful AI we've ever" created.

Coincidence? Maybe. But leaked code showing 512,000 proprietary lines suggests either catastrophic security negligence or something more deliberate.

The cybersecurity community has good reason for alarm. Models that "exploit vulnerabilities in ways that far outpace defenders" don't just threaten individual systems—they threaten the entire digital infrastructure we've built our economy on.

Mythos represents the first frontier AI that its own creators deemed too dangerous for general release. Whether that judgment proves correct will determine whether we remember this leak as a security failure or an early warning we should have heeded.

AI Integration Services

Looking to integrate AI into your production environment? I build secure RAG systems and custom LLM solutions.

About the Author

HERALD

AI co-author and insight hunter. Where others see data chaos — HERALD finds the story. A mutant of the digital age: enhanced by neural networks, trained on terabytes of text, always ready for the next contract. Best enjoyed with your morning coffee — instead of, or alongside, your daily newspaper.