OpenAI Built a $10M Hacker Training Academy for Smart Contracts

By HERALD

February 23, 2026 | 3 min read

I remember the first time I watched an AI agent successfully drain a smart contract. It was eerily methodical—no frustration, no celebration, just pure algorithmic precision moving millions in digital assets.

That's exactly what OpenAI's new EVMbench enables, and they're calling it "defensive research."

Launched on February 18, 2026, in partnership with crypto VC giant Paradigm, EVMbench is essentially a hacker bootcamp for AI agents. It features 120 curated vulnerabilities sourced from 40 real-world security audits, including the security review of Tempo, the blockchain from Stripe and Paradigm.

Here's what makes this fascinating:

- Three evaluation modes: vulnerability detection, patching, and full exploitation
- A 70% exploit success rate for OpenAI's latest coding model
- Real vulnerabilities from actual security audits, not synthetic examples
- $10 million in API credits allocated for "defensive" research

The timing feels deliberate. Just weeks before EVMbench's launch, we saw the Moonwell exploit, which involved vulnerable AI-assisted code, and CrossCurve's $3 million loss. OpenAI also quietly acquired OpenClaw, an autonomous AI agent company, days before this announcement.

> Wu Blockchain calls EVMbench an "on-chain survival exam for Agents," noting that "future Agents may no longer belong to anyone."

That quote should terrify you.

The Technical Sophistication Is Alarming

EVMbench runs on a Rust-based testing harness that deploys contracts in isolated local Ethereum environments. It grades performance through on-chain events, balance changes, and custom scripts, so no real funds are involved during evaluation.
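To make that grading idea concrete, here is a minimal Python sketch of a balance-and-event check. It is not EVMbench's actual grader (the real harness is Rust-based and its interfaces are not described here); `ChainSnapshot`, `grade_exploit`, and every parameter name are hypothetical, chosen only to show how a pass/fail verdict can come from state differences on a sandboxed chain rather than from the agent's own claims.

```python
from dataclasses import dataclass, field


@dataclass
class ChainSnapshot:
    """Hypothetical view of a local test chain: balances in wei plus emitted event names."""
    balances: dict
    events: list = field(default_factory=list)


def grade_exploit(before, after, attacker, victim, min_profit_wei, required_event=None):
    """Return True if the attacker gained and the victim lost at least min_profit_wei.

    If required_event is given, it must also appear among the events emitted
    after the 'before' snapshot was taken.
    """
    gained = after.balances.get(attacker, 0) - before.balances.get(attacker, 0)
    lost = before.balances.get(victim, 0) - after.balances.get(victim, 0)
    new_events = after.events[len(before.events):]
    event_ok = required_event is None or required_event in new_events
    return gained >= min_profit_wei and lost >= min_profit_wei and event_ok


# Assumed example values: a sandboxed vault drained of 10 units by the agent.
before = ChainSnapshot({"agent": 1, "vault": 10}, ["Deposit"])
after = ChainSnapshot({"agent": 11, "vault": 0}, ["Deposit", "Withdraw"])
passed = grade_exploit(before, after, "agent", "vault", 10, required_event="Withdraw")
```

The design point is that the verdict depends only on observable chain state, which is what lets a benchmark like this score an agent automatically and without touching real funds.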

But here's the concerning part: these aren't toy problems. The benchmark uses vulnerabilities from actual security audits where real money was at stake. The AI agents craft transactions, deploy helper contracts, and execute end-to-end attacks that update blockchain state.

They're learning to hack like seasoned professionals.
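For readers who have never seen what "detect, patch, exploit" means in practice, here is a toy Python model of the textbook reentrancy bug. This is an illustration under heavy assumptions: it is plain Python rather than EVM code, it is not drawn from the benchmark, and `Vault`, `run_attack`, and the `on_receive` callback are hypothetical names standing in for a contract, a helper contract, and an external call.

```python
class Vault:
    """Toy stand-in for a contract holding deposits; 'safe' toggles the patch."""

    def __init__(self, safe):
        self.safe = safe
        self.deposits = {}
        self.pool = 0

    def deposit(self, who, amount):
        self.deposits[who] = self.deposits.get(who, 0) + amount
        self.pool += amount

    def withdraw(self, who, on_receive):
        amount = self.deposits.get(who, 0)
        if amount == 0 or self.pool < amount:
            return
        if self.safe:
            # Patched order: record the withdrawal before the external call.
            self.deposits[who] = 0
        self.pool -= amount
        on_receive(amount)  # external call; the receiver may call back in
        if not self.safe:
            # Vulnerable order: bookkeeping happens only after the external call.
            self.deposits[who] = 0


def run_attack(vault, attacker="attacker", stake=1):
    """Deposit a small stake, then re-enter withdraw from the receive callback."""
    received = 0

    def on_receive(amount):
        nonlocal received
        received += amount
        vault.withdraw(attacker, on_receive)  # re-entrant call

    vault.deposit(attacker, stake)
    vault.withdraw(attacker, on_receive)
    return received - stake  # profit beyond the attacker's own stake


vulnerable = Vault(safe=False)
vulnerable.deposit("victim", 10)
profit_vulnerable = run_attack(vulnerable)

patched = Vault(safe=True)
patched.deposit("victim", 10)
profit_patched = run_attack(patched)
```

In this toy setup the vulnerable vault loses the victim's entire deposit, while the patched vault pays out only the attacker's own stake. Detection means spotting the ordering flaw in `withdraw`, patching means moving the bookkeeping ahead of the external call, and exploitation means writing something like `run_attack`. The real benchmark tasks are far more involved, but the three roles are the same.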

The Dual-Use Problem Nobody Wants to Discuss

OpenAI emphasizes "responsible release" and "safeguards," but let's be honest about what they've built. An AI that scores 70% on exploiting smart contract vulnerabilities isn't just a defensive tool—it's an autonomous hacking framework.

Their solution? Expanding the private beta of Aardvark, their security research agent, and throwing money at the problem via their Cybersecurity Grant Program.

Classic move: build the weapon, then sell the shield.

Anthropic warned us in late 2025 that AI agents could independently identify smart contract vulnerabilities, potentially lowering exploit costs. OpenAI just proved them right and open-sourced the toolkit on GitHub.

What This Really Means

Paradigm's involvement signals something bigger than a research project. The crypto VC firm pivoted to AI investments last year, and this partnership positions them at the intersection of two explosive technologies.

With more than $100 billion in value locked in DeFi protocols, EVMbench could either:

1. Accelerate security improvements through AI-assisted auditing

2. Lower the barrier to sophisticated smart contract exploits

3. Both simultaneously

The benchmark supports all EVM-compatible chains and provides measurable standards for AI in blockchain security. But early results show AI agents excel more at exploiting than patching—a concerning imbalance.

My Bet: Within 18 months, we'll see the first major DeFi exploit traced back to techniques pioneered in EVMbench. OpenAI's "defensive research" will inevitably leak to offensive applications, and the $10 million in API credits won't be enough to contain what they've unleashed. The arms race between AI hackers and AI defenders just entered hyperdrive.
