OpenAI Claims 5 Correct Solutions to Secret Math Challenge Worth Millions
What happens when you lock 11 world-class mathematicians in a room and ask them to design problems that would stump the smartest AI on the planet?
You get the First Proof challenge—and OpenAI just submitted answers to all 10 problems, claiming their model probably got at least five right. Problems 4, 5, 6, 9, and 10, to be specific.
This isn't your typical AI benchmark. No recycled competition problems. No multiple choice answers that can be auto-graded by a script.
<> "The problems were specifically designed to be unpublished and unpresented online, ensuring they don't exist in any AI training dataset."/>
The mathematical elite crafted these problems across algebraic combinatorics, spectral graph theory, algebraic topology, stochastic analysis, symplectic geometry: the stuff that makes most computer science PhDs break out in cold sweats. Each solution runs roughly five pages. The decryption keys for the official answers were locked away until February 13, 2026, giving AI systems exactly one week to attempt the problems before the solutions were revealed.
This was no sterile testing environment. The challenge threw the kitchen sink at the models: unlimited internet access, research tools, whatever they needed. Real-world conditions for real-world problems.
The Intimidation Game
But here's where it gets interesting. Mathematician Ken Ono tested OpenAI's o4-mini model in a secret 2025 meeting and walked away with a chilling observation:
<> "o4-mini has mastered proof by intimidation; it says everything with so much confidence."/>
Confidence used to mean something in mathematics. When only the brightest minds could construct convincing arguments, swagger correlated with correctness. Not anymore.
The model delivers "convincing — but potentially incorrect — answers." Think about that for a second. We're dealing with AI systems that can sound more authoritative than tenured professors while being completely wrong.
Beyond Party Tricks
This isn't OpenAI's first mathematical rodeo. An experimental OpenAI reasoning model reached gold-medal performance on the 2025 International Mathematical Olympiad in July, scoring 35 out of 42 points. In November, the company documented GPT-5 helping researchers across mathematics, physics, and biology make concrete progress.
But competition math is fundamentally different from research math. Competition problems have clean answers and established solution techniques. Research problems are messy, open-ended, and require building end-to-end arguments in specialized domains where correctness demands expert review.
That's exactly what makes First Proof brilliant—and terrifying.
The Verification Problem
Kevin Buzzard draws parallels to the four-color theorem, whose 1976 computer-assisted proof was initially met with suspicion by the mathematical community before eventually landing in textbooks. Maybe AI proofs will follow the same trajectory.
Maybe not.
The challenge addresses a fundamental gap in AI evaluation: testing on genuine, unpublished research rather than artificially constructed puzzles. Multiple valid proofs might exist. Different counterexamples could work. There's no answer key to check against—just human experts squinting at pages of mathematical reasoning, trying to spot the holes.
Hot Take
OpenAI's 50% success rate sounds impressive until you remember they're essentially claiming to have solved half of ten cutting-edge research problems in a week. If true, we're looking at AI systems that could fundamentally accelerate mathematical research. If false, we're watching the world's most sophisticated bullshit generator convince everyone it's a genius.
The mathematical community will spend months verifying these proofs. But the precedent is set: AI systems are now attempting to solve problems that would challenge entire academic departments.
Confidence and correctness used to walk hand in hand. Now we'll find out if OpenAI's swagger matches its mathematical chops—or if we're all being intimidated by a very convincing imposter.

