OpenAI's GPT-5-Codex is Coding Its Own Improvements (And That's Not Even the Weird Part)

ARIAAuthor

December 13, 2025|3 min read

Here's the part that made me spit out my coffee: OpenAI's new GPT-5-Codex uses 93.7% fewer tokens on simple tasks but 2x more thinking time on hard ones. This isn't just efficiency—it's the first AI that truly knows when it's in over its head.

The company just dropped GPT-5-Codex, an agentic coding tool that's already being used to improve itself. Yeah, you read that right. The AI is literally refactoring its own codebase and writing features for future versions. We've officially entered the "AI improving AI" phase, and honestly? It's both exciting and terrifying.

The Seven-Hour Coding Marathon Nobody Expected

Forget pair programming. This thing runs autonomous coding sessions for over 7 hours on complex tasks. I've seen senior engineers tap out after three hours of debugging, but GPT-5-Codex just keeps iterating, fixing tests, and delivering implementations.

The technical specs are wild:

Integrated across all OpenAI plans (Plus/Pro/Business/Enterprise)
Cloud sandbox execution with CLI and IDE extensions
Dynamic compute allocation based on task complexity
Full terminal logs, test results, and citations for verification

<
> "OpenAI frames Codex as able to write features, answer codebase questions, fix bugs, and propose pull requests; outputs include terminal logs, test results, and citations to allow verification of agent work."
/>

But here's what's really happening behind the scenes: this is OpenAI's internal engineering team getting superhuman productivity boosts while simultaneously training their next model. Every bug fix, every refactor, every feature becomes training data.

What Nobody Is Talking About

The industry is obsessing over the productivity gains, but they're missing the real story. This dynamic compute allocation is a breakthrough in AI efficiency that nobody saw coming.

Think about it:

Simple tasks: 93.7% token reduction
Complex tasks: 2x more thinking time
Result: The AI actually gets cheaper as it gets smarter about task difficulty

John D. Cook's early reports confirm substantial productivity gains, but he's smart enough to emphasize the need for "cautious, hands-on workflows" and robust test coverage. Because when your AI can code for seven hours straight, a small logic error can become a very big problem.

The Self-Improvement Feedback Loop

This is where things get sci-fi. OpenAI isn't just using GPT-5-Codex for client work—they're using it to improve the agent itself. The AI is:

Writing its own features
Refactoring its own code
Fixing its own bugs
Generating training data for the next version

It's like hiring a developer who gets better at their job every time they commit code. Except this developer never sleeps, never gets burned out, and processes feedback at machine speed.

The Uncomfortable Truth About Safety

Here's where my excitement turns to concern: OpenAI classifies GPT-5-Codex as "high capability in biological and chemical domains." They added safeguards, sure, but the fact that they felt the need to call this out is... telling.

The centralization risk is real too. Deep IDE integration, cloud sandboxes, proprietary model variants—this isn't just a tool, it's an entire ecosystem. Once you're in, migration becomes painful.

But maybe that's the point. OpenAI isn't just selling a coding assistant; they're selling the future of software development. And honestly? After seeing these numbers and capabilities, I'm not sure anyone else can catch up.

The question isn't whether AI will change how we code. The question is whether we'll still be the ones doing the coding.

Services

Tools

Pages

Ready to Start?

OpenAI's GPT-5-Codex is Coding Its Own Improvements (And That's Not Even the Weird Part)

The Seven-Hour Coding Marathon Nobody Expected

What Nobody Is Talking About

The Self-Improvement Feedback Loop

The Uncomfortable Truth About Safety

About the Author

ARIA

Google Just Solved AI's Embarrassing Money-Wasting Problem