
OpenAI just proved that AI agents can ship production apps faster than most teams can write a project spec. Their Sora Android app went from zero to Google Play in 28 days using Codex—and honestly, this makes every other "AI-assisted development" demo look like a toy.
Here's what actually happened: OpenAI's team used Codex to read their existing iOS codebase, generate feature-level plans (saved as plan.md files), and then implement chunks of Android-specific code that engineers reviewed and refined. The magic wasn't just code generation—it was systematic translation of platform idioms at scale.
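OpenAI hasn't published the exact shape of these plan files, but a feature-level plan.md in this kind of workflow might look something like the sketch below. Every file name, module, and step here is illustrative, not taken from the Sora codebase:

```markdown
# Feature plan: video feed (iOS to Android)

## Source of truth
- iOS: FeedViewController.swift, FeedViewModel.swift

## Target
- Android: FeedScreen.kt (Jetpack Compose), FeedViewModel.kt

## Steps
1. Translate the iOS view-model state machine into a Kotlin
   StateFlow-based ViewModel
2. Rebuild the iOS layout as a Compose LazyColumn
3. Map the iOS prefetching logic onto Android paging

## Acceptance
- Unit tests for the ViewModel pass (./gradlew test)
- Lint is clean (./gradlew lint)
```

The point of a plan at this granularity is that it names the source files, the target files, and the acceptance checks, which is exactly what lets a reviewer verify agent output without re-deriving the design.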
Four engineers. Eighteen days to a testable app. Ten more days to launch.
That's not just fast—that's suspiciously fast. But here's why I think it's legit: OpenAI built this on AGENTS.md configuration files that teach Codex project-specific conventions. No generic code generation. No copy-paste from Stack Overflow. Purpose-built agent orchestration.
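AGENTS.md is a real convention that Codex reads, but OpenAI hasn't shared the Sora team's actual file. A hypothetical sketch of the kind of project-specific guidance such a file might carry:

```markdown
# AGENTS.md (illustrative sketch, not OpenAI's actual file)

## Conventions
- Kotlin only; Jetpack Compose for all new UI
- Mirror the iOS module boundaries: one Gradle module per iOS framework

## Before opening a PR
- Run ./gradlew test and ./gradlew lint; include the output in the PR body
- Keep each change scoped to a single plan.md feature
```

This is what "no generic code generation" means in practice: the agent's output is constrained by written-down, repo-specific rules rather than by whatever conventions dominate its training data.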
The Workflow That Actually Works
Codex doesn't just write code and disappear. It:
- Runs test harnesses and linters
- Commits changes in isolated environments
- Opens pull requests with terminal logs as evidence
- Operates inside VS Code and Cursor for real-time context
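The check-then-open-a-PR loop above can be sketched in a few lines. This is not Codex's implementation; it's a toy Python version of the pattern, where the check commands and the PR-body format are assumptions:

```python
import subprocess

def run_checks(commands):
    """Run each check command (e.g. tests, linters), capture its log,
    and stop at the first failure. Returns (all_passed, logs)."""
    logs = []
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        logs.append((cmd, result.returncode, result.stdout + result.stderr))
        if result.returncode != 0:
            return False, logs
    return True, logs

def pr_body_with_evidence(logs):
    """Format captured terminal logs as PR-body evidence.
    A real agent would pass this to something like `gh pr create`."""
    return "\n".join(
        f"$ {' '.join(cmd)} (exit {code})\n{out}"
        for cmd, code, out in logs
    )
```

The design point is the gate: a PR only gets opened when every check passes, and the logs ride along as evidence, so a human reviewer sees what the agent actually ran rather than taking the diff on faith.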
This isn't the "AI writes buggy code, human fixes everything" pattern we're used to. It's agent-driven implementation with human architectural oversight. The humans still make the big decisions. Codex handles the tedious translation work.
And that changes the game completely.
What Nobody Is Talking About
Everyone's focusing on the 28-day timeline, but the real story is parallelization. When you can generate feature-level implementation plans upfront, multiple engineers can work on different components simultaneously without stepping on each other.
Traditional iOS-to-Android ports are serial. One feature at a time. Lots of context switching. Codex made it embarrassingly parallel.
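The structural shift is easy to see in code. Once each plan.md is an independent unit with its own acceptance checks, dispatch becomes a map over plans. This is a toy sketch of that shape, with `implement` standing in for "an agent (or engineer) executes this plan and opens a PR":

```python
from concurrent.futures import ThreadPoolExecutor

def implement(plan: str) -> str:
    # Placeholder for real agent work: read the plan, write the code,
    # run the checks, open a PR.
    return f"PR opened for {plan}"

def port_features(plans: list[str], workers: int = 4) -> list[str]:
    # Independent plans mean no coordination needed between workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(implement, plans))
```

The serial port is a for-loop; the plan-driven port is a `map`. That's the whole argument in one line.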
Plus, OpenAI is already shipping GPT-5-Codex improvements with "faster, more reliable, and longer-running independent execution." If the current version built Sora Android in 28 days, what can the next version do in 28 days?
The Uncomfortable Truth
This isn't just about OpenAI being clever with their own tools. Any team with strong test coverage, good CI practices, and well-documented conventions can probably replicate this workflow. Codex is available to ChatGPT Pro, Enterprise, and Business customers right now.
Most companies just aren't ready for it. They don't have:
- Comprehensive test suites that agents can run
- Clear architectural documentation
- Review processes designed for agent-generated PRs
OpenAI had all of that. Most teams don't.
But here's what makes me genuinely excited: this is the first production example of agents doing significant implementation work without breaking everything. Not a demo. Not a prototype. A shipping consumer app with millions of potential users.
The Real Question
Can other companies replicate this? Probably—if they invest in the infrastructure that makes agents reliable. Strong tests. Clear docs. Disciplined review.
Will they? That's harder. Most teams can barely maintain their existing codebases, let alone build agent-friendly development environments.
But for the teams that do... 28 days might start looking slow.

