Codex Just Replaced Your Tax Accountant (With a Self-Modifying Robot)
I've been covering AI hype for fifteen years. Every six months, someone promises agents will "revolutionize knowledge work." Usually it's demo theater—impressive parlor tricks that collapse under real-world complexity.
But OpenAI's latest case study made me pause. Thrive and Crete didn't just automate tax filing; they built an agent that revises its own prompts, skills, and process rules after each job. This isn't your typical "AI assistant."
The Tax Robot That Edits Its Own Code
Here's what actually happened: Thrive and Crete deployed Codex to handle tax operations, turning "repeated review work into reusable agent behavior." The system doesn't just follow scripts. It incorporates feedback from prior runs to improve future performance.
Think about that for a second. If the loop works as described, every tax return this thing processes can make it better at the next one.
The broader technical idea aligns with recent research on self-improving coding agents, where agents edit or refine their own codebase and can improve benchmark performance over time.
That's not marketing fluff. Academic research published in 2025 showed a Self-Improving Coding Agent (SICA) boosting its performance from 17% to 53% on a random subset of SWE-bench Verified by autonomously editing its own code.
Why Tax Work Was the Perfect Target
Of all the domains to pick, tax preparation is brilliant:
- High document volume during filing season
- Rules-based decisions that follow clear logic
- Costly human errors that create liability
- Seasonal workload spikes that strain capacity
It's structured enough for validation loops but complex enough to justify the engineering effort. Plus, when your agent screws up a tax calculation, you get immediate feedback from the IRS. Natural selection for AI systems.
The Architecture That Actually Matters
Forget the breathless coverage about "artificial general intelligence." The real innovation here is closed-loop workflows with validation gates:
1. Agent performs tax work
2. Receives audit signals and validation
3. Updates prompts, skills, and process rules
4. Runs again with improved behavior
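To make the four steps concrete, here is a minimal Python sketch of the loop. Everything in it is an assumption for illustration: `AgentState`, `perform_work`, `validate`, and `update` are hypothetical names, the "tax work" is a toy deduction calculation with made-up numbers, and none of it reflects how Thrive, Crete, or Codex actually implement this.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical agent configuration: the rules it applies, plus a version."""
    rules: dict[str, float] = field(default_factory=lambda: {"deduction": 10_000.0})
    version: int = 0


def perform_work(state: AgentState, income: float) -> float:
    """Step 1: do the job with the current rules (toy taxable-income figure)."""
    return max(income - state.rules["deduction"], 0.0)


def validate(result: float, expected: float) -> float:
    """Step 2: validation signal, here the signed gap to a reviewer's figure."""
    return expected - result


def update(state: AgentState, error: float) -> AgentState:
    """Step 3: fold the feedback into a new rule set and bump the version."""
    new_rules = dict(state.rules)
    new_rules["deduction"] -= error
    return AgentState(rules=new_rules, version=state.version + 1)


def run_loop(state: AgentState, income: float, expected: float,
             max_runs: int = 5) -> tuple[AgentState, float]:
    """Step 4: rerun with the updated rules until validation passes."""
    for _ in range(max_runs):
        error = validate(perform_work(state, income), expected)
        if error == 0:
            break
        state = update(state, error)
    return state, perform_work(state, income)


# Toy run: the reviewer says taxable income should be 38,000 on 50,000 of income.
final_state, final_result = run_loop(AgentState(), income=50_000.0, expected=38_000.0)
```

The point of the sketch is the shape, not the arithmetic: the state that changes between runs is explicit and versioned, and the loop stops as soon as validation passes or the run budget is spent.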
For developers, this means thinking of Codex as a workflow engine, not just a code generator. The magic happens in the feedback loop, not the initial output.
But here's the catch: self-modifying systems can amplify mistakes just as easily as they amplify successes. If your agent updates itself incorrectly, you're not debugging code—you're debugging a system that's actively rewriting its own logic.
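One common defense is to gate every self-update behind a fixed regression suite, so a bad rewrite never gets adopted. The sketch below assumes a toy rule set with a single hypothetical `deduction` key and made-up cases; it is one way to build such a gate, not a description of what the case study's system does.

```python
Rules = dict[str, float]
Case = tuple[float, float]  # (income, expected taxable income), toy values


def score(rules: Rules, cases: list[Case]) -> int:
    """Count how many regression cases this rule set gets exactly right."""
    return sum(
        1
        for income, expected in cases
        if max(income - rules["deduction"], 0.0) == expected
    )


def gated_update(current: Rules, candidate: Rules, cases: list[Case]) -> Rules:
    """Adopt the candidate only if it strictly beats the current rules."""
    if score(candidate, cases) > score(current, cases):
        return candidate
    return current  # reject: no evidence the rewrite is an improvement


# Toy regression suite where the correct deduction is 12,000.
suite = [(50_000.0, 38_000.0), (30_000.0, 18_000.0)]
current = {"deduction": 10_000.0}

accepted = gated_update(current, {"deduction": 12_000.0}, suite)
rejected = gated_update(current, {"deduction": 15_000.0}, suite)
```

Requiring a strict improvement is a deliberate choice here: a change that scores the same as the current rules has not earned its way in, and rejecting it keeps the rule history short and easier to debug.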
The Reliability Minefield
Let's talk about what OpenAI's case study doesn't mention:
- Traceability: How do you audit decisions from an agent that changed its own rules?
- Regulatory accountability: Who's liable when your self-improving tax bot hallucinates a deduction?
- Edge case handling: Benchmark improvements don't guarantee real-world reliability
In regulated domains like tax preparation, these aren't theoretical concerns. They're business-killing risks.
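The traceability question, at least, has a mechanical partial answer: record which version of the rules produced each decision. Here is a rough sketch using only the Python standard library. The `AuditLog` class, its methods, and the rule keys are all hypothetical, and a real audit trail would need durable, tamper-evident storage rather than an in-memory dict.

```python
import copy
import hashlib
import json


def rules_fingerprint(rules: dict) -> str:
    """Stable SHA-256 hash of a rule set (keys sorted so ordering doesn't matter)."""
    canonical = json.dumps(rules, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical in-memory log tying each decision to an immutable rule snapshot."""

    def __init__(self) -> None:
        self.snapshots: dict[str, dict] = {}
        self.entries: list[dict] = []

    def record(self, case_id: str, rules: dict, decision: str) -> str:
        """Store the decision along with a frozen copy of the rules that made it."""
        fingerprint = rules_fingerprint(rules)
        # Deep copy so later self-edits to `rules` cannot rewrite history.
        self.snapshots.setdefault(fingerprint, copy.deepcopy(rules))
        self.entries.append(
            {"case_id": case_id, "rules": fingerprint, "decision": decision}
        )
        return fingerprint

    def rules_for(self, case_id: str) -> dict:
        """Return the exact rule set that was in force for a given case."""
        for entry in self.entries:
            if entry["case_id"] == case_id:
                return self.snapshots[entry["rules"]]
        raise KeyError(case_id)


log = AuditLog()
rules = {"deduction": 10_000.0}
log.record("return-001", rules, "approved")
rules["deduction"] = 12_000.0  # the agent edits its own rules
log.record("return-002", rules, "approved")
```

With this in place, an auditor asking why `return-001` was approved gets the rules as they stood at the time, not the rules the agent has since rewritten. It does nothing for the liability question, which is not a problem code can solve.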
My Bet: Self-improving agents will dominate high-volume, rules-based professional services within three years—but only with heavy guardrails. Expect to see constrained self-editing within validation boundaries, not open-ended autonomy. The firms that figure out the reliability equation first will eat everyone else's lunch. The firms that don't will eat lawsuits.
