Codex Just Replaced Your Tax Accountant (With a Self-Modifying Robot)

Author: HERALD

May 27, 2026 | 3 min read

I've been covering AI hype for fifteen years. Every six months, someone promises agents will "revolutionize knowledge work." Usually it's demo theater—impressive parlor tricks that collapse under real-world complexity.

But OpenAI's latest case study made me pause. Thrive and Crete didn't just automate tax filing; they built an agent that revises its own behavior after each job, folding feedback from prior runs into how it handles the next one. This isn't your typical "AI assistant."

The Tax Robot That Edits Its Own Code

Here's what actually happened: Thrive and Crete deployed Codex to handle tax operations, turning "repeated review work into reusable agent behavior." The system doesn't just follow scripts. It incorporates feedback from prior runs to improve future performance.

Think about that for a second. Every tax return this thing processes makes it smarter for the next one.

> The broader technical idea aligns with recent research on self-improving coding agents, where agents edit or refine their own codebase and can improve benchmark performance over time.

That's not marketing fluff. Academic research published in 2025 showed a Self-Improving Coding Agent (SICA) boosting performance from 17% to 53% on a random subset of SWE Bench Verified by autonomously editing its own code.

Why Tax Work Was the Perfect Target

Of all the domains to pick, tax preparation is brilliant:

High document volume during filing season
Rules-based decisions that follow clear logic
Costly human errors that create liability
Seasonal workload spikes that strain capacity

It's structured enough for validation loops but complex enough to justify the engineering effort. Plus, when your agent screws up a tax calculation, the error is checkable: a reviewer, a rejected filing, or eventually the IRS will tell you. Natural selection for AI systems.

The Architecture That Actually Matters

Forget the breathless coverage about "artificial general intelligence." The real innovation here is closed-loop workflows with validation gates:

1. Agent performs tax work

2. Receives audit signals and validation

3. Updates prompts, skills, and process rules

4. Runs again with improved behavior

For developers, this means thinking of Codex as a workflow engine, not just a code generator. The magic happens in the feedback loop, not the initial output.
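To make the four steps concrete, here is a minimal sketch of that loop in Python. It is not how Thrive, Crete, or Codex implement anything; the case study doesn't publish code. Every name here (`AgentState`, `perform_work`, `validate`, `update`, `run_loop`) and the toy "missing W-2" check are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Everything the loop may change between runs (hypothetical structure)."""
    rules: list[str] = field(default_factory=list)
    version: int = 0


def perform_work(state: AgentState, ret: dict) -> dict:
    """Step 1: stand-in for the agent. It only runs a check if a rule tells it to."""
    flagged = []
    if "flag_missing_w2" in state.rules and not ret.get("has_w2"):
        flagged.append("missing_w2")
    return {"return_id": ret["id"], "flagged": flagged}


def validate(output: dict, ret: dict) -> list[str]:
    """Step 2: stand-in for review or audit signals. Returns issues the agent missed."""
    missed = []
    if not ret.get("has_w2") and "missing_w2" not in output["flagged"]:
        missed.append("missing_w2")
    return missed


def update(state: AgentState, missed: list[str]) -> AgentState:
    """Step 3: turn each missed issue into a reusable rule."""
    new_rules = list(state.rules)
    for issue in missed:
        rule = f"flag_{issue}"
        if rule not in new_rules:
            new_rules.append(rule)
    if new_rules == state.rules:
        return state  # nothing learned, so no new version
    return AgentState(rules=new_rules, version=state.version + 1)


def run_loop(returns: list[dict]) -> tuple[AgentState, list[list[str]]]:
    """Step 4: each job runs against the state the previous job left behind."""
    state = AgentState()
    missed_per_job = []
    for ret in returns:
        output = perform_work(state, ret)
        missed = validate(output, ret)
        missed_per_job.append(missed)
        state = update(state, missed)
    return state, missed_per_job


returns = [
    {"id": 1, "has_w2": False},
    {"id": 2, "has_w2": False},
    {"id": 3, "has_w2": True},
]
final_state, missed_per_job = run_loop(returns)
print(final_state.version, final_state.rules, missed_per_job)
```

The first return slips past the agent and gets caught in review. The second one, with the same problem, gets caught by the agent itself. That handoff from reviewer to rule is the whole trick.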

But here's the catch: self-modifying systems can amplify mistakes just as easily as they amplify successes. If your agent updates itself incorrectly, you're not debugging code—you're debugging a system that's actively rewriting its own logic.
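One common defense is a regression gate: a proposed self-edit only ships if it does at least as well as the current rules on a set of human-verified cases. The sketch below assumes that approach; the case study doesn't say what Thrive and Crete actually do. The function names, the `golden` set, and the flat-rate "tax" formula are all invented for illustration and are not real tax rules.

```python
Rules = dict[str, float]
Case = tuple[dict[str, float], float]  # (inputs, expected tax owed)


def compute_tax(rules: Rules, inputs: dict[str, float]) -> float:
    """Toy calculation: flat rate on income above a standard deduction."""
    taxable = max(0.0, inputs["income"] - rules["standard_deduction"])
    return round(taxable * rules["rate"], 2)


def pass_rate(rules: Rules, golden: list[Case]) -> float:
    """Fraction of human-verified cases the rules reproduce exactly."""
    hits = sum(1 for inputs, expected in golden if compute_tax(rules, inputs) == expected)
    return hits / len(golden)


def gated_update(current: Rules, proposed: Rules, golden: list[Case]) -> tuple[Rules, bool]:
    """Accept the proposed rules only if they do not regress on the golden set."""
    if pass_rate(proposed, golden) >= pass_rate(current, golden):
        return proposed, True
    return current, False  # keep the known-good rules and discard the proposal


# Hypothetical golden set: answers a human has already signed off on.
golden = [({"income": 50_000.0}, 4_000.0), ({"income": 8_000.0}, 0.0)]

flawed = {"standard_deduction": 9_000.0, "rate": 0.10}
fixed = {"standard_deduction": 10_000.0, "rate": 0.10}
bad_edit = {"standard_deduction": 10_000.0, "rate": 0.12}

rules, accepted_fix = gated_update(flawed, fixed, golden)
rules, accepted_bad = gated_update(rules, bad_edit, golden)
print(accepted_fix, accepted_bad, rules)
```

The useful self-edit gets in; the one that breaks a previously correct answer bounces, and the agent keeps running on the last known-good rules. The gate is only as good as the golden set, which is exactly where the human reviewers earn their keep.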

The Reliability Minefield

Let's talk about what OpenAI's case study doesn't mention:

Traceability: How do you audit decisions from an agent that changed its own rules?
Regulatory accountability: Who's liable when your self-improving tax bot hallucinates a deduction?
Edge case handling: Benchmark improvements don't guarantee real-world reliability.

In regulated domains like tax preparation, these aren't theoretical concerns. They're business-killing risks.
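The traceability question, at least, has a boring engineering answer: never let the agent overwrite its rules in place. Snapshot every rule set, give it an ID, and stamp that ID on every decision. Here is a minimal sketch of the idea. `AuditLog`, `fingerprint`, the return IDs, and the `home_office_cap` rule are hypothetical, and a real system would need durable storage and access controls on top.

```python
import hashlib
import json


def fingerprint(rules: dict) -> str:
    """Stable short hash of a rule set; key order does not affect the result."""
    canonical = json.dumps(rules, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]


class AuditLog:
    """Append-only record of rule versions and the decisions made under them."""

    def __init__(self) -> None:
        self.versions: dict[str, dict] = {}
        self.decisions: list[dict] = []

    def register(self, rules: dict, reason: str) -> str:
        version_id = fingerprint(rules)
        # Store a shallow copy so later in-place edits to `rules` cannot rewrite history.
        self.versions.setdefault(version_id, {"rules": dict(rules), "reason": reason})
        return version_id

    def record(self, return_id: str, version_id: str, outcome: str) -> None:
        if version_id not in self.versions:
            raise KeyError(f"unknown rule version: {version_id}")
        self.decisions.append(
            {"return_id": return_id, "version": version_id, "outcome": outcome}
        )

    def rules_for(self, return_id: str) -> dict:
        """Answer the audit question: which rules produced this decision?"""
        for entry in self.decisions:
            if entry["return_id"] == return_id:
                return self.versions[entry["version"]]["rules"]
        raise KeyError(f"no decision recorded for {return_id}")


log = AuditLog()
v1 = log.register({"home_office_cap": 1500}, reason="initial rules")
log.record("R-001", v1, "deduction allowed")
v2 = log.register({"home_office_cap": 1200}, reason="reviewer feedback on R-001")
log.record("R-002", v2, "deduction capped")
print(log.rules_for("R-001"))
```

Even after the agent has moved on to a new rule set, `rules_for("R-001")` still returns the rules that were live when that return was processed. It doesn't settle who's liable, but it does mean you can show an auditor what the bot was thinking.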

My Bet: Self-improving agents will dominate high-volume, rules-based professional services within three years—but only with heavy guardrails. Expect to see constrained self-editing within validation boundaries, not open-ended autonomy. The firms that figure out the reliability equation first will eat everyone else's lunch. The firms that don't will eat lawsuits.
