Claude Opus 4.6 Finds Hidden Backdoors Only 49% of the Time
I was debugging a gnarly binary last week when this thought hit me: Could AI actually spot the sketchy stuff I miss? Turns out, researchers at Quesma had the same question—except they went full mad scientist about it.
On February 17th, 2026, Piotr Grabowski, Rafał Strzaliński, Michał Kowalczyk, Piotr Migdał, and Jacek Migdał dropped BinaryAudit—a benchmark that's basically "hide and seek" for backdoors. They stuffed malicious code into ~40MB binaries and asked both AI models and Ghidra to find the hidden nasties.
The results? Yikes.
> "We hadn't expected them to possess such specialized reverse engineering capabilities. However, this approach is not ready for production."
That quote tells you everything. Even Claude Opus 4.6—the top performer—only caught 49% of relatively obvious backdoors in small to mid-size binaries. We're talking about the obvious stuff here, not some nation-state-level steganography.
The False Positive Nightmare
Here's where it gets worse. These models weren't just missing real threats—they were crying wolf constantly:
- High false positive rates across the board
- Clean binaries getting flagged as suspicious
- Alert fatigue waiting to happen in any production environment
Imagine your security team getting pinged every time AI thinks your perfectly innocent binary "looks sus." They'd disable the alerts within a week.
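The alert-fatigue math is brutal once you account for base rates. With some illustrative numbers (the paper reports "high" false positives without specifics, so the prevalence and false positive rate below are assumptions, not measured values), Bayes' rule shows most alerts would be noise:

```python
# Hypothetical rates for illustration. Only the 49% detection
# rate comes from the benchmark; the rest are assumptions.
prevalence = 0.01   # assume 1% of binaries actually contain a backdoor
tpr = 0.49          # detection rate reported for Claude Opus 4.6
fpr = 0.20          # assumed false positive rate on clean binaries

# P(real backdoor | model flagged the binary), via Bayes' rule
true_alerts = tpr * prevalence
false_alerts = fpr * (1 - prevalence)
precision = true_alerts / (true_alerts + false_alerts)

print(f"{precision:.1%} of alerts point at a real backdoor")
# Under these assumptions, under 3% of alerts are real.
# The other 97%+ is the alert fatigue described above.
```

Tweak the assumed rates however you like; as long as clean binaries vastly outnumber backdoored ones, a nontrivial false positive rate drowns the signal.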
What This Means for Real Developers
The 40MB binary size isn't random—that's enterprise software territory. Think about it:
1. Your production apps are probably this size or larger
2. Manual audits stop being feasible at that scale
3. Hybrid approaches combining AI + Ghidra + human expertise are the only viable path
But there's a deeper problem lurking here. The research mentions persistence mechanisms that can recover in ~60 seconds via cron-based watchdogs. Even if you do find a backdoor, killing it might be like playing whack-a-mole with a regenerating hydra.
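To see why that's whack-a-mole, here's a minimal simulation of the watchdog pattern (a toy model, not code from the research: a cron job fires on a schedule, notices the payload is gone, and redeploys it):

```python
# Toy simulation of cron-based watchdog persistence. The "filesystem"
# is just a dict; real watchdogs re-drop a file or restart a process.

def deploy_payload(fs):
    """Hypothetical payload drop: marks the backdoor as present."""
    fs["payload"] = "active"

def watchdog_tick(fs):
    """One scheduled run of the watchdog: if the payload was
    removed, redeploy it. With a one-minute cron schedule, the
    backdoor is back within ~60 seconds of being deleted."""
    if "payload" not in fs:
        deploy_payload(fs)

fs = {}
deploy_payload(fs)

# Defender finds and removes the payload...
del fs["payload"]
assert "payload" not in fs

# ...but the next cron tick brings it straight back.
watchdog_tick(fs)
assert fs.get("payload") == "active"
```

The lesson for defenders: removing the payload isn't remediation until you've also found and killed the thing that respawns it.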
The Uncomfortable Truth About Reverse Engineering
This benchmark exposes something uncomfortable: we're still flying blind when it comes to automated binary analysis. Sure, we have Ghidra (thanks, NSA). Sure, we have increasingly sophisticated AI models. But combine them?
Still not good enough.
The 235 points and 92 comments this got on Hacker News suggest I'm not alone in finding this both fascinating and terrifying. We're dealing with threats like UAT-9921 (active since 2019) and bootkits like BlackLotus that operate at firmware level, bypassing OS security entirely.
The Market Reality Check
Quesma is positioning itself as a leader in AI-driven cybersecurity benchmarking with this release. Smart move: there's clearly a market gap for production-ready AI reverse engineering tools.
But here's my concern: businesses love magic bullets. They'll see "AI can detect backdoors" in a press release and skip the "49% accuracy with high false positives" fine print. That's how you get security theater instead of actual security.
The compliance angle is real though. ISO 27001, SOC 2, HIPAA—they all want evidence you're auditing your binaries. Tools like BinaryAudit at least give you something to point to, even if that something is disappointingly unreliable.
My Bet: We're 2-3 years away from AI models hitting 80%+ accuracy on backdoor detection, but the false positive problem will persist much longer. The winning approach will be AI as a first-pass filter, with human experts handling anything flagged. Boring? Yes. Effective? Probably.
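That first-pass-filter workflow is simple enough to sketch (the scoring function, thresholds, and field names below are all hypothetical, not any real tool's API):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    binary: str
    ai_score: float   # hypothetical model confidence, 0..1
    ghidra_hit: bool  # did an independent static-analysis rule also fire?

def triage(findings, ai_threshold=0.8):
    """AI as a cheap first pass: escalate to a human only when the
    model is confident, OR when a weaker AI signal is corroborated
    by a second, independent tool. Everything else gets dropped
    instead of paging the security team."""
    queue = []
    for f in findings:
        if f.ai_score >= ai_threshold or (f.ai_score >= 0.5 and f.ghidra_hit):
            queue.append(f.binary)
    return queue

findings = [
    Finding("payroll.exe", 0.92, False),  # high AI confidence -> escalate
    Finding("updater.bin", 0.55, True),   # weak AI + Ghidra agree -> escalate
    Finding("clean.so",    0.55, False),  # weak, uncorroborated -> drop
]
print(triage(findings))  # ['payroll.exe', 'updater.bin']
```

The design choice is the point: requiring corroboration for low-confidence flags is how you trade a few missed detections for a human-sized review queue.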

