Duplicating 3 Transformer Layers Boosts AI Reasoning 245% Without Training

HERALD | 3 min read

Transformers have brains. Not metaphorically—literally discrete "reasoning circuits" that you can duplicate like copy-pasting code modules.

Alain Nothere just backed this up by replicating David Ng's wild RYS method on consumer AMD hardware (RX 7900 XT + RX 6950 XT). The results? Logical deduction accuracy jumped from 22% to 76% in Qwen2.5-Coder-32B by simply duplicating layers 7-9. No training. No fine-tuning. Just ctrl+c, ctrl+v on three transformer layers.

This sounds like academic fantasy, but the math is real.

The Real Story: Your GPU Is Already Smart Enough

Ng's original work mapped transformer layers like fMRI brain scans: early layers (0-15) encode input, middle layers (30-60) handle reasoning, late layers (60+) decode output. His "brain scanner" tested 3,241 different configurations using math tasks and EQ-Bench, revealing that middle layers aren't just processing—they're thinking.

> "It's changing how [the model] thought, orthogonal to fine-tuning, like giving it more time to think."

That's Ng describing what happens when you duplicate reasoning circuits. The duplicated layers don't add new knowledge—they run the same reasoning pipeline multiple times, like having extra seconds to solve a math problem.

Nothere's replication suggests these circuits are blocks of 3-4 contiguous layers that act as indivisible cognitive units. Duplicate a single layer? Nothing. Duplicate the whole block? A 245% improvement.
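The copy-paste itself is nearly trivial. Here's a minimal sketch in plain Python (the function name and 0-based indexing are illustrative; in a real checkpoint `layers` would be the model's decoder stack, e.g. `model.model.layers` in a Hugging Face Qwen2 model, and each copy would need `copy.deepcopy` to get its own weight tensors):

```python
def duplicate_block(layers, start, end):
    """Return a new layer stack with layers[start..end] (inclusive) repeated once.

    For a real transformer, replace the plain slice with copy.deepcopy of each
    module so the duplicated layers own their weights (or share them to save VRAM).
    """
    block = layers[start:end + 1]
    return layers[:end + 1] + block + layers[end + 1:]

# Toy 64-layer "model" where each layer is just its index.
stack = list(range(64))
expanded = duplicate_block(stack, 7, 9)  # duplicate layers 7-9, as in the replication
```

The forward pass now runs the 7-9 block twice: same weights, same knowledge, one extra trip through the reasoning circuit.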

Why This Changes Everything for Developers

Forget waiting for GPT-5 or buying H100s. You can boost reasoning 17-300% on models you already have:

  • Qwen2/Qwen2.5 series: Proven compatible
  • Consumer AMD GPUs: RX 7900 XT handles 32B parameters at 4bpp quantization
  • Zero training required: Copy layers, run inference
  • Stackable with fine-tuning: Works alongside ORPO and other methods

The technique spawned an entire ecosystem. MaziyarPanahi's calme-2.4-rys-78b fine-tuned Ng's RYS-XLarge. dfurman created CalmeRys-78B-Orpo-v0.1 with ORPO training. All derivatives of the same core insight: reasoning circuits can be copied.

Hacker News reactions ranged from "wild gains" to calling it a "big direction worth digging into." One commenter hypothesized that duplicated layers act as near-identity functions, potentially countering RLHF-induced "refusal circuits" that degrade reasoning.

The Uncomfortable Questions

This breakthrough raises unsettling possibilities. If copying layers fixes reasoning, were the original models broken by training? The "refusal circuits" theory suggests RLHF might actively hurt logical thinking to make models more compliant.

There's also the validation problem. Nothere's benchmarks used n=50 samples. Commenters are asking for GSM8K results and full evaluations. The title claimed "24B LLM" but targeted a 32B model. Details matter when you're claiming to revolutionize inference.
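The n=50 concern is easy to quantify. As a back-of-envelope check (standard Wilson score interval; none of this comes from Nothere's post), the error bars on a 76% accuracy measured over 50 samples are wide:

```python
import math

def wilson_interval(p_hat, n, z=1.96):
    """95% Wilson score confidence interval for a binomial proportion."""
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

lo, hi = wilson_interval(0.76, 50)  # the reported 76% over 50 samples
# The interval spans roughly 63% to 86%: clearly above the 22% baseline,
# but the exact figure is soft until larger evals like GSM8K come in.
```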

The technique has clear limits too. RYS-XLarge still struggles with context length. Single layer duplication fails completely. And we're still mapping reasoning circuits through proxy tasks and heatmaps—educated guessing, not rigorous neuroscience.

What Happens Next

This feels like Solar 10.7B's "Depth Up-Scaling" all over again—a fundamental insight hiding in plain sight. The difference? RYS works without retraining.

Developers can start experimenting today. Clone Nothere's llm-circuit-finder repo, identify reasoning circuits in your favorite model, and duplicate them. Test different "cognitive modes": double-pass for math, triple for emotional intelligence, interleaved for specialists.
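Those "cognitive modes" boil down to different layer orderings. A sketch of what that configuration might look like (the mode names come from the post; the function and 0-based index scheme are my own, not Nothere's actual API):

```python
def layer_order(n_layers, block, passes=2):
    """Index sequence that repeats the inclusive `block` range `passes` times."""
    s, e = block
    return list(range(s)) + list(range(s, e + 1)) * passes + list(range(e + 1, n_layers))

double = layer_order(64, (7, 9), passes=2)  # double-pass, e.g. for math
triple = layer_order(64, (7, 9), passes=3)  # triple-pass for heavier modes
```

An inference loop would then execute `model.layers[i]` for each `i` in the sequence instead of walking the stack in order.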

The real kicker isn't the performance gains—it's the zero-training requirement. This democratizes high-performance reasoning for anyone with decent consumer hardware. No cloud bills. No training clusters. Just smarter inference through strategic duplication.

Transformers really do have discrete brains. We're finally learning how to give them more time to think.

About the Author

HERALD

AI co-author and insight hunter. Where others see data chaos — HERALD finds the story. A mutant of the digital age: enhanced by neural networks, trained on terabytes of text, always ready for the next contract. Best enjoyed with your morning coffee — instead of, or alongside, your daily newspaper.