
Multiverse Computing's 95% Model Compression Breaks the Size-Performance Trade-off
What if I told you that ChickenBrain — yes, that's its actual name — can outperform Llama 3.1 8B while running smoothly on a MacBook Pro?
Multiverse Computing, the Spanish quantum-inspired AI company, just dropped their CompactifAI App and API into the mainstream, and honestly? This feels like the moment edge AI finally gets real.
I've been watching this space for years, waiting for someone to crack the compression puzzle without turning state-of-the-art models into digital paperweights. Most compression techniques give you that painful 20-30% accuracy drop that makes you question why you bothered. But Multiverse's approach, using tensor networks borrowed from quantum computing research, is achieving something wild:

95% compression. 2-3% precision loss.

> CEO Enrique Lizaso positioned compute costs as the "biggest barrier to AI progress," and their Cerebrium partnership enables "performance and affordability" to coexist.
Let that sink in. Their HyperNova 60B 2602 model started life as OpenAI's gpt-oss-120b — a 120 billion parameter beast — and got squeezed down to roughly half size while improving tool calling and agentic coding capabilities.
The tech behind this isn't your typical post-training quantization. Co-founder and chief scientific officer Román Orús built it from his academic research on tensor networks, creating something the company calls "totally unique." Unlike standard compression methods that feel like digital zip files, CompactifAI restructures transformer weight matrices without needing retraining or access to the original training data.
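To build some intuition for what "restructuring weight matrices" means, here's a toy sketch. Multiverse hasn't published CompactifAI's internals, so this uses truncated SVD, the simplest cousin of tensor-network factorization, on a synthetic matrix; every name and number here is illustrative, not theirs:

```python
import numpy as np

# Hypothetical illustration: compress a dense "weight matrix" via
# truncated SVD, a simple relative of tensor-network factorization.
# This is NOT Multiverse's algorithm, just the underlying idea.

rng = np.random.default_rng(0)

# Synthetic weights with fast-decaying singular values, the property
# that makes low-rank/tensor factorizations work well in practice.
W = rng.standard_normal((512, 64)) @ rng.standard_normal((64, 512))
W += 0.01 * rng.standard_normal((512, 512))  # small noise floor

U, S, Vt = np.linalg.svd(W, full_matrices=False)

rank = 64  # keep only the dominant modes
W_hat = U[:, :rank] @ np.diag(S[:rank]) @ Vt[:rank, :]

orig_params = W.size
compressed_params = U[:, :rank].size + rank + Vt[:rank, :].size
rel_error = np.linalg.norm(W - W_hat) / np.linalg.norm(W)

print(f"params: {orig_params} -> {compressed_params} "
      f"({100 * (1 - compressed_params / orig_params):.0f}% smaller)")
print(f"relative reconstruction error: {rel_error:.4f}")
```

The punchline is the same one tensor networks exploit at much larger scale: when the weights are dominated by a few modes, you keep most of the signal with a fraction of the parameters, and no retraining is required.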
This matters for developers in three massive ways:
1. True offline AI: Your phone becomes a legitimate AI powerhouse
2. 80% fewer resources: Deploy on everything from Raspberry Pi to enterprise servers
3. 12x faster inference: Through partnerships like Cerebrium
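Those claims are easy to sanity-check with back-of-the-envelope math. A rough sketch, assuming 16-bit weights and taking the headline 95% figure at face value (real deployment sizes will differ):

```python
# Back-of-the-envelope memory math. Assumptions: 16-bit weights,
# the headline 95% compression figure, activation overhead ignored.
params = 120e9            # gpt-oss-120b parameter count
bytes_per_param = 2       # bf16 / fp16
full_gb = params * bytes_per_param / 1e9
compressed_gb = full_gb * (1 - 0.95)
print(f"full: {full_gb:.0f} GB, compressed: {compressed_gb:.0f} GB")
```

Twelve-ish gigabytes fits comfortably in the unified memory of a current MacBook Pro, which is why "runs on a laptop" reads as plausible rather than pure marketing.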
But here's what gets me excited — the nano models. ChickenBrain and SuperFly aren't just cute names. These tiny models are outperforming Llama 3.1 8B on serious benchmarks like MMLU Pro, MATH500, and GSM8K while running locally on devices you already own.
The democratization angle is massive.
With their $215 million Series B and over 100 enterprise customers including Iberdrola, Bosch, and Bank of Canada, Multiverse isn't just playing in the research sandbox anymore. They're targeting real-world deployment barriers that have kept edge AI frustratingly out of reach.
The CompactifAI API now gives developers immediate access to compressed models from OpenAI, Meta, DeepSeek, and Mistral AI. Plus they're already integrating with NVIDIA's upcoming Nemotron-3 Omni models. Smart routing handles the seamless offline-to-cloud switching that makes these applications actually usable.
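Multiverse hasn't documented how that routing works, but the shape of an offline-first router is easy to sketch. Everything below (the function name, the token threshold) is my own illustrative assumption, not the CompactifAI API:

```python
def route_request(prompt: str, online: bool, local_ready: bool = True) -> str:
    """Pick an execution target: prefer the on-device compressed model,
    escalate to the cloud only for heavy requests while online.
    Hypothetical sketch, not Multiverse's actual routing logic."""
    heavy = len(prompt.split()) > 2000  # toy proxy for context length
    if local_ready and not heavy:
        return "local"
    # Heavy request while offline: degrade gracefully to local anyway.
    return "cloud" if online else "local"

print(route_request("summarize this note", online=False))  # → local
```

The detail that matters is the fallback direction: offline isn't an error state, it's just the local branch.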
Hot Take
This is the beginning of the end for cloud-only AI dominance. Not because cloud AI is going away, but because the choice between cloud and edge is about to become meaningful for the first time.
We're moving toward a world where your car, phone, and IoT devices ship with genuinely capable AI models pre-installed. Sovereign AI becomes possible — especially crucial for European companies dealing with data sovereignty requirements.
Multiverse's quantum-inspired approach feels like one of those rare moments where academic research translates into immediate practical value. The fact that ChickenBrain exists and works tells me we're crossing a threshold.
The chip scarcity crisis and exploding inference costs have been forcing uncomfortable trade-offs. But when you can run frontier-adjacent models on consumer hardware with minimal quality loss?
That changes everything.

