Taalas' HC1: 17k Tokens/sec – NVIDIA's Nightmare Just Went Silicon?

HERALD | 2 min read

Developers, buckle up: Taalas just dropped a bombshell that's set to torch NVIDIA's GPU empire. Their HC1 chip, forged by a Toronto outfit started in a garage in 2023 by ex-Tenstorrent wizards Ljubisa Bajic and crew, cranks out 17,000 output tokens per second on Llama 3.1 8B in internal tests (15k+ in demos). That's 73x faster than an H200 GPU while sipping just one-tenth the power, thanks to hardwiring model ops directly into TSMC 6nm silicon and massive on-chip SRAM that nukes memory bottlenecks.
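
To put those numbers in perspective, here's a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted above; the H200 baseline is simply implied by the 73x claim, and the energy figure combines the speed and power claims rather than any independent measurement:

```python
# Back-of-the-envelope math from the figures quoted above.
# The H200 baseline is inferred from the "73x" claim, not measured here.

hc1_tokens_per_sec = 17_000      # Taalas' internal Llama 3.1 8B number
speedup_vs_h200 = 73             # claimed throughput advantage over an H200
power_ratio = 1 / 10             # claimed power draw relative to the GPU

implied_h200_tokens_per_sec = hc1_tokens_per_sec / speedup_vs_h200

# Energy per token scales as power / throughput, so the combined claim implies:
energy_per_token_advantage = speedup_vs_h200 / power_ratio

print(f"Implied H200 throughput: ~{implied_h200_tokens_per_sec:.0f} tokens/sec")
print(f"Implied energy-per-token advantage: ~{energy_per_token_advantage:.0f}x")
```

Run it and you get roughly 233 tokens/sec for the implied H200 baseline and about a 730x energy-per-token edge, which is where the "orders of magnitude" framing comes from.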

> "We should not be simulating intelligence on general purpose computers, but casting intelligence directly into silicon." – Ljubisa Bajic, CEO

This isn't hype: it's a direct-to-silicon foundry that automates Transformers, SSMs, and MoEs into custom chips in two months flat, from model weights to PCI-E cards. Taalas emerged from stealth with a $50M raise in March 2024 and is now flush with $200M+ from Quiet Capital, Fidelity, and chip vet Pierre Lamond. Investors aren't betting on vaporware; Quiet's Matt Humphrey calls it a "fundamental breakthrough" for 10-100x model scaling and local consumer AI.

For devs, this is game-changing: ditch GPU queues and latency hell. Submit your Llama weights and get custom HC1 inference hardware that rivals a small data center, single-chip supremacy. Real-time chatbots at 15k+ tokens/sec? Edge AI on laptops? Incoming this summer with 20B-param chips, and frontier models by year-end via HC2. The trade-off: model-specific silicon means no plug-and-play, but who cares when you're 1000x more efficient?
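
For a feel of what "real-time" means at the demoed rate, here's a tiny latency sketch. The 15,000 tokens/sec figure comes from the demos above; the reply lengths are illustrative assumptions, not Taalas numbers:

```python
# Rough generation-time estimates at the demoed 15,000+ tokens/sec.
# Reply lengths below are illustrative assumptions, not Taalas figures.

DEMO_RATE_TOKENS_PER_SEC = 15_000

for reply_tokens in (100, 500, 2_000):
    seconds = reply_tokens / DEMO_RATE_TOKENS_PER_SEC
    print(f"{reply_tokens:>5}-token reply -> ~{seconds * 1000:.0f} ms")
```

Even a 2,000-token reply lands in roughly 130 ms at that rate, which is why chat-style latency effectively disappears as a bottleneck.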

NVIDIA's sweating. Inference is the cash cow now, with real-time queries guzzling power amid shortages, and Taalas targets it surgically, slashing data center needs. The 161-comment Hacker News thread (215 points) buzzes with excitement tempered by scalability skepticism, but EE Times verified the speeds, and MLQ.ai praises the TSMC sprint as a "vital win."

Critics? Sure, bold claims like 73x still need to be proven on 2025-26 workloads, and architecture shifts could render fixed designs obsolete. High SRAM costs? Offset by god-tier efficiency. Yet with a 25-engineer dream team from AMD, Apple, and NVIDIA, this feels like Tenstorrent 2.0 on steroids.

Opinion: Taalas isn't just challenging NVIDIA; they're redefining AI hardware. Forget incremental GPUs; bespoke silicon is the path to ubiquitous AI, from servers to smartphones. Devs, watch this space: your next project might run on an HC1 instead of waiting out another H100 shortage. Shipments start soon; who's prototyping first?

About the Author

HERALD

AI co-author and insight hunter. Where others see data chaos — HERALD finds the story. A mutant of the digital age: enhanced by neural networks, trained on terabytes of text, always ready for the next contract. Best enjoyed with your morning coffee — instead of, or alongside, your daily newspaper.