US AI startup Tensordyne claims 3nm Napier chip outperforms NVIDIA Blackwell by 13x in tokens per second

The TDN72 rack houses 288 Napier chips, delivers 608 petaflops of FP8 compute, and claims to match the throughput of nine NVIDIA Rubin plus Groq LPX racks.

Hassam Nasir

Tech Reporter

Published Jun 16, 2026 12:15 AM CDT

TL;DR: Tensordyne claims its 3nm Napier chip, which packs 138 billion transistors and uses a logarithmic math approach, delivers 13x more tokens per second and 17x more tokens per watt than NVIDIA's Blackwell. The TDN72 rack, with 288 chips, delivers 608 petaflops of FP8 compute, and system shipments are targeted for 2027.

A US-based AI startup is making bold claims against NVIDIA's most advanced hardware. Tensordyne has announced the successful tape-out of its Napier chip, a 3nm AI accelerator built on TSMC's process in collaboration with Broadcom and HPE's Juniper Networks. The company is already reporting over $200 million in projected system demand.

The Napier chip packs 138 billion transistors and delivers 2.1 petaflops of dense FP8 compute, with 144GB of HBM3E memory, 256MB of SRAM, and a 300W TDP. The architectural hook is a proprietary logarithmic mathematics method that replaces many multiplication operations with simpler addition-based computation. Because adders are smaller and more power-efficient than multipliers, Tensordyne claims this frees up significantly more silicon area for SRAM, which it says gives Napier five times as much on-chip SRAM as NVIDIA's Blackwell.
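To illustrate the general idea behind log-domain arithmetic, here is a minimal Python sketch. It is not Tensordyne's method, which is proprietary and not described in detail. The helper names (`to_log`, `log_mul`, `from_log`) are hypothetical. It only shows that once values are stored as a sign plus a base-2 logarithm, a multiplication reduces to an addition of the logarithms.

```python
import math

def to_log(x: float) -> tuple[int, float]:
    """Encode a nonzero real as (sign, log2|x|). Hypothetical helper."""
    if x == 0:
        # Zero has no finite logarithm, so a real format would need a special code.
        raise ValueError("zero cannot be encoded in this simple sketch")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def log_mul(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Multiply two log-encoded values: signs multiply, logarithms add."""
    return (a[0] * b[0], a[1] + b[1])

def from_log(v: tuple[int, float]) -> float:
    """Decode (sign, log2|x|) back to an ordinary float."""
    return v[0] * 2.0 ** v[1]

# 6.0 * -0.5 computed with one addition in the log domain
product = from_log(log_mul(to_log(6.0), to_log(-0.5)))
print(product)
```

The trade-off in log-domain schemes generally is that addition of two encoded values is no longer a simple operation, so how a design handles accumulation matters as much as how it handles multiplication.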

The full rack configuration, called the TDN72, houses 288 Napier chips across four pods of 72 chips each. The complete rack delivers 608 petaflops of FP8 compute, 42TB of HBM3E memory, and operates within a 120kW power envelope, all while being fully air-cooled.
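As a quick sanity check, the rack-level figures can be compared against the per-chip numbers quoted earlier. The short Python sketch below uses only values stated in this article. The variable names are illustrative, and treating TDP as each chip's power draw is an assumption.

```python
# Per-chip and rack figures as quoted in the article
CHIPS_PER_POD = 72
PODS_PER_RACK = 4
FP8_PFLOPS_PER_CHIP = 2.1
HBM_GB_PER_CHIP = 144
TDP_W_PER_CHIP = 300
RACK_POWER_KW = 120

chips = CHIPS_PER_POD * PODS_PER_RACK              # 288 chips
rack_pflops = chips * FP8_PFLOPS_PER_CHIP          # about 604.8 PFLOPS
rack_hbm_tb = chips * HBM_GB_PER_CHIP / 1000       # about 41.5 TB
chip_power_kw = chips * TDP_W_PER_CHIP / 1000      # about 86.4 kW (assumes TDP = draw)
headroom_kw = RACK_POWER_KW - chip_power_kw        # power left for everything else

print(chips, round(rack_pflops, 1), round(rack_hbm_tb, 1),
      round(chip_power_kw, 1), round(headroom_kw, 1))
```

The per-chip figures give roughly 604.8 petaflops and 41.5TB, close to the quoted 608 petaflops and 42TB, so the rack numbers appear to be rounded or based on a slightly higher unrounded per-chip figure. The chips alone would account for about 86.4kW of the 120kW envelope. The article does not break down how the remainder is used.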

Tensordyne claims the TDN72 provides 13x more tokens per second and 17x more tokens per watt than NVIDIA's Blackwell NVL72, and says a single rack can match the throughput of nine NVIDIA Rubin plus Groq LPX racks for multi-trillion parameter models.

The company also points to a proprietary scale-up interconnect called TDN Link, which it says delivers sub-microsecond chip-to-chip latency and 1TB/s of bandwidth across the 72-chip system. It targets mixture-of-experts and agentic AI workloads, where interconnect performance matters as much as raw compute.

Those are impressive numbers, but they come with the standard caveats that apply to any pre-launch AI accelerator. Tensordyne's beta program is planned for Q1 2027, with broader system shipments expected by the end of Q2 2027. By that point, NVIDIA, AMD, and a growing field of inference-focused silicon startups will have newer hardware of their own.

Tensordyne promises compatibility with Hugging Face-hosted models, PyTorch, and Triton, alongside a custom Python SDK. If Tensordyne's technology works and can be delivered in 2027, Napier could be a notable alternative for inference infrastructure.
