A US-based AI startup is making bold claims against NVIDIA's most advanced hardware. Tensordyne has announced the successful tape-out of its Napier chip, a 3nm AI accelerator built on TSMC's process in collaboration with Broadcom and HPE's Juniper Networks. The company is already reporting over $200 million in projected system demand.
The Napier chip packs 138 billion transistors, 2.1 petaflops of dense FP8 compute, 144GB of HBM3E memory, and 256MB of SRAM, with a 300W TDP. The architectural hook is a proprietary logarithmic mathematics method that replaces many multiplication operations with simpler addition-based computation. Because adders are smaller and more power-efficient than multipliers, Tensordyne claims this frees up significantly more silicon area for SRAM, which it says gives Napier five times as much on-chip SRAM as NVIDIA's Blackwell.
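
Tensordyne has not published how its logarithmic method works, so the following is only a generic sketch of the textbook idea behind a log number system: store each value as a sign plus log2 of its magnitude, and a multiply becomes an add of the exponents. The helper names (to_log, from_log, log_mul) are hypothetical and say nothing about Napier's actual datapath.

```python
import math

def to_log(x: float) -> tuple[int, float]:
    """Encode a nonzero real as (sign, log2 of its magnitude)."""
    if x == 0:
        raise ValueError("zero needs a special encoding in a log number system")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def from_log(v: tuple[int, float]) -> float:
    """Decode (sign, exponent) back to an ordinary float."""
    sign, exponent = v
    return sign * 2.0 ** exponent

def log_mul(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Multiply in the log domain: signs multiply, exponents add."""
    return (a[0] * b[0], a[1] + b[1])

x, y = 3.5, -0.25
product = from_log(log_mul(to_log(x), to_log(y)))
print(product, x * y)  # the two values agree up to floating-point error
```

The trade-off in any such scheme is that addition of two log-encoded values is no longer a simple operation and typically needs an approximation or lookup, which is where a proprietary design would have to differentiate itself.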

The full rack configuration, called the TDN72, houses 288 Napier chips across four pods of 72 chips each. The complete rack delivers 608 petaflops of FP8 compute, 42TB of HBM3E memory, and operates within a 120kW power envelope, all while being fully air-cooled.
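
As a quick sanity check, the rack-level figures can be compared against the per-chip specifications quoted above. The sketch below uses only numbers from the announcement; the variable names are illustrative.

```python
# Per-chip and rack figures as quoted in the announcement.
CHIPS_PER_POD = 72
PODS_PER_RACK = 4
FP8_PFLOPS_PER_CHIP = 2.1
HBM_GB_PER_CHIP = 144
RACK_POWER_KW = 120

chips = CHIPS_PER_POD * PODS_PER_RACK             # chips per rack
rack_pflops = chips * FP8_PFLOPS_PER_CHIP         # aggregate dense FP8
rack_hbm_tb = chips * HBM_GB_PER_CHIP / 1000      # aggregate HBM3E in TB
power_per_chip_w = RACK_POWER_KW * 1000 / chips   # rack budget per chip slot

print(f"{chips} chips, {rack_pflops:.1f} PF, "
      f"{rack_hbm_tb:.1f} TB, {power_per_chip_w:.0f} W per chip slot")
```

The product comes to about 604.8 petaflops and 41.5TB, close to the quoted 608 petaflops and 42TB. The small gap on compute would be consistent with the per-chip figure having been rounded down, though the announcement does not say so. The rack budget works out to roughly 417W per chip slot against a 300W chip TDP; the difference presumably covers everything in the rack other than the accelerators, which the announcement does not break down.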
Tensordyne claims the TDN72 provides 13x more tokens per second and 17x more tokens per watt than NVIDIA's Blackwell NVL72, and says a single rack can match the throughput of nine NVIDIA Rubin plus Groq LPX racks for multi-trillion parameter models.
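
Taken together, the two headline ratios imply a power figure. Assuming both claims refer to the same workload and the same NVL72 baseline, which the announcement does not confirm, dividing the throughput ratio by the efficiency ratio gives the implied ratio of rack power:

```python
# Ratios as claimed by Tensordyne versus the Blackwell NVL72.
tokens_per_sec_ratio = 13.0
tokens_per_watt_ratio = 17.0

# power = throughput / efficiency, so the ratios divide the same way.
implied_power_ratio = tokens_per_sec_ratio / tokens_per_watt_ratio
print(f"Implied TDN72 power relative to NVL72: {implied_power_ratio:.2f}x")
```

Under that assumption, the TDN72 would be drawing roughly three quarters of the power of the rack it is being compared with.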
The company also points to a proprietary scale-up interconnect called TDN Link, which it says delivers sub-microsecond chip-to-chip latency and 1TB/s of bandwidth across the 72-chip system. It is aimed at mixture-of-experts and agentic AI workloads, where interconnect performance matters as much as raw compute.

Those are impressive numbers, but they come with the standard caveats that apply to any pre-launch AI accelerator. Tensordyne's beta program is planned for Q1 2027, with broader system shipments expected by the end of Q2 2027. By then, NVIDIA, AMD, and a growing field of inference-focused silicon startups will have newer hardware of their own.
Tensordyne promises compatibility with Hugging Face-hosted models, PyTorch, and Triton, and provides a custom Python SDK. If Tensordyne's technology works as claimed and ships in 2027, Napier could be a notable alternative for inference infrastructure.




