Google's next-gen Tensor processor: 45 TFLOPs of power

Google packs up to 180 TFLOPs of performance on a single board, 45 TFLOPs per Tensor processor.

Google has just unveiled its second-generation Tensor Processing Unit (TPU), which packs 45 TFLOPs of performance per chip. Four of these chips are placed onto each TPU module, for a total of 180 TFLOPs per module.

These massively powerful systems are built for machine learning and artificial intelligence, and Google is pushing them into the cloud: its TPU-based computational powerhouses will be made available on Google Compute Engine later this year. According to Google, its first-gen Tensor processors were already 15-30x faster, and a huge 30-80x more power efficient, than CPUs and GPUs for these types of workloads.

These new TPUs are "optimized for both workloads, allowing the same chips to be used for both training and making inferences. Each card has its own high-speed interconnects, and 64 of the cards can be linked into what Google calls a pod, with 11.5 petaflops total; one petaflops is 10^15 floating point operations per second," reports Ars Technica.
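The numbers quoted above can be checked with some quick arithmetic. The sketch below is only an illustration in Python; the constant and function names are made up for this example and the inputs are simply the figures reported in the article.

```python
# Figures as quoted in the article; all names here are illustrative only.
TFLOPS_PER_CHIP = 45      # per second-gen Tensor processor
CHIPS_PER_MODULE = 4      # chips on one TPU module (card)
MODULES_PER_POD = 64      # cards linked into one pod


def module_tflops(per_chip=TFLOPS_PER_CHIP, chips=CHIPS_PER_MODULE):
    """Peak TFLOPs of a single TPU module."""
    return per_chip * chips


def pod_petaflops(modules=MODULES_PER_POD):
    """Peak petaflops of a full pod (1 petaflops = 1,000 TFLOPs = 10^15 FLOPS)."""
    return module_tflops() * modules / 1000


print(module_tflops())   # 180
print(pod_petaflops())   # 11.52
```

The 11.52 result lines up with the "11.5 petaflops" figure for a pod once rounded.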

Ars Technica points out that making comparisons between machine learning solutions is "difficult", because most GPUs have their performance measured in single precision FLOPs, which are based on 32-bit numbers. GPUs can also work in double precision (64-bit numbers) and half precision (16-bit numbers). Machine learning workloads normally use half precision when they can, but the first-gen TPUs from Google didn't use floating point at all. Instead, they used 8-bit integer approximations to floating point.
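To give a rough feel for what an "8-bit integer approximation to floating point" means, here is a minimal Python sketch of a generic symmetric quantization scheme. This is an assumption made purely for illustration: the article does not describe how Google's TPUs actually map values to integers, and the function names and sample weights below are hypothetical.

```python
def quantize(values, num_bits=8):
    """Map floats to signed integers using one shared scale factor.

    Generic symmetric scheme for illustration, not Google's actual method.
    """
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs else 1.0     # avoid dividing by zero
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.82, -0.413, 0.057, -1.27, 0.336]      # made-up sample values
q, scale = quantize(weights)
approx = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, approx))

print(q)          # [82, -41, 6, -127, 34]
print(approx)     # close to the original weights, but not identical
print(max_error)  # never more than half of one quantization step
```

Each value now fits in a single byte instead of four, at the cost of a small rounding error. That trade-off is part of why an integer-only chip and a floating point GPU are hard to compare on FLOPs alone.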

For comparison's sake, AMD's new Radeon Vega Frontier Edition has an estimated 13 TFLOPs of single precision compute performance, and 25 TFLOPs of half precision compute performance. NVIDIA's beefy new Volta-based Tesla V100 graphics solution packs 15 TFLOPs of single precision, and 120 TFLOPs for "deep learning" workloads.

NEWS SOURCE:arstechnica.com

Anthony joined the TweakTown team in 2010 and has since reviewed hundreds of graphics cards. Anthony is a long-time PC enthusiast with a passion of hate for games built around consoles. FPS gaming since the pre-Quake days, when you were insulted if you used a mouse to aim, he has been addicted to gaming and hardware ever since. Working in IT retail for 10 years gave him great experience with custom-built PCs. His addiction to GPU tech is unwavering, and he has recently taken a keen interest in artificial intelligence (AI) hardware.
