Meta's new AI supercomputer: 16,000 x GPUs, insane 175PB bulk storage

Meta's new AI Research SuperCluster (RSC) is a metaverse and AI beast -- 16,000 GPUs, 16TB/sec training data, 175PB bulk storage.


Facebook, I mean Horizons, I mean LifeLog, I mean Meta has announced that the social networking and VR giant is putting the finishing touches on what it believes will be the world's fastest AI supercomputer.


The new AI Research SuperCluster (RSC) is already up and running, with the full build-out expected to be complete by mid-2022. It is already being used to train huge models in natural language processing (NLP) and computer vision for research, and Meta's aim is that RSC will one day train models with trillions of parameters.

RSC will help Meta's AI researchers build new models that can learn from trillions of examples, work across hundreds of languages, seamlessly analyze text, images, and video together, develop new augmented reality (AR) tools, and so much more.

But man... inside, RSC is a freaking silicon beast.

Meta's first-gen AI research infrastructure was designed in 2017 with 22,000 x NVIDIA V100 Tensor Core GPUs in a single cluster, capable of running 35,000 training jobs per day. Meta AI researchers have been using it as the benchmark for performance, reliability, and productivity.


But in early 2020, then-Facebook decided that the "best way to accelerate progress was to design a new computing infrastructure from a clean slate to take advantage of new GPU and network fabric technology. We wanted this infrastructure to be able to train models with more than a trillion parameters on data sets as large as an exabyte - which, to provide a sense of scale, is the equivalent of 36,000 years of high-quality video".
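To sanity-check that sense of scale, here is a rough back-of-the-envelope sketch in Python. It assumes a decimal exabyte (10^18 bytes) and a 365.25-day year, neither of which is stated in Meta's quote, and works out the average video bitrate the comparison implies. The function name is made up for illustration.

```python
# Back-of-the-envelope check: what bitrate makes 1 exabyte equal
# 36,000 years of video?
# Assumptions (not from Meta): decimal exabyte (10**18 bytes) and
# 365.25-day years.

EXABYTE_BYTES = 10**18
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def implied_bitrate_mbps(total_bytes: float, years: float) -> float:
    """Return the average bitrate, in megabits per second, needed for
    `years` of video to occupy `total_bytes` of storage."""
    seconds = years * SECONDS_PER_YEAR
    bits = total_bytes * 8
    return bits / seconds / 1e6


if __name__ == "__main__":
    mbps = implied_bitrate_mbps(EXABYTE_BYTES, 36_000)
    print(f"Implied bitrate: {mbps:.1f} Mbit/s")
```

Under those assumptions the figure works out to roughly 7 Mbit/s, which is a plausible bitrate for HD video, so the comparison holds up as an order-of-magnitude illustration.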

Meta RSC AI Supercomputer specs

  • 760 x NVIDIA DGX A100 systems (as compute nodes)
  • 6,080 x GPUs in total today (8 per DGX A100), growing to the headline 16,000 x GPUs once fully built out
  • 175PB (petabytes) of Pure Storage FlashArray
  • 46PB (petabytes) of cache storage in Penguin Computing Altus systems
  • 10PB (petabytes) of Pure Storage FlashBlade

(each DGX communicates over NVIDIA Quantum 1600 Gb/s InfiniBand two-level Clos fabric with no oversubscription)
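The spec list can be tallied with a few lines of Python. This is only a sketch of the arithmetic: the node and storage figures come from the list above, the 8-GPUs-per-node value is the standard DGX A100 configuration, and the node count for 16,000 GPUs is a simple projection that assumes the same node type throughout, which the article does not confirm. The function names are made up for illustration.

```python
import math

# Figures from the spec list; 8 GPUs per node is the standard
# DGX A100 configuration.
DGX_NODES = 760
GPUS_PER_DGX = 8

# Storage tiers in petabytes, as listed.
STORAGE_PB = {
    "Pure Storage FlashArray (bulk)": 175,
    "Penguin Computing Altus (cache)": 46,
    "Pure Storage FlashBlade": 10,
}


def total_gpus(nodes: int, gpus_per_node: int) -> int:
    """GPUs in the cluster for a given node count."""
    return nodes * gpus_per_node


def total_storage_pb(tiers: dict[str, int]) -> int:
    """Sum of all storage tiers, in petabytes."""
    return sum(tiers.values())


def nodes_for_gpus(target_gpus: int, gpus_per_node: int) -> int:
    """Nodes needed to reach a GPU target.
    Assumption: every node has the same GPU count."""
    return math.ceil(target_gpus / gpus_per_node)


if __name__ == "__main__":
    print("GPUs today:", total_gpus(DGX_NODES, GPUS_PER_DGX))
    print("Storage (PB):", total_storage_pb(STORAGE_PB))
    print("Nodes for 16,000 GPUs:", nodes_for_gpus(16_000, GPUS_PER_DGX))
```

The tally shows why the list says 6,080 GPUs while the headline says 16,000: the smaller number is what the 760 nodes provide today, and the larger one is the target for the finished cluster.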

Meta has run some benchmarks on its new RSC -- and no, unfortunately it's not Crysis -- comparing the new AI supercomputer with Meta's legacy production and research infrastructure. In those tests, RSC ran computer vision workflows up to 20x faster, ran the NVIDIA Collective Communication Library (NCCL) more than 9x faster, and trained large-scale NLP models up to 3x faster.

A model with tens of billions of parameters can now finish training in just 3 weeks, compared to 9 weeks on the previous-gen AI supercomputer.
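As a quick check on that claim, the 9-week and 3-week figures can be turned into a speedup factor. This is a minimal sketch with a made-up helper name, not anything Meta published.

```python
def speedup(old_duration: float, new_duration: float) -> float:
    """Return how many times faster the new run is than the old one.
    Both durations must use the same unit (weeks here)."""
    if old_duration <= 0 or new_duration <= 0:
        raise ValueError("durations must be positive")
    return old_duration / new_duration


if __name__ == "__main__":
    # 9 weeks on the previous-gen system vs 3 weeks on RSC.
    print(f"Training speedup: {speedup(9, 3):.1f}x")
```

The result is 3x, which lines up with the "up to 3x faster" figure quoted for large-scale NLP training.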

Meta CEO Mark Zuckerberg said: "Meta has developed what we believe is the world's fastest AI supercomputer. We're calling it RSC for AI Research SuperCluster. The experiences we're building for the metaverse require enormous compute power (quintillions of operations / second!) and RSC will enable new AI models that can learn from trillions of examples, understand hundreds of languages, and more. Congrats to the team on building RSC!".


Anthony joined the TweakTown team in 2010 and has since reviewed 100s of graphics cards. Anthony is a long-time PC enthusiast with a passionate hatred for games built around consoles. An FPS gamer since the pre-Quake days, when you were insulted if you used a mouse to aim, he has been addicted to gaming and hardware ever since. Working in IT retail for 10 years gave him great experience with custom-built PCs. His addiction to GPU tech is unwavering, and he has recently taken a keen interest in artificial intelligence (AI) hardware.
