Frontier supercomputer with 3000 x AMD Instinct MI250X cards: 1 trillion parameter LLM run

Frontier supercomputer with AMD Instinct MI250X AI GPUs achieves a 1 trillion parameter LLM training run, a model size in the same league as OpenAI's GPT-4.


AMD powers the ultra-fast Frontier supercomputer with its Instinct MI250X AI accelerators, achieving a huge 1 trillion parameter LLM training run, putting it in the same parameter class as GPT-4.


The Frontier supercomputer is the world's leading supercomputer and the only exascale machine in operation, powered by AMD EPYC processors and Instinct AI accelerators. Frontier delivers the fastest HPC performance in the world right now, and is the second most efficient supercomputer on Earth.

A new paper submitted to arXiv by researchers teases that the Frontier supercomputer is capable of training a 1 trillion parameter LLM, using careful "hyperparameter tuning" to set a huge new industry benchmark. It's an incredible result, that's for sure.

Inside, the Frontier supercomputer features AMD 3rd Gen EPYC "Trento" CPUs and AMD Instinct MI250X AI accelerators, with the supercomputer installed at Oak Ridge National Laboratory (ORNL) in Tennessee, USA, a Department of Energy (DOE) laboratory.

With its mind-blowing 8,699,904 cores, it hits an insane 1.194 exaflop/s, with the HPE Cray EX architecture combining AMD 3rd Gen EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X AI accelerators and a Slingshot-11 interconnect. With all of this power, the Frontier supercomputer has been able to keep its leadership on the list of supercomputers, sitting at number one.

I love the CPU and GPU power of the Frontier supercomputer, and these new records by Frontier are the result of effective strategies for training LLMs (Large Language Models) while using the onboard hardware as efficiently as possible. The team hit these records through extensive testing at 22 billion, 175 billion, and 1 trillion parameters, optimizing and fine-tuning the model training process along the way.
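To put the "GPU throughput" percentages the team reports in context: figures like these are typically computed as achieved FLOPS divided by the hardware's theoretical peak. A minimal sketch in Python, assuming the commonly cited ~383 TFLOPS FP16/BF16 peak per MI250X (an assumption for illustration; the paper's exact methodology may differ):

```python
# Illustrative sketch only -- peak figure and example numbers are assumptions,
# not values taken from the arXiv paper.
PEAK_TFLOPS_MI250X = 383.0  # commonly cited FP16/BF16 peak per MI250X card

def gpu_throughput_pct(achieved_tflops_per_gpu: float,
                       peak_tflops: float = PEAK_TFLOPS_MI250X) -> float:
    """Fraction of theoretical peak actually sustained, as a percentage."""
    return 100.0 * achieved_tflops_per_gpu / peak_tflops

# e.g. sustaining ~122.4 TFLOPS per GPU would correspond to roughly 32% of peak
print(round(gpu_throughput_pct(122.4), 2))  # -> 31.96
```

Sustaining a third of peak on a multi-thousand-GPU LLM run is strong; communication and memory stalls usually drag large-scale training well below a single GPU's headline number.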

The run used 3000 x AMD Instinct MI250X AI accelerators, and while they're older models -- AMD has since announced its new Instinct MI300X AI GPUs -- the records speak for themselves.

Now... that was with 3000 x AMD Instinct MI250X AI accelerators, but the entire Frontier supercomputer features an astonishing 37,000 x Instinct MI250X accelerators. If the team were using that entire pool of 37,000 GPUs, just imagine what training LLMs could do... mind-boggling.

The arXiv paper explains: "For 22 Billion, 175 Billion, and 1 Trillion parameters, we achieved GPU throughputs of 38.38%, 36.14%, and 31.96%, respectively. For the training of the 175 Billion parameter model and the 1 Trillion parameter model, we achieved 100% weak scaling efficiency on 1024 and 3072 MI250X GPUs, respectively. We also achieved strong scaling efficiencies of 89% and 87% for these two models".
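For anyone unfamiliar with the terms in that quote: weak scaling keeps the per-GPU workload fixed as GPUs are added (ideal = throughput per GPU stays flat), while strong scaling keeps the total workload fixed (ideal = speedup matches the GPU ratio). A small sketch of the standard definitions in Python (illustrative numbers, not measurements from the paper):

```python
def weak_scaling_efficiency(throughput_per_gpu_base: float,
                            throughput_per_gpu_scaled: float) -> float:
    """Weak scaling: per-GPU problem size fixed; ideal keeps per-GPU throughput flat."""
    return 100.0 * throughput_per_gpu_scaled / throughput_per_gpu_base

def strong_scaling_efficiency(time_base: float, gpus_base: int,
                              time_scaled: float, gpus_scaled: int) -> float:
    """Strong scaling: total problem fixed; ideal speedup equals the GPU ratio."""
    speedup = time_base / time_scaled
    ideal_speedup = gpus_scaled / gpus_base
    return 100.0 * speedup / ideal_speedup

# e.g. doubling GPUs and cutting iteration time from 10 s to 5.75 s is ~87% efficient
print(round(strong_scaling_efficiency(10.0, 1024, 5.75, 2048), 1))  # -> 87.0
```

Hitting 100% weak scaling and 87-89% strong scaling at thousands of GPUs is the headline here: it means adding hardware translated almost directly into training capacity.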


Anthony joined the TweakTown team in 2010 and has since reviewed 100s of graphics cards. Anthony is a long time PC enthusiast with a passion of hate for games built around consoles. FPS gaming since the pre-Quake days, where you were insulted if you used a mouse to aim, he has been addicted to gaming and hardware ever since. Working in IT retail for 10 years gave him great experience with custom-built PCs. His addiction to GPU tech is unwavering and has recently taken a keen interest in artificial intelligence (AI) hardware.
