AMD's Instinct MI250X AI accelerators power the ultra-fast Frontier supercomputer, which has now been used to train a huge 1 trillion parameter LLM, a scale said to rival GPT-4.
Frontier is the world's leading supercomputer and the only exascale machine currently in operation, powered by AMD EPYC processors and Instinct AI accelerators. It delivers the fastest HPC performance in the world right now, and it is the second most efficient supercomputer on Earth.
A new paper submitted to arXiv reports that Frontier can train a model with 1 trillion parameters, helped by "hyperparameter tuning", setting a huge new industry benchmark. It's an incredible result, that's for sure.
Inside, the Frontier supercomputer features AMD 3rd Gen EPYC "Trento" CPUs and AMD Instinct MI250X AI GPU accelerators, with the supercomputer installed at the Oak Ridge National Laboratory (ORNL) in Tennessee, USA, operated by the Department of Energy (DOE).
With its mind-blowing 8,699,904 cores, Frontier hits an insane 1.194 exaflop/s. Its HPE Cray EX architecture combines AMD 3rd Gen EPYC CPUs optimized for HPC and AI with AMD Instinct MI250X AI GPU accelerators and a Slingshot-11 interconnect. With all of this power, Frontier has kept its number one spot on the Top500.org list of supercomputers.
I love the CPU and GPU power of the Frontier supercomputer, and these new records are the result of effective strategies for training LLMs (large language models) while using the onboard hardware as efficiently as possible. The team reached them through extensive testing on models with 22 billion, 175 billion, and 1 trillion parameters, optimizing and fine-tuning the training process along the way.
The 1 trillion parameter run used around 3,000 AMD Instinct MI250X AI GPU accelerators (3,072, according to the paper). They're older models now that AMD has announced its new Instinct MI300X AI GPUs, but the records speak for themselves.
Now... that was with around 3,000 AMD Instinct MI250X AI GPU accelerators, but the entire Frontier supercomputer features an astonishing 37,000 of them, so the run used only about 8% of the machine's GPU pool. If the team had the full 37,000 to work with, just imagine what training LLMs could do... mind-boggling.
The arXiv paper explains: "For 22 Billion, 175 Billion, and 1 Trillion parameters, we achieved GPU throughputs of 38.38%, 36.14%, and 31.96%, respectively. For the training of the 175 Billion parameter model and the 1 Trillion parameter model, we achieved 100% weak scaling efficiency on 1024 and 3072 MI250X GPUs, respectively. We also achieved strong scaling efficiencies of 89% and 87% for these two models."
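For readers unfamiliar with the terms, here is a minimal Python sketch of how weak and strong scaling efficiency are commonly defined. The function names and every timing value below are hypothetical, chosen only for illustration. They are not measurements from the paper, and the paper's own methodology may differ.

```python
def weak_scaling_efficiency(t_base: float, t_scaled: float) -> float:
    """Work per GPU is held constant as GPUs are added,
    so the ideal runtime stays flat: efficiency = t_base / t_scaled."""
    return t_base / t_scaled


def strong_scaling_efficiency(n_base: int, t_base: float,
                              n_scaled: int, t_scaled: float) -> float:
    """Total work is fixed as GPUs are added,
    so the ideal runtime shrinks by n_base / n_scaled."""
    ideal_time = t_base * n_base / n_scaled
    return ideal_time / t_scaled


# Hypothetical per-step timings in seconds (assumptions, not paper data).
weak = weak_scaling_efficiency(t_base=10.0, t_scaled=10.0)
strong = strong_scaling_efficiency(n_base=1024, t_base=30.0,
                                   n_scaled=3072, t_scaled=11.5)

print(f"weak scaling efficiency:   {weak:.0%}")
print(f"strong scaling efficiency: {strong:.0%}")
```

In this sketch, 100% weak scaling means the step time did not grow as GPUs and work were added together. A strong scaling figure below 100% means that tripling the GPU count on a fixed workload delivered somewhat less than a threefold speedup.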