Elon Musk's xAI startup is upgrading its Colossus AI supercomputer cluster, doubling it from 100,000 NVIDIA Hopper AI GPUs to an insane 200,000.
Colossus is the world's largest AI supercomputer, and is being used to train xAI's Grok family of LLMs (large language models), with chatbots on offer for X Premium subscribers. Elon's massive xAI Colossus supercomputer cluster facility was recently toured (more on that in the links below) and took just 122 days to complete, a feat that led NVIDIA CEO Jensen Huang to recently call Elon Musk "superhuman".
NVIDIA recently detailed its partnership with Elon and xAI, explaining: "The supporting facility and state-of-the-art supercomputer was built by xAI and NVIDIA in just 122 days, instead of the typical timeframe for systems of this size that can take many months to years. It took 19 days from the time the first rack rolled onto the floor until training began".
"While training the extremely large Grok model, Colossus achieves unprecedented network performance. Across all three tiers of the network fabric, the system has experienced zero application latency degradation or packet loss due to flow collisions. It has maintained 95% data throughput enabled by Spectrum-X congestion control. This level of performance cannot be achieved at scale with standard Ethernet, which creates thousands of flow collisions while delivering only 60% data throughput".
Gilad Shainer, senior vice president of networking at NVIDIA, explains: "AI is becoming mission-critical and requires increased performance, security, scalability and cost-efficiency. The NVIDIA Spectrum-X Ethernet networking platform is designed to provide innovators such as xAI with faster processing, analysis and execution of AI workloads, and in turn accelerates the development, deployment and time to market of AI solutions".
Elon Musk explained on X: "Colossus is the most powerful training system in the world. Nice work by xAI team, NVIDIA and our many partners/suppliers".
An xAI spokesperson added: "xAI has built the world's largest, most-powerful supercomputer. NVIDIA's Hopper GPUs and Spectrum-X allow us to push the boundaries of training AI models at a massive-scale, creating a super-accelerated and optimized AI factory based on the Ethernet standard".