AMD's new Instinct MI300X AI accelerator with 192GB of HBM3 got a deep dive at Hot Chips 2024 this week, with the company also teasing its refreshed MI325X with 288GB of HBM3E later this year.
Inside, AMD's new Instinct MI300X AI accelerator features a total of 153 billion transistors, using a mix of TSMC's 5nm and 6nm FinFET process nodes. There are 8 XCD chiplets, each featuring 4 shader engines, and each shader engine contains 10 compute units.
The entire chip packs 32 shader engines, with 40 compute units inside a single XCD and 320 in total across the entire package. Each XCD has its own dedicated L2 cache, and on the outskirts of the package sit the Infinity Fabric links, 8 HBM3 IO sites, and a single PCIe Gen5 link with 128GB/sec of bandwidth that connects the MI300X to an AMD EPYC CPU.
AMD uses its in-house 4th Gen Infinity Fabric on its Instinct MI300X AI accelerator, with up to 896GB/sec of bandwidth. The MI300X also uses the Infinity Fabric Advanced Package link, connecting all of the chiplets with up to 4.8TB/sec of bisection bandwidth, while the XCD/IOD interface offers over 2TB/sec of bandwidth.
AMD provided a full block diagram of the MI300X architecture, with each XCD having 2 compute units disabled, for a total of 304 CUs in the MI300X out of the full 320 CU design. The full chip design packs 20,480 cores, while the MI300X ships with 19,456 cores. AMD also includes 256MB of dedicated Infinity Cache on the MI300X.
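Those counts follow directly from the chiplet hierarchy. Below is a minimal sanity-check of the arithmetic, assuming 64 stream processors per CDNA 3 compute unit (which is how the 19,456-core figure falls out of the 304 active CUs):

```python
# Sanity-check the MI300X compute hierarchy described above.
XCDS = 8                        # accelerator complex dies (chiplets)
SHADER_ENGINES_PER_XCD = 4
CUS_PER_SHADER_ENGINE = 10
DISABLED_CUS_PER_XCD = 2
STREAM_PROCESSORS_PER_CU = 64   # assumed CDNA 3 compute unit width

full_cus = XCDS * SHADER_ENGINES_PER_XCD * CUS_PER_SHADER_ENGINE
active_cus = XCDS * (SHADER_ENGINES_PER_XCD * CUS_PER_SHADER_ENGINE - DISABLED_CUS_PER_XCD)

print(full_cus, active_cus)                        # 320 CUs (full design), 304 CUs (MI300X)
print(full_cus * STREAM_PROCESSORS_PER_CU)         # 20,480 cores (full design)
print(active_cus * STREAM_PROCESSORS_PER_CU)       # 19,456 cores (MI300X)
```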
The company also points out that its Instinct MI300X is the first AI accelerator to feature an 8-stack HBM3 memory design, with the 8-stack design allowing AMD to reach 1.5x higher capacity (128GB to 192GB) and 1.6x higher memory bandwidth (3.2TB/sec to 5.2TB/sec) versus the MI250X.
AMD says that with the larger and faster HBM memory, the new Instinct MI300X can handle larger LLM (FP16) model sizes of up to 70B parameters in training and 680B parameters in inference. NVIDIA HGX H100 systems can only handle models of up to 30B in training and up to 290B in inference.
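Those inference figures roughly line up with a weights-only estimate. The sketch below is a back-of-the-envelope calculation, assuming FP16 weights at 2 bytes per parameter across an assumed 8-GPU platform (AMD's comparison is against an 8-GPU HGX H100 system), and it ignores activations, optimizer state, and KV cache, which is why trainable model sizes are far smaller than inference sizes:

```python
def fp16_weights_gb(params_billions: float) -> float:
    """Weights-only FP16 footprint: 2 bytes per parameter, reported in GB."""
    return params_billions * 1e9 * 2 / 1e9

# Assumed 8-GPU platforms: MI300X at 192GB each, HGX H100 at 80GB each.
mi300x_system_gb = 8 * 192    # 1536 GB
hgx_h100_system_gb = 8 * 80   # 640 GB

print(fp16_weights_gb(680), mi300x_system_gb)     # ~1360 GB fits within 1536 GB
print(fp16_weights_gb(290), hgx_h100_system_gb)   # ~580 GB fits within 640 GB
```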
One of the cool features of the Instinct MI300X is AMD's in-house spatial partitioning technology, which allows users to partition the XCDs depending on their workloads. The XCDs operate together as a single processor by default, but they can be partitioned and then grouped to appear as multiple GPUs.
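Conceptually, that grouping works like the sketch below. This is purely illustrative: the function and partition sizes here are hypothetical, not AMD's actual partitioning modes or management tooling, and it simply shows how 8 XCDs could be exposed as one large GPU or several smaller logical devices:

```python
# Illustrative only: group the 8 XCDs into logical GPU partitions.
# Real partition modes are configured through AMD's own management tooling.
XCD_IDS = list(range(8))

def partition_xcds(xcds: list[int], group_size: int) -> list[list[int]]:
    """Split XCDs into equally sized groups, each presented as one logical GPU."""
    if len(xcds) % group_size != 0:
        raise ValueError("group size must evenly divide the XCD count")
    return [xcds[i:i + group_size] for i in range(0, len(xcds), group_size)]

print(partition_xcds(XCD_IDS, 8))  # one logical GPU spanning all XCDs
print(partition_xcds(XCD_IDS, 2))  # four logical GPUs of 2 XCDs each
print(partition_xcds(XCD_IDS, 1))  # eight logical GPUs, one per XCD
```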
AMD did tease its refreshed Instinct MI325X AI accelerator for October, which will pack up to 288GB of HBM3E memory at even higher speeds than the HBM3 in the Instinct MI300X. AMD promises 1.3x better memory bandwidth and 1.3x better peak theoretical FP16 and FP8 compute performance than the competition.
- Read more: AMD to source HBM3E memory from Samsung for its new AI GPUs, starting with Instinct MI325X
- Read more: AMD teases Instinct MI325X refresh in Q4, MI350 'CDNA 4' in 2025, MI400 'CDNA Next' in 2026
In 2026, we'll be introduced to the next-gen Instinct MI400 series, which is based on a future-gen CDNA architecture the company has dubbed "CDNA Next".
The new AMD Instinct MI325X accelerator will bring 288GB of HBM3E memory and 6 terabytes per second of memory bandwidth, use the same industry-standard Universal Baseboard server design as the AMD Instinct MI300 series, and be generally available in Q4 2024. The accelerator will have industry-leading memory capacity and bandwidth, 2x and 1.3x better than the competition respectively, and 1.3x better compute performance than the competition.
The first product in the AMD Instinct MI350 Series, the AMD Instinct MI350X accelerator, is based on the AMD CDNA 4 architecture and is expected to be available in 2025. It will use the same industry-standard Universal Baseboard server design as other MI300 Series accelerators, will be built using advanced 3nm process technology, will support the FP4 and FP6 AI datatypes, and will have up to 288GB of HBM3E memory.
The AMD CDNA "Next" architecture, which will power the AMD Instinct MI400 Series accelerators, is expected to be available in 2026, providing the latest features and capabilities to help unlock additional performance and efficiency for inference and large-scale AI training.