Date: 2025-08-26

TL;DR: AMD's Instinct MI350 series AI accelerators, built on TSMC's advanced N3P process, deliver enhanced AI performance with 185 billion transistors, up to 288GB HBM3E memory, and superior power efficiency. The flagship MI355X outperforms NVIDIA's B200 in memory, bandwidth, and precision compute capabilities.

AMD launched its new Instinct MI350 series AI accelerators two months ago, and the company has now detailed the MI350 chip at Hot Chips 2025, with its compute dies fabbed on TSMC's bleeding-edge N3P process node.

AMD's new Instinct MI350 series AI accelerators feature the CDNA 4 architecture, bringing improved performance and power efficiency for AI workloads, support for larger capacities of memory at higher speeds, and faster AI training and inference on large models thanks to boosted link speeds.

The new flagship Instinct MI355X AI accelerator is liquid-cooled and draws up to 1400W of power, with its GPU running at 2400MHz alongside up to 288GB of HBM3E memory.

AMD has been engineering some mighty fine chips in the last few years, with the new MI350 no different as it uses the best of what TSMC has to offer in its N3P process node and its advanced packaging technologies. MI350 features a huge 185 billion transistors, using a 3D Multi-Chiplet layout with two chiplet types + HBM3E memory. AMD uses a dual 3nm + 6nm process node for MI350 on CoWoS-S advanced packaging from TSMC.

The XCDs (Accelerator Complex Dies) are based on TSMC's N3P (3nm + performance) process technology, with 8 of them in total on a single MI350X/MI355X package, 4 on each IOD. The IOD (AMD I/O Base Die) is fabbed on TSMC's N6 process, which is very cost-effective thanks to it being a mature, tested process node that is great for yields and price. AMD uses two IODs per package, with the IOD packing the Infinity Fabric AP interconnect.

The MI350 IOD houses the HBM3E memory, with 8 physical stacks on a 128-bit channel interface, with up to 288GB of HBM3E in total (36GB per 12-Hi stack at 8Gbps) and 8TB/sec of total bandwidth. The two IODs are linked through Infinity Fabric Advanced Package (AP) with 5.5TB/sec of bi-directional bandwidth, supporting full bandwidth and a flat address space across all of the chiplets.
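The capacity and bandwidth figures quoted above can be sanity-checked with some quick arithmetic. The sketch below is only a back-of-the-envelope check, not anything AMD has published: the stack count, per-stack capacity, and 8Gbps pin speed come from the article, while the 1024-bit width per stack is an assumption based on the standard HBM3/HBM3E stack interface.

```python
# Back-of-the-envelope check of the HBM3E figures quoted in the article.
STACKS = 8             # physical HBM3E stacks (from the article)
GB_PER_STACK = 36      # capacity of one 12-Hi stack in GB (from the article)
PIN_SPEED_GBPS = 8     # per-pin data rate in Gbit/s (from the article)
BITS_PER_STACK = 1024  # ASSUMPTION: standard HBM3/HBM3E interface width per stack


def total_capacity_gb(stacks=STACKS, gb_per_stack=GB_PER_STACK):
    """Total HBM capacity in GB."""
    return stacks * gb_per_stack


def total_bandwidth_gb_per_s(stacks=STACKS, bits=BITS_PER_STACK, gbps=PIN_SPEED_GBPS):
    """Aggregate bandwidth in GB/s: pins x Gbit/s per pin, divided by 8 bits per byte."""
    return stacks * bits * gbps / 8


print(total_capacity_gb())         # 288
print(total_bandwidth_gb_per_s())  # 8192.0
```

Under that assumption the result is 288GB and roughly 8TB/sec, which lines up with the numbers AMD quotes.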

AMD is using 256MB of Infinity Cache here, and the move to 2 IODs provides better power efficiency than the MI300, with wider data pipelines enabling higher bandwidth at lower frequencies.

Another cool thing that AMD is doing with its new Instinct MI350 series AI accelerators is flexible GPU partitioning per socket, with the memory able to be partitioned into two separate clusters. AMD extends this flexibility to the XCDs, which can be split into quad XCD clusters, or into dual or single XCD blocks, with the MI350 series chips supporting up to 8 instances of 70B models in CPX+NPS2 AI workloads.
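To put the partitioning numbers in perspective, here is a rough sketch of the arithmetic. The 8 XCDs, 288GB of HBM3E, and the quad, dual, or single XCD blocks come from the article. Everything else is an assumption made for illustration: the helper names are hypothetical, memory is treated as split evenly across instances, and only model weights are counted (no KV cache or activations). The article does not say which precision the 70B instances run at.

```python
# Rough partition arithmetic for one MI350-series package (illustrative only).
TOTAL_XCDS = 8      # XCDs per package (from the article)
TOTAL_HBM_GB = 288  # HBM3E capacity in GB (from the article)


def partitions(xcds_per_partition, total_xcds=TOTAL_XCDS):
    """Number of compute partitions when each one gets `xcds_per_partition` XCDs."""
    if total_xcds % xcds_per_partition != 0:
        raise ValueError("partition size must divide the XCD count evenly")
    return total_xcds // xcds_per_partition


def memory_share_gb(instances, total_gb=TOTAL_HBM_GB):
    """ASSUMPTION: memory is shared evenly between all instances."""
    return total_gb / instances


def weight_footprint_gb(params_billion, bits_per_param):
    """Weights-only footprint in GB; ignores KV cache and activations."""
    return params_billion * bits_per_param / 8


for size in (4, 2, 1):  # quad, dual, single XCD blocks
    n = partitions(size)
    print(f"{size} XCD(s) per partition -> {n} partitions, "
          f"{memory_share_gb(n):.0f}GB each")

# A 70B model with 4-bit weights needs about 35GB for weights alone.
print(weight_footprint_gb(70, 4))
```

With single-XCD partitions, each of the 8 instances would get an even share of 36GB, so a 70B model would only fit at around 4 bits per weight under these assumptions, and with very little headroom left over.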

This culminates in the Instinct MI355X AI accelerator beating the GB200 and B200 AI GPUs from NVIDIA. Here's how it compares:

AMD Instinct MI355X vs NVIDIA B200:

Memory: 1.6x Higher
Bandwidth: 1.0x Higher
FP64: 2.1x Higher
FP16: 1.1x Higher
FP8: 1.1x Higher
FP6: 2.2x Higher
FP4: 1.1x Higher

