AMD and Intel Unveil ACE: New matrix instructions deliver a massive 16x AI performance leap over AVX

ACE is deployed via the x86 Ecosystem Advisory Group (EAG) to ensure the same code runs consistently and without fragmentation across both Intel and AMD.

Tech Reporter
TL;DR: Intel and AMD have jointly developed the AI Compute Extensions (ACE) for x86 CPUs, boosting AI compute density 16 times over AVX10 by using 2D tile registers and outer-product algorithms. ACE aims to unify AI workloads on CPUs, enhancing energy efficiency and software compatibility across devices.

Intel and AMD have partnered to standardize a new x86 ISA extension to supercharge AI performance. Under the x86 Ecosystem Advisory Group (EAG), AI Compute Extensions (ACE) are engineered to deliver a 16x increase in compute density over standard AVX10 instructions.

Traditional vector (SIMD) hardware and instructions, such as AVX, operate on one-dimensional data. Machine learning, which is dominated by matrix multiply-accumulate (MMA) operations, has created demand for specialized hardware that operates in two dimensions. This need led to the rise of Tensor Cores in GPUs, and now ACE brings that same logic to the x86 CPU.

ACE uses an outer-product algorithm and eight new 2D tile registers, each holding a 16x16 grid of 32-bit elements. Large AI datasets are split into sub-matrices, and the hardware consumes two 16x4 input matrices of 8-bit values at a time (the Blue and Yellow matrices). At each intersection on the 16x16 grid, the hardware computes an inner product between a 1x4 vector from one input and a 4x1 vector from the other.


This setup produces 256 inner-product results simultaneously and stores them in the tile register (the Green matrix). Since each result takes four multiplications, that works out to 256 x 4 = 1,024 multiplications per clock cycle. The tile register is not overwritten on the next cycle. Instead, it accumulates results, adding new values to the existing ones.
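
To make the tile update concrete, here is a minimal sketch in plain Python that models the arithmetic described above. It is not ACE code: no instruction names or intrinsics have been published in this article, so the function name `tile_mma`, the row-major list layout, and the choice to store both inputs as 16x4 (with the second one indexed by output column) are all assumptions made for illustration. It also ignores 32-bit overflow behavior.

```python
# Illustrative model only: plain Python, not real ACE instructions or intrinsics.
TILE = 16  # tile register is a 16x16 grid (from the article)
K = 4      # each input matrix is 16x4, 8-bit values (from the article)


def tile_mma(tile, a, b):
    """Accumulate one step into the 16x16 tile, in place.

    a[i] is the 4-element row feeding output row i.
    b[j] is the 4-element row feeding output column j (assumed layout).
    Returns the number of scalar multiplications performed.
    """
    mults = 0
    for i in range(TILE):
        for j in range(TILE):
            acc = 0
            for k in range(K):
                acc += a[i][k] * b[j][k]  # 1x4 by 4x1 inner product
                mults += 1
            tile[i][j] += acc  # accumulate, do not overwrite
    return mults


def demo():
    """Run two identical steps and return (mult count, tile after 1 step, tile after 2)."""
    a = [[(i + k) % 5 - 2 for k in range(K)] for i in range(TILE)]
    b = [[(j * k) % 7 - 3 for k in range(K)] for j in range(TILE)]
    tile = [[0] * TILE for _ in range(TILE)]
    first_count = tile_mma(tile, a, b)
    after_one = [row[:] for row in tile]
    tile_mma(tile, a, b)
    return first_count, after_one, tile
```

Calling `demo()` runs the same step twice on a zeroed tile. The first call reports 16 x 16 x 4 multiplications, and because the tile accumulates rather than overwrites, every entry after the second step is double its value after the first.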

Traditional AVX, without optimizations, treats a 512-bit vector as a simple row of 64 elements (8-bit data), resulting in only 64 multiplications per operation. The jump from 64 to 1,024 (1,024 / 64 = 16) is the source of the 16x performance claim. ACE is not a replacement for AVX, but an extension of the instruction set.
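
The 16x figure can be checked with a few lines of arithmetic. The sketch below only restates the numbers quoted in this article (512-bit vectors, 8-bit elements, a 16x16 tile, and 4-element inner products); the function names are hypothetical, and the result is a ratio of multiplication counts, not a measured benchmark.

```python
# Back-of-the-envelope check of the 16x claim, using only figures from the article.
VECTOR_BITS = 512   # AVX vector width used in the comparison
ELEMENT_BITS = 8    # 8-bit input data


def avx_multiplications():
    """One multiplication per 8-bit lane of a 512-bit vector."""
    return VECTOR_BITS // ELEMENT_BITS


def ace_multiplications(tile_dim=16, depth=4):
    """One 4-element inner product at each of the 16x16 tile positions."""
    return tile_dim * tile_dim * depth


def density_ratio():
    """Ratio of ACE multiplications to AVX multiplications."""
    return ace_multiplications() // avx_multiplications()
```

Here `avx_multiplications()` gives 64, `ace_multiplications()` gives 1,024, and `density_ratio()` gives 16, matching the headline number.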

Because this extension is a joint effort between both x86 giants, ACE is poised to avoid the instruction fragmentation that hindered the rollout of AVX-512. The architecture is specifically designed to handle lightweight AI workloads directly on the CPU, avoiding the need to engage the more power-hungry GPU and significantly improving energy efficiency.


Currently, no CPUs with ACE support have been announced or seen in the wild. However, software enablement is underway, with a focus on optimized kernels, libraries, and ML runtime integrations. The team is adding ACE support to major libraries like NumPy and SciPy, as well as to AI frameworks such as PyTorch and TensorFlow.

Historically, the greatest strength of the x86 ecosystem has been its scale, and its greatest weakness has been fragmentation. By co-authoring ACE, Intel and AMD have established a consistent standard across the entire stack. This unified approach is meant to enable seamless software deployment, so that the same code runs identically whether it's on a consumer-grade laptop or in a rack-scale data center. However, it'll likely be a while before this partnership bears fruit.

News Source: x.com


Hassam is a veteran tech journalist and editor with over eight years of experience embedded in the consumer electronics industry. His obsession with hardware began with childhood experiments involving semiconductors, a curiosity that evolved into a career dedicated to deconstructing the complex silicon that powers our world. From benchmarking PC internals to stress-testing flagship CPUs and GPUs, Hassam specializes in translating high-level engineering into deep, unbiased insights for the enthusiast community.
