New Intel driver lets you dedicate 93% of system memory to the iGPU for VRAM, enabling support for larger AI models

The new driver targets systems with built-in Arc Pro Graphics, enabling, for example, a 64GB host system to allocate 59.5GB to the integrated GPU.

Tech Reporter
TL;DR: Intel's new driver for Arc Pro GPUs increases integrated GPU memory allocation to 93% of system RAM, enabling larger LLM inference on select models like Arc Pro B390 and B370. This supports running substantial AI models on affordable hardware, though performance depends on memory bandwidth and computational power.
Intel's latest driver release, 32.0.101.8517, for Arc Pro GPUs increases the integrated GPU's memory allocation to enable broader LLM inference support. The new driver allows users to allocate up to 93% of their system RAM to the integrated GPU. While the driver currently supports only a select number of SKUs, Intel is paving the way for larger LLM inference workloads without hitting memory capacity bottlenecks.

Traditional memory partitioning usually limits a GPU to 50% of system RAM. AMD's Variable Graphics Memory (VGM) allows high-end configurations, such as the Strix Halo, to allocate 96GB from a 128GB pool to the iGPU. Intel has been more aggressive in this regard. Last year, Intel raised the limit to 87% with its new "Shared GPU Memory Override" for Core Ultra Series 2 processors.
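The percentage caps above translate into concrete ceilings. A quick sketch of the arithmetic (this is just the driver-level cap applied to total RAM, not a guarantee of usable VRAM):

```python
# Back-of-the-envelope iGPU allocation ceilings at the caps discussed above:
# the traditional 50% split, Intel's earlier 87% override, and the new 93%.
def igpu_allocation_gb(system_ram_gb: float, cap: float) -> float:
    """RAM (in GB) the iGPU may claim under a given allocation cap."""
    return round(system_ram_gb * cap, 1)

for ram in (32, 64, 128):
    print(f"{ram} GB system: "
          f"{igpu_allocation_gb(ram, 0.50)} GB at 50% | "
          f"{igpu_allocation_gb(ram, 0.87)} GB at 87% | "
          f"{igpu_allocation_gb(ram, 0.93)} GB at 93%")
```

At 93%, a 64GB system yields the 59.5GB figure cited above, and a 32GB system yields roughly 29.8GB.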

The latest driver release pushes that boundary further, to 93%, for local AI inference. The higher cap applies only to integrated Arc Pro GPUs, such as the Arc Pro B390 and Arc Pro B370. While the allocation bump is the headline feature and is exclusive to integrated GPUs, the driver also supports discrete Arc Pro A- and B-series cards.

This allows users to run much larger LLMs without expensive hardware. On a 32GB system, this allocation provides enough memory to run a Qwen 2.5 32B model at 4-bit quantization with a comfortable context window. Meanwhile, workstations equipped with 64GB of RAM can run heavyweight models like Llama 3 70B, with enough headroom for the KV cache and system stability.
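Why those models fit can be sanity-checked with rough footprint math. The sketch below assumes ~4.5 bits per weight for a 4-bit quantization (overhead included) and the standard KV-cache formula; these are illustrative assumptions, not measurements of any particular runtime:

```python
# Rough VRAM footprint of a quantized LLM: weights plus KV cache.
def weights_gb(params_b: float, bits_per_weight: float = 4.5) -> float:
    """Weight storage in GB for params_b billion parameters.

    4.5 bits/weight is an assumed average for a 4-bit quant with overhead."""
    return params_b * bits_per_weight / 8

def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache in GiB: K and V tensors per layer, fp16 by default."""
    return 2 * layers * tokens * kv_heads * head_dim * bytes_per_elem / 1024**3

# Qwen 2.5 32B at 4-bit: ~18 GB of weights, within a ~29.8 GB allocation
# on a 32 GB system. Llama 3 70B at 4-bit: ~39 GB of weights, leaving
# headroom out of 59.5 GB on a 64 GB system for the KV cache (e.g. an
# 8K context with 80 layers, 8 KV heads, head_dim 128 costs ~2.5 GiB).
print(round(weights_gb(32), 1), round(weights_gb(70), 1),
      round(kv_cache_gb(8192, 80, 8, 128), 1))
```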

While the extra capacity is impressive, compute and memory bandwidth still dictate how quickly a model runs. Intel's Core Ultra Series 3 (Panther Lake) chips feature fast LPDDR5X-9600 memory, delivering bandwidth in the 150 GB/s range. AMD's Strix Halo, on the other hand, has a 256-bit memory bus that delivers 256 GB/s. That bandwidth helps large models not only fit in memory but also run at respectable speeds.
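Bandwidth matters because single-stream decoding must read the full weight set for every generated token, so memory bandwidth sets a hard ceiling on tokens per second. A sketch of that ceiling, using the bandwidth figures above and an assumed ~18GB 4-bit model:

```python
# Upper bound on single-stream decode speed: every generated token streams
# all model weights, so tokens/s <= bandwidth / model size. Real-world
# throughput is lower (compute, cache, and overhead are ignored here).
def max_tokens_per_s(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

MODEL_GB = 18  # assumed ~4-bit 32B-class model
for name, bw in (("Panther Lake LPDDR5X-9600", 150),
                 ("Strix Halo 256-bit bus", 256)):
    print(f"{name}: <= {max_tokens_per_s(bw, MODEL_GB):.0f} tok/s")
```

Under these assumptions, the 150 GB/s chip tops out around 8 tokens/s on such a model, and 256 GB/s around 14 tokens/s.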

Apple Silicon, however, remains the gold standard. The M5 Max offers 614 GB/s of bandwidth, but its real advantage is the Unified Memory Architecture (UMA). Apple's UMA does away with the hard partitioning found in the x86 world: instead of a fixed limit or fence, the entire memory pool is natively accessible to both the CPU and the GPU.

We've seen UMA's potential in action, with a user running a 400B LLM on an iPhone 17 Pro. Apple offers efficiency and speed, while Intel and AMD are competing on flexibility and affordability for AI workloads, especially with the advent of LPCAMM2.

News Source: intel.com


Hassam is a veteran tech journalist and editor with over eight years of experience embedded in the consumer electronics industry. His obsession with hardware began with childhood experiments involving semiconductors, a curiosity that evolved into a career dedicated to deconstructing the complex silicon that powers our world. From benchmarking PC internals to stress-testing flagship CPUs and GPUs, Hassam specializes in translating high-level engineering into deep, unbiased insights for the enthusiast community.
