Mistral-NeMo-Minitron 8B is a "miniaturized version" of the new, highly accurate Mistral NeMo 12B AI model, tailor-made for GPU-accelerated data centers, the cloud, and high-end workstations with NVIDIA RTX hardware. Scalable AI models often sacrifice accuracy to hit performance targets; Mistral AI and NVIDIA's new Mistral-NeMo-Minitron 8B aims to deliver the best of both worlds.
It is small enough to run in real time on a workstation or desktop rig with a high-end GeForce RTX 40 Series graphics card, and NVIDIA notes that the 8B (8 billion parameter) variant excels in benchmarks for AI chatbots, virtual assistants, content generation, and educational tools.
Available and packaged as an NVIDIA NIM microservice (and downloadable via Hugging Face), Mistral-NeMo-Minitron 8B currently outperforms Llama 3.1 8B and Gemma 7B on accuracy across at least nine popular benchmarks for AI language models.
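For readers who want to try the model directly, below is a minimal sketch of pulling the weights from Hugging Face with the transformers library. The repository ID, precision, and generation settings here are assumptions for illustration; check the model's Hugging Face page for the exact identifier and hardware requirements.

```python
# A minimal sketch of loading the model from Hugging Face with transformers.
# The repo ID below is an assumption based on NVIDIA's usual naming; verify
# the exact identifier on Hugging Face before running.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-Minitron-8B-Base"  # assumed repository ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to fit on a single high-end GPU
    device_map="auto",           # place layers on whatever GPUs are available
)

prompt = "Explain model pruning in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```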
"We combined two different AI optimization methods - pruning to shrink Mistral NeMo's 12 billion parameters into 8 billion, and distillation to improve accuracy," said Bryan Catanzaro, vice president of applied deep learning research at NVIDIA. "By doing so, Mistral-NeMo-Minitron 8B delivers comparable accuracy to the original model at lower computational cost."
Pruning and distillation for AI training involve downsizing the neural network by removing the components that "contribute the least to accuracy," then retraining the pruned model via distillation. NVIDIA has also confirmed an even smaller version, Nemotron-Mini-4B-Instruct, which is optimized for low memory usage and faster response times on NVIDIA GeForce RTX AI PCs and laptops.
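To make the prune-then-distill recipe concrete, here is an illustrative toy sketch in PyTorch, not NVIDIA's actual pipeline: width-pruning drops the hidden neurons whose weights contribute least, and distillation retrains the pruned "student" to match the original "teacher" model's softened outputs. All layer sizes, the norm-based importance score, and the training loop are hypothetical placeholders.

```python
# Illustrative prune-then-distill sketch on toy layers; NOT NVIDIA's pipeline.
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Toy "teacher": a wide two-layer network standing in for the 12B model.
teacher = nn.Sequential(nn.Linear(64, 512), nn.ReLU(), nn.Linear(512, 100))

def prune_hidden(model: nn.Sequential, keep: int) -> nn.Sequential:
    """Width-prune the hidden layer: keep the `keep` hidden neurons whose
    weight rows have the largest norms (a simple stand-in for 'contributes
    most to accuracy') and drop the rest from both adjacent layers."""
    fc1, act, fc2 = model
    importance = fc1.weight.norm(dim=1)                # one score per hidden neuron
    top = importance.topk(keep).indices.sort().values  # indices of neurons to keep
    new_fc1 = nn.Linear(fc1.in_features, keep)
    new_fc2 = nn.Linear(keep, fc2.out_features)
    with torch.no_grad():
        new_fc1.weight.copy_(fc1.weight[top])
        new_fc1.bias.copy_(fc1.bias[top])
        new_fc2.weight.copy_(fc2.weight[:, top])       # keep matching input columns
        new_fc2.bias.copy_(fc2.bias)
    return nn.Sequential(new_fc1, act, new_fc2)

# "Student": the pruned, narrower copy standing in for the 8B model.
student = prune_hidden(teacher, keep=256)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# Distillation: retrain the student to match the teacher's softened outputs.
temperature = 2.0
for _ in range(100):
    x = torch.randn(32, 64)                            # stand-in training batch
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key idea the sketch captures is that the student starts from the teacher's surviving weights rather than from scratch, which is why distillation can recover much of the accuracy lost to pruning at far lower cost than full retraining.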
For more information on Mistral-NeMo-Minitron 8B, check out NVIDIA's technical blog.