NVIDIA AI GPUs trained Meta's new Llama 3 model for the cloud, edge, and RTX PCs

NVIDIA announces optimizations across all of its platforms to accelerate Llama 3, Meta's latest large language model, with NVIDIA hardware and software powering the generative AI from the cloud to the edge.

1 minute & 46 seconds read time

NVIDIA has just announced optimizations across all of its platforms to accelerate Meta Llama 3, Meta's latest-generation large language model (LLM).


The new Llama 3 model, combined with NVIDIA accelerated computing, opens new possibilities for developers, researchers, and businesses across a wide range of applications. Meta engineers trained Llama 3 on a computing cluster featuring 24,576 NVIDIA H100 AI GPUs linked through the NVIDIA Quantum-2 InfiniBand network; with support from NVIDIA, Meta tuned its network, software, and model architectures for its flagship Llama 3 LLM.

To further advance the state of the art in generative AI, Meta recently described plans to scale its AI GPU infrastructure to an astonishing 350,000 NVIDIA H100 AI GPUs. That's a lot of AI computing power, a ton of silicon, probably a city's worth of power, and an incredible sum of money on AI GPUs ordered by Meta from NVIDIA.

NVIDIA has said that versions of Meta's new Llama 3, accelerated on NVIDIA AI GPUs, are now available for use in the cloud, data center, edge, and PC. You can test Llama 3 right from your browser, packaged as an NVIDIA NIM microservice with a standard application programming interface (API) that can be deployed anywhere.
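NIM microservices expose a standard, OpenAI-style chat-completions API. As a minimal sketch of what a request body might look like, assuming an OpenAI-compatible `/v1/chat/completions` endpoint; the model identifier and endpoint path here are illustrative assumptions, so check NVIDIA's API catalog for the exact values:

```python
import json

# Hypothetical request payload for a NIM deployment's OpenAI-compatible
# chat-completions endpoint. The model name below is an assumption.
payload = {
    "model": "meta/llama3-70b-instruct",  # assumed model identifier
    "messages": [
        {"role": "user", "content": "Summarize Llama 3 in one sentence."}
    ],
    "max_tokens": 128,
}

# Serialized body you would POST (e.g. to
# http://<nim-host>:8000/v1/chat/completions) with the header
# "Content-Type: application/json", plus an API key if using a hosted endpoint.
body = json.dumps(payload)
print(body)
```

Because the interface follows the familiar chat-completions convention, the same request shape works whether the microservice runs in the cloud, in a data center, or on local hardware.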

NVIDIA explains on its website: "Best practices in deploying an LLM for a chatbot involves a balance of low latency, good reading speed and optimal GPU use to reduce costs. Such a service needs to deliver tokens - the rough equivalent of words to an LLM - at about twice a user's reading speed which is about 10 tokens/second. Applying these metrics, a single NVIDIA H200 Tensor Core GPU generated about 3,000 tokens/second - enough to serve about 300 simultaneous users - in an initial test using the version of Llama 3 with 70 billion parameters. That means a single NVIDIA HGX server with eight H200 GPUs could deliver 24,000 tokens/second, further optimizing costs by supporting more than 2,400 users at the same time".
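The sizing math in NVIDIA's explanation is simple enough to sketch: divide a deployment's aggregate token throughput by the per-user target rate (about 10 tokens/second, roughly twice reading speed) to estimate how many users it can serve at once.

```python
# Back-of-envelope capacity estimate from NVIDIA's cited figures:
# a chatbot should deliver ~10 tokens/second per user (about twice
# a typical reading speed).
TOKENS_PER_USER_PER_SECOND = 10

def concurrent_users(total_tokens_per_second: int) -> int:
    """Users served at the target per-user token delivery rate."""
    return total_tokens_per_second // TOKENS_PER_USER_PER_SECOND

# One H200 running Llama 3 70B: ~3,000 tokens/second in NVIDIA's test.
print(concurrent_users(3_000))      # 300 simultaneous users

# An HGX server with eight H200s: ~24,000 tokens/second.
print(concurrent_users(8 * 3_000))  # 2,400 simultaneous users
```

These numbers match the 300-users-per-GPU and 2,400-users-per-server figures NVIDIA quotes for the 70-billion-parameter Llama 3.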

NEWS SOURCE: wccftech.com

Anthony joined the TweakTown team in 2010 and has since reviewed hundreds of graphics cards. Anthony is a longtime PC enthusiast with a passionate dislike for games built around consoles. An FPS gamer since the pre-Quake days, when you were insulted if you used a mouse to aim, he has been addicted to gaming and hardware ever since. Working in IT retail for 10 years gave him great experience with custom-built PCs. His addiction to GPU tech is unwavering, and he has recently taken a keen interest in artificial intelligence (AI) hardware.
