DeepSeek's next-gen R2 AI model rumors: 97% lower costs than GPT-4, trained on Huawei AI chips

Chinese AI giant DeepSeek's new R2 AI model teased: 97% lower costs than GPT-4 with the new AI model fully trained on Huawei AI GPUs.



TL;DR: Chinese AI firm DeepSeek is developing its next-gen R2 model with 1.2 trillion parameters, using a hybrid MoE architecture for optimized AI workloads. Trained on Huawei Ascend 910B GPUs, R2 is 97% cheaper to train than GPT-4, offering cost-efficient, high-performance AI for enterprise applications.

Chinese AI firm DeepSeek is cooking up its next-gen R2 AI model, which is said to be 97% cheaper to train than GPT-4, and it has been fully trained on Huawei AI GPUs.

A new post on X by @deedydas has the hype train for DeepSeek R2 rocking and rolling, claiming that the new R2 model will adopt a hybrid MoE (Mixture of Experts) architecture: an advanced version of the existing MoE implementation that either adds more sophisticated gating mechanisms or combines MoE with dense layers, in order to optimize high-end AI workloads.
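To make the "MoE + dense layers" idea concrete, here is a minimal NumPy sketch of a hybrid MoE layer for a single token: a sparse top-k gated expert mixture plus an always-on dense path. This is purely illustrative; the layer sizes, weights, and the `hybrid_moe_layer` function are our own assumptions, not DeepSeek's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 4, 2

# Illustrative random weights: one dense path, n_experts expert paths, one gate
W_dense = rng.standard_normal((d_model, d_model)) * 0.1
W_experts = rng.standard_normal((n_experts, d_model, d_model)) * 0.1
W_gate = rng.standard_normal((d_model, n_experts)) * 0.1

def hybrid_moe_layer(x):
    """One token through a hybrid MoE layer: top-k gated experts + dense path."""
    # Gating: score every expert, keep only the top-k, softmax their scores
    logits = x @ W_gate
    top = np.argsort(logits)[-top_k:]
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()
    # Sparse mixture: only the top-k experts run, so active compute scales
    # with top_k rather than with the total expert (parameter) count
    expert_out = sum(w * (x @ W_experts[i]) for i, w in zip(top, weights))
    # Hybrid part: add an always-active dense path on top of the mixture
    return expert_out + x @ W_dense

y = hybrid_moe_layer(rng.standard_normal(d_model))
print(y.shape)  # (8,)
```

The sparse gating is also why a rumored 1.2T-parameter model can quote a much smaller "active" parameter count: only the selected experts contribute compute per token.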

DeepSeek R2 is set to double the parameters of R1, with 1.2 trillion parameters at the ready, and it's reportedly a whopping 97.3% cheaper to train than GPT-4o, with a unit cost per token 97.3% lower than GPT-4o's: $0.07 per million input tokens and $0.27 per million output tokens. If true, this would make DeepSeek R2 uber-cheap for enterprise use, and potentially the most cost-efficient AI model on the market.
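To put the rumored pricing in perspective, here is the arithmetic for a sample workload at those per-million-token rates (the `r2_cost` helper and the workload sizes are ours, for illustration only):

```python
# Rumored DeepSeek R2 API pricing, per the @deedydas post
INPUT_PER_M = 0.07   # USD per million input tokens
OUTPUT_PER_M = 0.27  # USD per million output tokens

def r2_cost(input_tokens, output_tokens):
    """USD cost of a workload at the rumored per-token rates."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# Example workload: 10M input tokens + 2M output tokens
cost = r2_cost(10_000_000, 2_000_000)
print(f"${cost:.2f}")  # $1.24
```

At these rates, even a heavy enterprise workload in the tens of millions of tokens would cost only a few dollars, which is the basis of the cost-efficiency claim.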

Not only that, but DeepSeek R2 is said to achieve 82% utilization of Huawei's Ascend 910B AI chip cluster, with computing power measured at 512 PetaFLOPS at FP16 precision, showing that DeepSeek is using in-house resources for its new mainstream R2 AI model. R2 was reportedly trained entirely on Huawei AI chips and in-house equipment, with the Chinese firm having "vertically integrated" the AI supply chain behind its model.

DeepSeek R2 "viral rumors":

  • 1.2T params, 78B active, hybrid MoE
  • 97.3% cheaper than GPT-4o ($0.07/M in, $0.27/M out)
  • 5.2PB training data; 89.7% on C-Eval2.0
  • Better vision: 92.4% on COCO
  • 82% utilization on Huawei Ascend 910B