Chinese AI firm DeepSeek is cooking up its next-gen R2 AI model, which is rumored to be around 97% cheaper per token than GPT-4o, and to have been fully trained on Huawei AI GPUs.
A new post on X by @deedydas has the hype train for DeepSeek R2 rocking and rolling, claiming that the new R2 model will adopt a hybrid MoE (Mixture of Experts) architecture: an advanced version of the existing MoE implementation that pairs more sophisticated gating mechanisms, or a combination of MoE and dense layers, to optimize high-end AI workloads.
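For readers wondering what a "hybrid MoE" layer could look like in practice, here is a minimal, illustrative PyTorch sketch that combines a shared dense feed-forward path with a small pool of top-k gated experts. All names, dimensions, and the routing scheme are assumptions made purely for illustration; none of it comes from DeepSeek.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridMoELayer(nn.Module):
    """Toy hybrid MoE layer: every token goes through a shared dense FFN,
    plus a sparse pool of experts selected per token by a learned gate."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        # Dense path shared by all tokens (the "hybrid" part of the rumor).
        self.dense = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model)
        )
        # Sparse expert pool; only top_k experts are active per token.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                          # x: (batch, seq, d_model)
        dense_out = self.dense(x)
        # Gate scores -> pick top-k experts per token, renormalize weights.
        scores = F.softmax(self.gate(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        # Naive dispatch loop for clarity; real implementations batch tokens per expert.
        sparse_out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e).unsqueeze(-1)   # tokens routed to expert e
                sparse_out = sparse_out + mask * weights[..., k:k+1] * expert(x)
        return dense_out + sparse_out

# Quick shape check
layer = HybridMoELayer()
print(layer(torch.randn(2, 16, 512)).shape)        # torch.Size([2, 16, 512])
```

The appeal of this kind of design is that total parameter count can balloon (the rumored 1.2 trillion) while only a fraction of parameters (the rumored 78 billion "active") are exercised for any given token.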
DeepSeek R2 is set to nearly double the parameter count of R1, packing 1.2 trillion parameters, and its unit cost per token is reportedly a whopping 97.3% lower than GPT-4o's, at $0.07 per million input tokens and $0.27 per million output tokens. This means DeepSeek R2 is going to be uber-cheap for enterprise use, as it would be the most cost-efficient AI model on the market.
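To put those rumored rates in perspective, here is a quick back-of-the-envelope calculation using only the figures from the post; the monthly token volumes are hypothetical numbers chosen for illustration.

```python
# Rumored DeepSeek R2 API pricing from the viral post:
# $0.07 per 1M input tokens, $0.27 per 1M output tokens.
R2_IN, R2_OUT = 0.07, 0.27                  # USD per million tokens

# Hypothetical enterprise workload (made-up volumes).
inp_tokens, out_tokens = 500e6, 100e6       # 500M in, 100M out per month

r2_cost = inp_tokens / 1e6 * R2_IN + out_tokens / 1e6 * R2_OUT
print(f"R2 cost at rumored rates: ${r2_cost:,.2f}")     # $62.00

# If R2 really is 97.3% cheaper per token, the same job on GPT-4o
# would imply roughly:
gpt4o_cost = r2_cost / (1 - 0.973)
print(f"Implied GPT-4o cost:      ${gpt4o_cost:,.2f}")  # ~$2,296.30
```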
Not only that, but DeepSeek R2 is said to achieve 82% utilization of Huawei's Ascend 910B AI chip cluster, with computing power measured at 512 PetaFLOPS at FP16 precision, showing that DeepSeek is using in-house resources for its new mainstream R2 AI model. R2 was reportedly trained on Huawei AI chips using in-house equipment, as the Chinese firm has "vertically integrated" the AI supply chain behind its model.
DeepSeek R2 "viral rumors":
- 1.2T param, 78B active, hybrid MoE
- 97.3% cheaper than GPT 4o ($0.07/M in, $0.27/M out)
- 5.2PB training data. 89.7% on C-Eval2.0
- Better vision. 92.4% on COCO
- 82% utilization on Huawei Ascend 910B