NVIDIA's new China-specific B30 AI GPU reportedly delivers around 75% of the performance of the H20 AI GPU, while demand for the new B30 is "significant" according to the latest reports.
In a new post on X by insider @Jukanrosleve, we're hearing that China's major internet companies estimate the performance of NVIDIA's new B30 AI GPU at "approximately 75% that of the H20". Chinese tech companies reportedly placed orders for hundreds of thousands of units -- worth over $1 billion -- in late June, with deliveries expected in August.
Another large Chinese tech company reportedly plans to increase its Q3 2025 capital expenditure and intends to order 300,000 units of NVIDIA's new B30 AI GPU, with a delivery schedule for September.
NVIDIA's new B30 AI GPU is expected to address two major pain points for China: first, the new AI chip is set to become the preferred solution for inference on small and medium-sized models, "fully aligning with the arrival of the inference era". Second, the B30 will act as a low-cost computing power pool for cloud services.
NVIDIA's upcoming China-specific B30 AI GPU is at a disadvantage in single-card energy efficiency, but that shortfall is mitigated in scenarios with low memory bandwidth requirements, including intelligent customer service, text generation, and image recognition.
For instance, when processing a 4,096-token input, the H20's throughput reaches 961 tokens/s, while the new B30 manages only around 60% of that. When expanded into a larger 8-card AI GPU cluster, however, the B30 can reportedly raise its effective bandwidth to 1.2TB/s through dynamic compression technology, meeting medium-concurrency demands.
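Working from the figures above, the implied single-card throughput of the B30 is easy to back out. A quick sketch (hypothetical arithmetic only; the 961 tokens/s and 60% ratio are the reported numbers, the derived figure is our own back-of-envelope estimate):

```python
# Back-of-envelope estimate from the reported figures.
h20_throughput = 961   # tokens/s on a 4,096-token input (reported)
b30_ratio = 0.60       # B30 reportedly reaches ~60% of the H20

# Implied B30 single-card throughput -- derived, not reported.
b30_throughput = h20_throughput * b30_ratio
print(f"Implied B30 single-card throughput: ~{b30_throughput:.0f} tokens/s")
```

That works out to roughly 577 tokens/s per card, which is why the report frames the 8-card cluster configuration as the way the B30 reaches medium-concurrency workloads.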
The post from Jukan continues, saying that the B30's deep compatibility with the CUDA-X ecosystem allows enterprises to seamlessly migrate frameworks like PyTorch, saving on technical reconstruction costs.
Additionally, when it comes to the B30's role as a low-cost computing power pool for cloud services, NVIDIA's upcoming B30 cluster solution is highly cost-effective for small and medium-sized enterprises and academic institutions. Tests run by a major company reportedly show that a computing power pool built from 100 of the new B30 AI GPUs can support lightweight training of billion-parameter models while reducing procurement costs by 40% and unit power consumption by close to 30% compared to the H20.
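To put those percentages side by side without assuming any absolute prices or wattages, the H20 baseline can simply be indexed at 100 (a minimal sketch; only the 40% and ~30% reductions come from the report):

```python
# Relative comparison only -- index the H20 pool at 100 so no
# absolute prices or power figures need to be assumed.
baseline = 100.0                   # H20 pool, indexed
b30_cost = baseline * (1 - 0.40)   # 40% lower procurement cost (reported)
b30_power = baseline * (1 - 0.30)  # ~30% lower unit power (reported)

print(f"B30 pool cost index:  {b30_cost:.0f}  (H20 = 100)")
print(f"B30 pool power index: {b30_power:.0f}  (H20 = 100)")
```

In other words, the claimed 100-GPU B30 pool lands at roughly 60% of the H20's procurement cost and 70% of its unit power draw, per card, for the same class of lightweight training workload.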
Lastly, while domestically made AI chips from the likes of Huawei may slightly surpass the B30 in single-card FP16 compute at around 200 TFLOPS, the B30 maintains an advantage in mainstream model deployment efficiency thanks to the 'stickiness' of the CUDA ecosystem.




