Birentech has revealed more details about its powerful new Biren BR100 GPU at Hot Chips 34.
The new Birentech BR100 GPU uses an in-house GPU architecture, built on the 7nm process node and packing 77 billion transistors in total. Quite the hefty GPU from Birentech, competing directly against AMD and NVIDIA.
BR100 uses TSMC's 2.5D CoWoS packaging, with 300MB of on-chip cache and 64GB of HBM2e memory delivering 2.3TB/sec of memory bandwidth. It connects over PCIe 5.0 x16 with support for the CXL interconnect protocol. Birentech is using two chiplets on its BR100 GPU, with each of the GPU chiplets packing 16 SPCs (Streaming Processing Clusters). BR100 has a 550W TDP and comes in an OAM module.
Each SPC has 16 Execution Units (EUs), with every 4 EUs making up an internal Compute Unit (CU) that has 64KB of L1 cache (LSC). All of the EUs in an SPC share 8MB of L2 cache. In total, BR100 has 32 SPCs with 512 EUs, 256MB of L2 cache, and 8MB of L1 cache.
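The chip-wide totals follow directly from the per-unit figures. A minimal sketch of that arithmetic, using only the numbers quoted above (the constant and function names are illustrative, and the CU count of 128 is derived here rather than stated by Birentech):

```python
# Per-unit figures quoted in the article.
SPCS = 32              # 2 chiplets x 16 SPCs each
EUS_PER_SPC = 16
EUS_PER_CU = 4         # 4 EUs form one Compute Unit
L1_PER_CU_KB = 64      # L1 cache (LSC) per CU
L2_PER_SPC_MB = 8      # L2 cache shared within one SPC


def br100_totals():
    """Derive chip-wide totals from the per-unit figures above."""
    total_eus = SPCS * EUS_PER_SPC
    total_cus = total_eus // EUS_PER_CU          # derived, not quoted
    l1_total_mb = total_cus * L1_PER_CU_KB / 1024
    l2_total_mb = SPCS * L2_PER_SPC_MB
    return {
        "eus": total_eus,
        "cus": total_cus,
        "l1_mb": l1_total_mb,
        "l2_mb": l2_total_mb,
    }


if __name__ == "__main__":
    for name, value in br100_totals().items():
        print(f"{name}: {value}")
```

The L1 and L2 totals add up to 264MB, so the rest of the 300MB of on-chip cache sits somewhere the figures here don't break down.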
During Hot Chips 34, Birentech showed off the BR100 architecture diagram, where we can see that each Execution Unit has 16 streaming processing cores (V-Core) and a single Tensor Engine (T-Core). Tensor Engine... ring any bells? The EU also has 40KB of TLR (Thread Local Register), 4 SFUs, and a TDA (Tensor Data Accelerator).
Each CU can house 4, 8, or up to 16 EUs. The V-Core in Birentech's new BR100 is a general-purpose SIMT processor with 16 cores that supports FP32, FP16, INT32, and INT16. It also supports SFU, load/store, and data processing operations, along with deep learning operations such as Batch Norm, ReLU, and more.
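For readers unfamiliar with the two deep learning operations named here, this is a minimal plain-Python sketch of what ReLU and batch normalization compute. It illustrates the math only and says nothing about how the V-Core implements them. The function names and the `gamma`, `beta`, and `eps` parameters are assumptions chosen for illustration.

```python
import math


def relu(xs):
    """ReLU: clamp every negative value to zero."""
    return [max(0.0, x) for x in xs]


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch to zero mean and unit variance, then scale and shift.

    gamma and beta stand in for the learned scale and shift; eps avoids
    division by zero when the batch has no variance.
    """
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma * (x - mean) * inv_std + beta for x in xs]


if __name__ == "__main__":
    print(relu([-1.0, 0.0, 2.5]))
    print(batch_norm([1.0, 2.0, 3.0, 4.0]))
```

Both are element-wise or small-reduction operations, which is the kind of work a general-purpose SIMT core handles well.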
BR100 packs an enhanced SIMT model that is capable of running up to 128K threads on 32 SPCs in a super-scalar mode (static and dynamic). The T-Cores are designed to boost AI-powered tasks like MMA (matrix multiply-accumulate), convolution, and more.
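As a reference for what a matrix multiply-accumulate computes, here is a minimal plain-Python sketch of D = A × B + C on small lists of lists. It shows the operation only, not Birentech's hardware implementation, and the function name `mma` is just an illustrative choice.

```python
def mma(a, b, c):
    """Return D = A x B + C for matrices stored as row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    # Shape checks: A is m x k, B is k x n, C is m x n.
    assert all(len(row) == k for row in a)
    assert all(len(row) == n for row in b)
    assert len(c) == m and all(len(row) == n for row in c)

    d = [row[:] for row in c]  # start from the accumulator C
    for i in range(m):
        for j in range(n):
            for p in range(k):
                d[i][j] += a[i][p] * b[p][j]
    return d


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    c = [[1, 1], [1, 1]]
    print(mma(a, b, c))  # [[20, 23], [44, 51]]
```

A tensor engine performs this same multiply-and-add on fixed-size tiles in hardware, which is why it speeds up matrix-heavy AI workloads.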
Birentech compares its new BR100 GPU against NVIDIA's current-gen Ampere A100 (NVIDIA has just detailed its new Hopper H100 GPU, which is a monster). In Birentech's own benchmarks, which have not been independently tested, the BR100 smashes the NVIDIA Ampere A100 GPU: BR100 beats A100 in multiple HPC workloads, to the tune of 2.5x the performance of A100.
NVIDIA might have unveiled its new Hopper H100 GPU, but Birentech's new BR100 is still impressive. NVIDIA has 80 billion transistors forming its new Hopper H100, while Birentech has 77 billion transistors inside BR100. BR100 is built on 7nm, while H100 is built on 4nm (enhanced 5nm, optimized by TSMC for NVIDIA and H100).