AMD’s Instinct MI200 GPU Uses Multi-Chip Design for Exascale Supercomputer




A recent Linux patch posted by AMD reveals that the company’s Instinct MI200 next-generation compute GPU, codenamed ‘Aldebaran,’ will use a multi-chip module (MCM) design. That means the GPU will come with two dies in a single chip package instead of the single die we’re accustomed to with standard GPUs. The accelerator is based on the CDNA 2 architecture and is set to be used for the Frontier exascale supercomputer due to be delivered this year.  

“On Aldebaran, only primary die fetches valid power data,” an AMD Linux patch reads. “Show power/energy values as 0 on secondary die. Also, power limit should not be set through secondary die.” 
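The behavior the patch describes can be pictured with a small model. The sketch below is not AMD's driver code; the `Die` class, the function names, and the wattage figures are all hypothetical and exist only to illustrate the three rules in the quote: the primary die reports real power data, a secondary die reports 0, and a power limit cannot be set through a secondary die.

```python
from dataclasses import dataclass


@dataclass
class Die:
    """Hypothetical stand-in for one die of a two-die GPU package."""
    index: int
    is_primary: bool
    sensor_power_watts: float = 0.0   # raw reading; only valid on the primary die
    power_limit_watts: float = 0.0


def read_power(die: Die) -> float:
    """Return the power value to show for this die.

    Mirrors the rule quoted from the patch: only the primary die
    fetches valid power data, so a secondary die reports 0.
    """
    return die.sensor_power_watts if die.is_primary else 0.0


def set_power_limit(die: Die, limit_watts: float) -> bool:
    """Apply a power limit; refuse when asked through a secondary die."""
    if not die.is_primary:
        return False
    die.power_limit_watts = limit_watts
    return True


# Illustrative values only, not measurements of any real product.
primary = Die(index=0, is_primary=True, sensor_power_watts=250.0)
secondary = Die(index=1, is_primary=False, sensor_power_watts=123.0)
```

In this model, monitoring code that sums `read_power` over both dies would count the package's power once, which is one plausible reason for zeroing the secondary die's value.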


AMD holds a patent titled ‘GPU Chiplets Using High-Bandwidth Crosslinks,’ as noted by Coelacanth-dream, which indicates that the company has been working on multi-chip compute GPU technology for some time. Meanwhile, according to the Linux patch, AMD’s MCM GPU technology requires one of the chiplets to act as the primary and manage the secondary chiplets, which helps the multi-chip GPU look and behave like one big processor to the host system.


Making a multi-chip compute GPU is akin to making a multi-core MCM CPU, like the Ryzen 5000 or Threadripper processors. First, bringing dies closer together increases compute efficiency. AMD’s Infinity architecture provides a high-performance interconnect that promises to bring the efficiency of two dies close to that of a single die. Second, it is easier to mass-produce multiple small chips on an advanced process technology than one big chip: a smaller die is less likely to contain a defect, so small dies yield better than large ones.
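The yield argument can be made concrete with the simple Poisson yield model, in which the chance that a die has no defects is exp(-area × defect density). This is a textbook approximation and not AMD's or any foundry's actual model; the die areas and the defect density below are assumed values chosen purely for illustration.

```python
import math


def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies expected to be defect-free under the Poisson model."""
    return math.exp(-area_cm2 * defects_per_cm2)


# Assumed numbers for illustration only.
DEFECT_DENSITY = 0.1   # defects per square centimeter
BIG_DIE_AREA = 7.0     # one large monolithic die, in cm^2
SMALL_DIE_AREA = 3.5   # each of two smaller dies, in cm^2

big_yield = poisson_yield(BIG_DIE_AREA, DEFECT_DENSITY)
small_yield = poisson_yield(SMALL_DIE_AREA, DEFECT_DENSITY)

print(f"large die yield: {big_yield:.1%}")
print(f"small die yield: {small_yield:.1%}")
```

With these assumed inputs, roughly half of the large dies come out defect-free, compared with about 70% of the small ones. Because small dies can be tested before packaging, a defect costs only one small die rather than the whole large one.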



While multi-chip graphics subsystems have never been truly popular, since many graphics workloads do not scale well across multiple GPUs (and some do not scale at all), multiple compute GPUs per server are quite common because supercomputing and datacenter workloads are highly parallel and scale well.
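One common way to reason about this kind of scaling is Amdahl's law, which bounds the speedup from adding processors by the fraction of the work that can run in parallel. The article does not cite it, so the sketch below is an added illustration, and the parallel fractions used are assumed values rather than measurements of any workload.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# Assumed parallel fractions, for illustration only.
for label, fraction in [("highly parallel workload", 0.99),
                        ("poorly scaling workload", 0.50)]:
    speedup = amdahl_speedup(fraction, processors=2)
    print(f"{label}: {speedup:.2f}x on two dies")
```

Under these assumptions, the highly parallel case gets close to the ideal 2x from a second die, while the half-serial case gains only about a third.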

The devil is in the software details: applications have to be coded to extract the utmost performance from these types of architectures. Still, broad industry support for MCM seems to be coming to the fore.


Intel’s Xe-HP and Xe-HPC GPUs also rely on MCM designs, so AMD is not alone with its MCM GPU plans. Furthermore, Nvidia’s upcoming Hopper compute GPUs are rumored to feature multiple dies, too. 

