[My first attempt at an ELI5…] This is needed for massive ML models. GPT-3 for example has 175 billion parameters. The model training job simply won’t fit on a single processor’s memory by itself. To calculate these models, the bigger job has to be broken up into many smaller jobs, across different processors. But, to complete the big job, all of those smaller jobs need to talk to one another from time to time. Massive bandwidth speeds up that crosstalk so that the training can happen faster.