by dan-robertson 745 days ago
It’s not obvious that that’s what’s happened here. E.g. vector scheduling is separated, but there are more units for actually executing certain vector operations. It may be that many vector workloads are limited more by memory bandwidth than by ILP, so adding another port to the scheduler mightn’t add much. Being able to run other parts of the CPU faster when vector instructions aren’t being used could be worth a lot.

That matches recent material I've read on vectorized workloads: memory bandwidth can become the limiting factor.
Always nice to see people rediscovering the roofline model.
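
For anyone who hasn't run into it, here is a rough sketch of the roofline bound in Python. The machine numbers (1000 GFLOP/s peak, 50 GB/s memory bandwidth) and the function names are made up for illustration and don't describe any real CPU. The kernel is a double-precision axpy (y = a*x + y), assuming every operand comes from main memory.

```python
# Roofline sketch: attainable throughput is capped either by peak compute
# or by memory bandwidth times arithmetic intensity (FLOPs per byte moved).

# Hypothetical machine parameters, chosen only for illustration.
PEAK_GFLOPS = 1000.0    # peak vector throughput, GFLOP/s
BANDWIDTH_GBS = 50.0    # sustained memory bandwidth, GB/s


def attainable_gflops(flops_per_byte, peak=PEAK_GFLOPS, bandwidth=BANDWIDTH_GBS):
    """Upper bound on throughput for a kernel with the given intensity."""
    return min(peak, bandwidth * flops_per_byte)


def ridge_point(peak=PEAK_GFLOPS, bandwidth=BANDWIDTH_GBS):
    """Intensity (FLOPs/byte) above which a kernel becomes compute bound."""
    return peak / bandwidth


def is_memory_bound(flops_per_byte, peak=PEAK_GFLOPS, bandwidth=BANDWIDTH_GBS):
    """True when the bandwidth roof, not the compute roof, is the limit."""
    return flops_per_byte < ridge_point(peak, bandwidth)


# Double-precision axpy: per element, 2 FLOPs (multiply + add) and
# 24 bytes of traffic (read x, read y, write y, 8 bytes each).
AXPY_INTENSITY = 2 / 24
```

With these assumed numbers, `attainable_gflops(AXPY_INTENSITY)` comes out a little above 4 GFLOP/s against a 1000 GFLOP/s peak, and the ridge point sits at 20 FLOPs/byte. A kernel like this would see almost nothing from an extra vector scheduler port, which is the point made upthread.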