by rbanffy 2172 days ago
I for one would be delighted by having more caches or wider backends instead of AVX512, but I don't want SIMD to be pushed into GPUs. It'd be better to do the reverse - to push forward the asymmetric core idea and move more GPU functionality into lots of simpler cores tuned for SIMD at the cost of single thread performance.
3 comments

Here are some shots of the Mask Registers https://travisdowns.github.io/blog/2020/05/26/kreg2.html#the...

It seems like they just keep that area mostly empty in processors without that feature, at least for the processors related to the one pictured. I'm not sure how much useful cache could fit there without a major overhaul, but a chip designer or enthusiast would likely know. This could be why Linus focused on computational enhancements when he discussed the transistor budget.

From a quick glance at the proportions, and considering that not only the register files but also the vector EUs would be halved, I'd expect a 25% increase in L3 or 50% in L2. That, and somewhat relaxed thermal constraints.
I really don't know if that would help much. Better cache management might pay off more than just bigger caches or higher bandwidth.
It depends on your workload, but if you are wasting too much time with L3 misses, more cache (and more memory channels) is a good idea.
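To make the "depends on your workload" point concrete, here is a toy sketch, not a model of any real chip: a fully associative LRU cache swept repeatedly by a loop over a fixed working set. The function name `miss_rate`, the line counts, and the LRU policy are all assumptions chosen for illustration; real L2/L3 caches are set-associative and use other replacement policies.

```python
from collections import OrderedDict

def miss_rate(working_set_lines, cache_lines, passes=10):
    """Fraction of accesses that miss in a toy fully associative LRU cache
    when a loop sweeps `working_set_lines` distinct lines `passes` times.
    (Hypothetical helper for illustration only.)"""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    misses = 0
    accesses = 0
    for _ in range(passes):
        for line in range(working_set_lines):
            accesses += 1
            if line in cache:
                cache.move_to_end(line)        # hit: mark as most recently used
            else:
                misses += 1
                cache[line] = True
                if len(cache) > cache_lines:
                    cache.popitem(last=False)  # evict the least recently used line
    return misses / accesses

# Compare a baseline cache with one that is 25% larger (sizes are made up).
for ws in (80, 120, 200):
    print(ws, miss_rate(ws, 100), miss_rate(ws, 125))
```

In this toy setup the 25% larger cache only changes the outcome for the working set that sits between the two sizes (120 lines): it goes from missing on every access to missing only on the first pass. The smaller and larger working sets see no difference, which is the sense in which extra cache helps some workloads a lot and others not at all.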
So, Larrabee?
Cell with a saner bus/memory access?
If all cores see a single unified and consistent memory image (some scratchpad memory excepted), it's best if they all share the same basic ISA (with unimplemented instructions trapping to process migration or software emulation).
Or Fujitsu A64FX?