Hacker News
by chimtim 3323 days ago
Intel needs to double or quadruple its processors' AVX width, add 8-bit instructions, and make AVX slightly more power-efficient, and it will easily beat many of these TPUs. Of course, this is easier said than done, but I am really surprised they haven't done it so far and instead bought Nervana and other chip startups.
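For a rough sense of scale, here is a sketch of the peak 8-bit throughput a widened AVX unit could reach. Every hardware parameter in it is a hypothetical placeholder, not a real SKU: 28 cores, two vector FMA ports per core, 2.0 GHz, and an assumed 8-bit multiply-accumulate that retires one result per lane per port per cycle. It ignores power limits and the clock reductions that wide vector code triggers in practice, so treat the numbers as an upper bound.

```python
# Back-of-envelope peak 8-bit throughput for a hypothetical widened AVX unit.
# All hardware parameters below are illustrative assumptions, not real chips.

def simd_peak_tops(vector_bits: int, elem_bits: int, fma_units_per_core: int,
                   cores: int, clock_hz: float) -> float:
    """Peak tera-ops per second, counting one multiply-accumulate as 2 ops."""
    lanes = vector_bits // elem_bits                  # elements per vector instruction
    ops_per_cycle = lanes * fma_units_per_core * 2 * cores
    return ops_per_cycle * clock_hz / 1e12


if __name__ == "__main__":
    # Hypothetical chip: 28 cores, 2 FMA ports per core, 2.0 GHz,
    # plus an assumed single-cycle 8-bit multiply-accumulate instruction.
    for width in (512, 1024, 2048):                   # AVX-512 width, then 2x and 4x
        tops = simd_peak_tops(width, 8, 2, 28, 2.0e9)
        print(f"{width:5d}-bit vectors: {tops:6.2f} TOPS peak")
```

Under these assumptions, 512-, 1024- and 2048-bit vectors come out at roughly 14, 29 and 57 peak TOPS.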

Have you read about the Google TPU? It has 65,536 ALUs running in parallel, each turning out a result every clock cycle:

https://www.nextplatform.com/2017/04/05/first-depth-look-goo...

They claim faster memory (GDDR5?) could easily triple the performance, which should bring it to 270 TOPS or so. I don't think extending AVX is going to get there any time soon.
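For comparison, a minimal roofline sketch of the first-generation TPU. The 256×256 array of 8-bit MACs, the 700 MHz clock and the roughly 34 GB/s DDR3 weight-memory bandwidth are as published for TPU v1; the 200 ops/byte workload intensity and the 3× bandwidth case are illustrative assumptions, not measurements.

```python
# Roofline sketch for the first-generation TPU.
# Published TPU v1 figures: 256 x 256 = 65,536 8-bit MACs, 700 MHz clock,
# ~34 GB/s DDR3 weight-memory bandwidth. The workload intensity is assumed.

MACS = 256 * 256
CLOCK_HZ = 700e6
PEAK_OPS = MACS * 2 * CLOCK_HZ          # 1 MAC = 2 ops, about 92 TOPS
DDR3_BYTES_PER_S = 34e9


def attainable_ops(intensity_ops_per_byte: float, bandwidth_bytes_per_s: float,
                   peak_ops: float = PEAK_OPS) -> float:
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_ops, bandwidth_bytes_per_s * intensity_ops_per_byte)


if __name__ == "__main__":
    print(f"peak: {PEAK_OPS / 1e12:.1f} TOPS")
    # Hypothetical memory-bound workload: 200 ops per byte of weights fetched.
    for label, bw in (("DDR3", DDR3_BYTES_PER_S), ("3x bandwidth", 3 * DDR3_BYTES_PER_S)):
        print(f"{label:>12}: {attainable_ops(200, bw) / 1e12:.1f} TOPS")
```

Peak works out to about 92 TOPS. In this model, extra bandwidth raises achieved throughput only for workloads below the ridge point (peak divided by bandwidth, roughly 2,700 ops per byte with DDR3); past that point the array itself is the ceiling.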

All this raw compute and parallelism is great, but it does not necessarily make real algorithms run more efficiently. The CPU/GPU difference is much smaller than Nvidia would like you to believe, especially for more complex networks, which are becoming more and more common. Of course, if you just want to do convolutions (which the TPU paper says are only 5% of Google's workload), building hardware around them may work well for those specific algorithms.

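Amdahl's law makes the 5% point concrete. This sketch assumes the 5% is a share of runtime and that the specialized hardware accelerates nothing else, which is a simplification (the TPU also runs MLPs and LSTMs).

```python
def amdahl_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the runtime gets faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    remaining = 1.0 - accelerated_fraction            # part of the runtime left untouched
    return 1.0 / (remaining + accelerated_fraction / local_speedup)


if __name__ == "__main__":
    # Assumption: convolutions are 5% of runtime and nothing else is accelerated.
    for s in (2, 10, 100, float("inf")):
        print(f"conv speedup {s:>5}: overall {amdahl_speedup(0.05, s):.3f}x")
```

Under those assumptions, even an infinitely fast convolution engine caps the end-to-end gain at 1/0.95, about 1.05×.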
GPUs are already SIMD monsters, and even if you made them 8-bit, they wouldn't come anywhere near what TPUs/DPUs can achieve, because TPUs/DPUs lay out their ALUs in an arrangement built specifically for deep learning. Besides, an out-of-order (OoO) processor carries a lot of logic (register renaming, reorder buffers and the like) that is dead weight for deep learning.
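To illustrate the kind of arrangement meant here, below is a toy cycle-level model of a systolic array, the grid layout the TPU's matrix unit is built on. It is a simplified sketch: it uses an output-stationary dataflow, whereas TPU v1 keeps weights stationary, and the function name systolic_matmul is hypothetical. Each processing element (PE) does one multiply-accumulate per cycle and only exchanges data with its neighbours, so operands are reused across the grid instead of being re-fetched from memory.

```python
# Cycle-by-cycle toy model of an output-stationary systolic array computing C = A @ B.
# PE(i, j) accumulates C[i][j]; A values flow left-to-right, B values top-to-bottom.
# (TPU v1 uses a weight-stationary dataflow; this is a simplified variant.)

def systolic_matmul(A, B):
    n, k_dim, m = len(A), len(B), len(B[0])
    assert all(len(row) == k_dim for row in A), "inner dimensions must match"
    C = [[0] * m for _ in range(n)]
    a_reg = [[0] * m for _ in range(n)]   # A value each PE passed right last cycle
    b_reg = [[0] * m for _ in range(n)]   # B value each PE passed down last cycle
    cycles = k_dim + n + m - 2            # time for the skewed wavefront to drain
    for t in range(cycles):
        new_a = [[0] * m for _ in range(n)]
        new_b = [[0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                k = t - i - j             # index of the operand pair reaching PE(i, j) now
                # Edge PEs read skewed inputs; inner PEs read a neighbour's register.
                if j == 0:
                    a_in = A[i][k] if 0 <= k < k_dim else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    b_in = B[k][j] if 0 <= k < k_dim else 0
                else:
                    b_in = b_reg[i - 1][j]
                C[i][j] += a_in * b_in    # one MAC per PE per cycle
                new_a[i][j], new_b[i][j] = a_in, b_in
        a_reg, b_reg = new_a, new_b       # all registers update together
    return C, cycles


if __name__ == "__main__":
    C, cycles = systolic_matmul([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]])
    print(C, cycles)
```

Multiplying a 2×3 by a 3×2 matrix this way takes k + n + m - 2 = 5 cycles, and no PE ever touches memory directly; all operands arrive from the array edges or a neighbour.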

In addition, people seem to overlook the fact that data movement, not computation, is the biggest factor in power consumption; see here (slide 29 is the important one): https://www.ssken.gr.jp/MAINSITE/event/2013/20130827-sci-1/l...
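A toy model of the data-movement argument: energy per MAC is the compute cost plus the cost of fetching its operand from DRAM, amortized over how many times that operand is reused on chip. Both energy constants are placeholders chosen only to show the shape of the argument, not measured values; the model also ignores on-chip buffer energy and assumes one DRAM-fetched operand per MAC.

```python
# Toy energy model: cost per MAC = compute energy + (fetch energy / reuse).
# E_MAC_PJ and E_DRAM_FETCH_PJ are hypothetical placeholders, not measurements.
E_MAC_PJ = 1.0           # hypothetical energy of one 8-bit multiply-accumulate
E_DRAM_FETCH_PJ = 100.0  # hypothetical energy to fetch one operand from off-chip DRAM


def energy_per_mac_pj(reuse: int) -> float:
    """Each operand fetched from DRAM is reused `reuse` times from on-chip storage."""
    if reuse < 1:
        raise ValueError("reuse must be >= 1")
    return E_MAC_PJ + E_DRAM_FETCH_PJ / reuse


def movement_share(reuse: int) -> float:
    """Fraction of per-MAC energy spent moving data rather than computing."""
    return (E_DRAM_FETCH_PJ / reuse) / energy_per_mac_pj(reuse)


if __name__ == "__main__":
    for reuse in (1, 16, 256):
        print(f"reuse {reuse:>3}: {energy_per_mac_pj(reuse):7.2f} pJ/MAC, "
              f"{movement_share(reuse):.0%} on data movement")
```

With no reuse, almost all the energy goes to moving data; the more each fetched value is reused on chip, the closer the cost gets to the MAC itself.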

None of the above is easy. As far as I know, AVX is one of the most thermally constrained parts of the CPU. Xeon Phi is the best Intel has managed to come up with.