by phkahler 3326 days ago
Have you read about the Google TPU? It has 65,536 ALUs running in parallel, turning out a result every clock cycle:

https://www.nextplatform.com/2017/04/05/first-depth-look-goo...

They claim faster memory (GDDR5?) could easily triple the performance, which should bring it to 270 TOPS or so. I don't think extending AVX is going to get there any time soon.
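A rough back-of-envelope check of where numbers like these come from. This sketch assumes each of the 65,536 ALUs is a multiply-accumulate unit counted as 2 operations per cycle, and uses a 700 MHz clock (the figure reported for the first-generation TPU). The constant names and `peak_tops` helper are illustrative only. Note that peak throughput depends only on unit count and clock, so faster memory helps real workloads get closer to this peak rather than raising it. The tripled figure is therefore an optimistic reading.

```python
# Back-of-envelope peak throughput for a 256x256 array of MAC units.
# Assumptions (not from the comment): 2 ops per MAC per cycle, 700 MHz clock.

MAC_UNITS = 256 * 256   # 65,536 ALUs
OPS_PER_MAC = 2         # one multiply + one add per cycle
CLOCK_HZ = 700e6        # assumed 700 MHz clock


def peak_tops(mac_units: int, ops_per_mac: int, clock_hz: float) -> float:
    """Peak tera-operations per second if every unit fires every cycle."""
    return mac_units * ops_per_mac * clock_hz / 1e12


peak = peak_tops(MAC_UNITS, OPS_PER_MAC, CLOCK_HZ)
print(f"peak: {peak:.1f} TOPS")      # peak: 91.8 TOPS
print(f"3x:   {3 * peak:.1f} TOPS")  # 3x:   275.3 TOPS
```

Tripling roughly 92 TOPS lands near the "270 TOPS or so" mentioned above.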


All this raw compute and parallelism is great, but it doesn't by itself make the algorithms any more efficient. The CPU/GPU gap is much smaller than Nvidia would like you to believe, especially for the more complex networks that are becoming increasingly common. Of course, if you only want to run convolutions (which the TPU paper says make up only 5% of Google's workload), building hardware around them may work well for those specific algorithms.
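The 5% point can be made concrete with Amdahl's law. This is a minimal sketch that models only the comment's framing, where just the convolutional share of the workload is accelerated. It does not model the TPU itself, which the same paper also benchmarks on MLP and LSTM workloads. The `amdahl_speedup` helper and the speedup factors are hypothetical.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the work
    runs `factor` times faster and the rest runs at the original speed."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Hypothetical: convolutions are 5% of the workload and get 10x, 100x,
# or an "infinitely fast" accelerator.
for factor in (10.0, 100.0, float("inf")):
    print(f"{factor:>6}: {amdahl_speedup(0.05, factor):.3f}x")
# prints roughly 1.047x, 1.052x, 1.053x
```

Even an infinitely fast convolution engine caps the overall speedup at 1 / 0.95, or about 1.05x, under this assumption.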