Hacker News new | ask | show | jobs
by fiatmoney 3688 days ago
"Backprop" isn't even close to something that would be a "CPU instruction", it's an entire class of algorithm. It's like saying "calculus" should be a CPU instruction. Matrix multiplication & other operations, on the other hand, do neatly decompose into such instructions, which have been implemented by NVidia et al., since that's the core set of functionality they've been pushing for like a decade now.

Additional die space on additional functionality might hurt the power envelope (which is where the focus on performance / watt rather than performance kicks in) but it doesn't make your chips slower per se.

1 comments

That was my impression too. ML under the hood was a lot of linear algebra, not very different than most shaders. But maybe Google decided to hardcode a few important ML primitives because the ROI was that good in terms of grabbing customers. Also they might have very large scale applications not found elsewhere that motivates this.
Ok I was obviously oversimplifying things but my point is since we can only speculate, it's clear that when you know specific algorithms/math operations/memory layouts/applications you want to optimize for you can create dedicated chips that optimize and do that quickly. That bitcoin miners are all dedicated chips and run circles around GPUs demonstrates exactly this fact.

Furthermore the fact that ML can be error tolerant means you also get to optimize certain floating point operations for speed or energy efficiency at the cost of accuracy. NVIDIA doesn't get to do this in their linear algebra support.

Bitcoin mining is an extremely well-defined task compared to machine learning. It remains to be seen how general these TPUs are in practice - whether they will support the neural network architectures common two years from now.
tbh I felt like realizing what you meant earlier at the end of my comment. I should have ps'd it.