by migueldeicaza 264 days ago
The CUDA moat is real for general-purpose computing and for researchers who want a Swiss Army knife, but when it comes to well-known deployments, for either training or inference, the amount of stuff that you need from a chip is quite limited.

You do not need most of CUDA, or most of the GPU functionality, so dedicated chips make sense. It was great to see this theory put to the test: first in the original llama.cpp stack, which showed just what you needed; then in the tiny llama.c, which really shows how little was actually needed; and more recently in MLX, which a small team of engineers at Apple put together.

1 comment

Absolutely agreed that you only need specific parts of the chip, and that it makes sense to tailor to that. My point is bigger than that, though: even if you build a specific chip, you still need engineers who understand the full picture.