Hacker News
by glitchc 640 days ago
Indeed, you can strip a whole host of things out of the GPU (the framebuffer, the Z-buffer, the transform-and-lighting engine) and fill the freed space with more CUDA cores and a higher-bandwidth memory controller on a wider bus, etc.

And, as it happens, that's exactly what NVIDIA has done with the H100: https://developer.nvidia.com/blog/nvidia-hopper-architecture...

It still needs to be programmable, though. You can't get away from that.

You can get away from that if you constrain it to a specific type of model (say, attention-based ones).
You don’t need general programmability for AI inference.
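To make the "constrain it to attention" argument concrete: inference through a standard scaled dot-product attention layer, softmax(QKᵀ/√d_k)·V, is the same fixed sequence of steps for every input (matmul, scale, row-wise softmax, matmul) with no data-dependent control flow, which is the kind of dataflow one could plausibly hardwire. The sketch below is plain Python for illustration only; the helper names (`matmul`, `transpose`, `softmax_rows`, `attention`) are hypothetical and say nothing about how any real accelerator is built.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive matrix multiply: the operation matrix/tensor units accelerate.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    # Row-wise softmax; subtracting the row max keeps exp() from overflowing.
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    # softmax(Q K^T / sqrt(d_k)) V: identical steps for every input,
    # no branches that depend on the data.
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    return matmul(softmax_rows(scaled), v)
```

For example, when every key is identical the attention weights are uniform, so `attention([[1.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])` returns the mean of the value rows, `[[2.0, 3.0]]`.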
The money's in the training, not the inference.

Apple and Google already have their own inference hardware in their smartphones. They don't need NVIDIA for that.

Hmm, that's worse for NVIDIA.
NVIDIA owns the interconnects used for this training. I’m sure they’re also working on a competing AI accelerator of their own.
You don’t need programmability for AI training either.
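A rough illustration of that claim, under narrow assumptions: one stochastic-gradient-descent step on a single linear layer with mean-squared-error loss is a forward matmul (y = xW), an elementwise error, a backward matmul (dL/dW = xᵀ·dy), and an elementwise update, again with no data-dependent control flow. This is a hypothetical minimal sketch in plain Python (`sgd_step` and its helpers are made-up names); real training also involves optimizers, normalization, mixed precision and layer types that keep changing, which is where the programmability question from earlier in the thread comes back.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Naive matrix multiply.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def sgd_step(w: Matrix, x: Matrix, t: Matrix, lr: float) -> Matrix:
    # Forward pass: y = x W (x is batch x in, W is in x out).
    y = matmul(x, w)
    n = len(x)
    # Loss L = (1/n) * sum over batch and outputs of (y - t)^2,
    # so dL/dy = 2 (y - t) / n.
    dy = [[2.0 * (yi - ti) / n for yi, ti in zip(yr, tr)] for yr, tr in zip(y, t)]
    # Backward pass: dL/dW = x^T dy, another matmul.
    dw = matmul(transpose(x), dy)
    # Elementwise update: W <- W - lr * dL/dW.
    return [[wij - lr * gij for wij, gij in zip(wr, gr)] for wr, gr in zip(w, dw)]
```

For example, fitting y = 2x from the points (1, 2) and (2, 4), starting at `w = [[0.0]]` with `lr = 0.1`, the first step moves the weight to 1.0 and repeated steps converge toward 2.0.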