by djsjajah 185 days ago
GPUs might not be bandwidth-starved most of the time, but they absolutely are when generating text from an LLM. It’s the whole reason low-precision floating-point formats are being pushed by NVIDIA.
1 comment

That's memory bandwidth, not I/O. Unless your LLM doesn't fit into VRAM.
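
To put rough numbers on the memory-bandwidth point, here is a back-of-the-envelope sketch. It assumes single-stream decoding where every weight is read from VRAM once per generated token, and it ignores the KV cache, batching, and compute time. The 7B parameter count and the 1 TB/s bandwidth are hypothetical figures picked for illustration, not measurements of any particular model or GPU.

```python
def max_tokens_per_second(n_params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on decode speed if each weight is read once per token."""
    model_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


N_PARAMS = 7e9     # hypothetical 7B-parameter model
BANDWIDTH = 1e12   # hypothetical 1 TB/s of VRAM bandwidth

# Halving the bytes per weight doubles the bandwidth-limited ceiling.
for name, nbytes in [("fp32", 4), ("fp16", 2), ("fp8", 1)]:
    ceiling = max_tokens_per_second(N_PARAMS, nbytes, BANDWIDTH)
    print(f"{name}: at most ~{ceiling:.1f} tokens/s")
```

Under these assumptions the ceiling scales inversely with bytes per weight, which is the sense in which lower precision helps a bandwidth-bound workload. If the model does not fit into VRAM, the relevant bandwidth becomes whatever link the weights are streamed over, and the same arithmetic applies with a much smaller number.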