Hacker News
by
djsjajah
185 days ago
GPUs might not be bandwidth-starved most of the time, but they absolutely are when generating text from an LLM. It’s the whole reason NVIDIA is pushing low-precision floating-point formats.
1 comment
ACCount37
184 days ago
That's memory bandwidth, not I/O. Unless your LLM doesn't fit into VRAM.
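To put rough numbers on the point above, here is a back-of-the-envelope sketch, not a benchmark. It assumes single-stream decoding (batch size 1) where every weight is read from VRAM once per generated token, and it ignores KV-cache traffic, activations, and compute time. The function name, the 7B parameter count, and the 1000 GB/s bandwidth figure are hypothetical values chosen for illustration, not measurements of any particular GPU or model.

```python
def max_tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on decode speed when memory bandwidth is the limit.

    Assumes each generated token requires reading every weight once from VRAM.
    """
    # billions of params * bytes per param = model size in GB
    model_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / model_gb


# Hypothetical setup: 7B-parameter model, 1000 GB/s of memory bandwidth.
PARAMS_BILLION = 7
BANDWIDTH_GB_S = 1000

for name, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    bound = max_tokens_per_second(PARAMS_BILLION, bytes_per_param, BANDWIDTH_GB_S)
    print(f"{name}: at most ~{bound:.0f} tokens/s")
```

Under these assumptions, halving the bytes per weight doubles the ceiling on tokens per second, which is why lower precision helps even when the arithmetic units are mostly idle.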