Hacker News
by iamnotagenius, 499 days ago
The performance gain comes mostly from needing only a fraction of the memory bandwidth, since LLMs are mostly memory constrained. Compute matters too, but usually far less than memory.
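
To put rough numbers on that, here is a back-of-envelope sketch. It assumes decoding is purely bandwidth-bound and that every weight byte is read once per generated token, and it ignores compute, caches and KV-cache traffic. The function name, the 7B parameter count, the 100 GB/s bandwidth and the bytes-per-weight values are all made up for illustration, not taken from any particular model or machine.

```python
def decode_tokens_per_second_bound(model_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Rough upper bound on decode speed for a bandwidth-bound model.

    Assumes each generated token requires streaming all model bytes
    from memory exactly once (a simplification).
    """
    if model_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("model_bytes and bandwidth_bytes_per_s must be positive")
    return bandwidth_bytes_per_s / model_bytes


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
PARAMS = 7e9        # assumed parameter count
BANDWIDTH = 100e9   # assumed memory bandwidth in bytes per second

# Same model stored at 2 bytes per weight vs 0.5 bytes per weight.
full = decode_tokens_per_second_bound(PARAMS * 2.0, BANDWIDTH)
quarter = decode_tokens_per_second_bound(PARAMS * 0.5, BANDWIDTH)
speedup = quarter / full

print(f"2.0 bytes/weight: ~{full:.1f} tokens/s at best")
print(f"0.5 bytes/weight: ~{quarter:.1f} tokens/s at best")
print(f"bound improves by ~{speedup:.1f}x")
```

Under these assumptions the bound scales inversely with the bytes read per token, so moving a quarter of the data gives roughly four times the ceiling, with no change in compute at all.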