Hacker News
by iamnotagenius 499 days ago
The performance comes mostly from needing only a fraction of the memory bandwidth, since LLM inference is mostly memory-bandwidth-bound. Compute matters too, but usually far less than memory.
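A rough back-of-envelope sketch of why bandwidth dominates. It assumes single-stream decoding where every weight is read once per generated token, ignores KV-cache and activation traffic, and counts about 2 FLOPs per parameter per token. The machine numbers (100 GB/s, 10 TFLOP/s), the 7B model size, and the helper name `tokens_per_second` are all hypothetical, chosen only for illustration:

```python
# Back-of-envelope roofline estimate for single-stream (batch size 1) LLM decoding.
# Assumptions (hypothetical, for illustration only): every weight is read from
# memory once per generated token, KV-cache and activation traffic are ignored,
# and each parameter costs ~2 FLOPs (one multiply-add) per token.

def tokens_per_second(n_params: float, bytes_per_param: float,
                      mem_bw_bytes_s: float, flops_s: float) -> dict:
    bytes_per_token = n_params * bytes_per_param
    flops_per_token = 2 * n_params
    mem_limit = mem_bw_bytes_s / bytes_per_token   # tokens/s ceiling from memory bandwidth
    compute_limit = flops_s / flops_per_token      # tokens/s ceiling from compute
    return {
        "memory_bound_tok_s": mem_limit,
        "compute_bound_tok_s": compute_limit,
        "bottleneck": "memory" if mem_limit < compute_limit else "compute",
    }

# Hypothetical machine: 100 GB/s memory bandwidth, 10 TFLOP/s compute.
BW = 100e9
FLOPS = 10e12
N = 7e9  # hypothetical 7B-parameter model

# Fewer bytes per weight means less memory traffic per token.
for name, bpp in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    r = tokens_per_second(N, bpp, BW, FLOPS)
    print(f"{name}: ~{r['memory_bound_tok_s']:.1f} tok/s (memory limit), "
          f"~{r['compute_bound_tok_s']:.0f} tok/s (compute limit), "
          f"bottleneck: {r['bottleneck']}")
```

With these made-up numbers the memory ceiling is roughly 7.1, 14.3 and 28.6 tokens/s for 2, 1 and 0.5 bytes per weight, while the compute ceiling stays around 714 tokens/s in every case. Cutting the bytes moved per token raises throughput almost proportionally, whereas extra compute would not help until the two ceilings meet.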