Hacker News
by rodoxcasta 1055 days ago
For inference, at least locally, the bottleneck is usually the memory bandwidth (and quantity, of course).
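A rough back-of-the-envelope sketch of why bandwidth dominates: for single-stream decoding, each generated token has to read roughly all of the model's weights from memory once, so memory bandwidth divided by model size gives an upper bound on tokens per second. The function name and all the numbers below are illustrative assumptions, not measurements from any particular machine.

```python
def max_decode_tokens_per_second(n_params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on batch-1 decode speed, assuming every weight is read once per token."""
    model_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes


# Assumed, illustrative numbers:
# a 7B-parameter model quantized to 4 bits (0.5 bytes per parameter)
# on a machine with 100 GB/s of memory bandwidth.
bound = max_decode_tokens_per_second(7e9, 0.5, 100e9)
print(f"{bound:.1f} tokens/s upper bound")  # prints "28.6 tokens/s upper bound"
```

Real throughput will be lower (KV cache reads, overheads), but the estimate shows why doubling bandwidth helps more than doubling FLOPs here, and why the weights have to fit in memory in the first place.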

I hope the AI hype leads us to more memory and more memory bandwidth, because they have been lagging behind the increase in compute power for something like 15 years already.

1 comment

Oh, 100%. But you can do some pretty amazing things with fine-tuning LLMs too, and that is very compute intensive. Not to mention it's ridiculously hard to even get access to a cloud GPU instance nowadays.
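To put a rough number on "compute intensive", here is a sketch using the common rule of thumb that full training or full fine-tuning costs about 6 FLOPs per parameter per token (forward plus backward pass). The rule of thumb, the function names, and every number below are assumptions for illustration; parameter-efficient methods such as LoRA would cost less.

```python
def full_finetune_flops(n_params, n_tokens):
    """Approximate FLOPs for full fine-tuning, using the ~6 * params * tokens rule of thumb."""
    return 6 * n_params * n_tokens


def gpu_hours(total_flops, sustained_flops_per_s):
    """Hours needed at an assumed sustained throughput."""
    return total_flops / sustained_flops_per_s / 3600


# Assumed, illustrative numbers: a 7B-parameter model, 1B training tokens,
# and one accelerator sustaining 100 TFLOP/s.
flops = full_finetune_flops(7e9, 1e9)
hours = gpu_hours(flops, 100e12)
print(f"{flops:.1e} FLOPs, about {hours:.0f} GPU-hours")
```

Under these assumptions the job is on the order of a hundred GPU-hours, which is why access to cloud GPUs matters so much for fine-tuning.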