Hacker News new | ask | show | jobs
by sdenton4 58 days ago
I think the first two paragraphs of the post are exactly saying that the bottleneck is memory... Long contexts, bigger but less flop-intensive models (moe's).

The funny thing about scaling laws is that as soon as they were known, the whole objective became learning how to break them - bending the curve, at least. They provided an incredibly useful target, but 'law' was a bit too strong a word.