| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sdenton4 58 days ago
	I think the first two paragraphs of the post are exactly saying that the bottleneck is memory... Long contexts, bigger but less flop-intensive models (moe's). The funny thing about scaling laws is that as soon as they were known, the whole objective became learning how to break them - bending the curve, at least. They provided an incredibly useful target, but 'law' was a bit too strong a word.