|
|
|
|
|
by sdenton4
58 days ago
|
|
I think the first two paragraphs of the post are exactly saying that the bottleneck is memory... Long contexts, bigger but less flop-intensive models (moe's). The funny thing about scaling laws is that as soon as they were known, the whole objective became learning how to break them - bending the curve, at least. They provided an incredibly useful target, but 'law' was a bit too strong a word. |
|