by sudoapps
1136 days ago
> But then I see model context length getting longer and longer just within the transformer architecture and the training engineering going on.

Do you have any references to this? Seems really interesting if that can be a long-term approach.
More anecdotally, I couldn’t get a local model to say more than a sentence at the beginning of 2023. I can get tons of useful results today.
Sure, this will plateau. But what if a model plateaus and it’s basically like a 10-year-old?
But like, one of those 10-year-olds you hear about who gets his master’s degree at 13. At that point they’re just browsing the internet, reading books, and probably taking notes in a way that works for them.
Obviously this is wild speculation. Just laying out ideas that make me think in this direction.