Hacker News
by b33j0r 1136 days ago
Yep. To be clear, that’s the exact approach I’ve been pursuing.

But then I see model context length getting longer and longer just within the transformer architecture and the training engineering going on.

To me that’s a fundamentally different approach to AI research at this moment. It seems to keep paying off in surprising ways.


> But then I see model context length getting longer and longer just within the transformer architecture and the training engineering going on.

Do you have any references for this? It seems really interesting if that can be a long-term approach.

I’m considering the recent 64k token models as the most relevant examples.

More anecdotally, I couldn’t get anything to say more than a sentence locally at the beginning of 2023. I can get tons of useful results today.

Sure, this will plateau. But what if a model plateaus and it's basically like a 10-year-old?

But like, one of those 10-year-olds you hear about who gets his master’s degree at 13. At that point they’re just browsing the internet, reading books, and probably taking notes in a way that works for them.

Obviously this is wild speculation. Just laying out ideas that make me think in this direction.

OpenAI is rolling out access to a 32k-context model. MosaicML just released a model trained on 65k-token inputs. https://www.mosaicml.com/blog/mpt-7b