by hodgehog11
189 days ago
There has always been pressure to do so, but there are fundamental performance bottlenecks tied to model size. One thing I can imagine is a push toward training exclusively on search-based rewards, so that the model isn't required to compress a large proportion of the internet into its weights. But this is likely to be much slower and to come with initial performance costs that frontier model developers will not want to incur.

That just gave me an idea! I wonder how useful (and for what) a model would be if it were trained using a two-phase approach:
1) Put the training data through an embedding model to create a giant vector index of the entire Internet.
2) Train a transformer LLM that, instead of relying only on its weights, can also do lookups against the index.
It's like an MoE where one (or more) of the experts is a fuzzy Google search.
The best thing is that adding up-to-date knowledge won’t require retraining the entire model!
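A rough sketch of the idea, under heavy assumptions: the "embedding model" here is a toy hashed bag-of-words stand-in, the index is brute-force cosine search rather than a real approximate-nearest-neighbour library, and the gate uses top-1 routing as in sparse mixture-of-experts models. The expert names (`parametric_expert`, `retrieval_expert`) and the gate logits are hypothetical; in a real model the gate would be learned during training, which isn't shown. The point is the last step: new knowledge goes into the index without touching any weights.

```python
import hashlib
import math
from collections import Counter

DIM = 256  # hypothetical embedding width for this toy example


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word, count in Counter(text.lower().split()).items():
        # md5 keeps bucket assignment stable across runs (unlike built-in hash()).
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class VectorIndex:
    """Phase 1 (sketch): brute-force cosine-similarity index over documents."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vecs: list[list[float]] = []

    def add(self, doc: str) -> None:
        # New knowledge only touches the index; no model weights are retrained.
        self.docs.append(doc)
        self.vecs.append(embed(doc))

    def search(self, query: str, k: int = 1) -> list[tuple[float, str]]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), d) for v, d in zip(self.vecs, self.docs)]
        scored.sort(key=lambda s: s[0], reverse=True)
        return scored[:k]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(query, gate_logits, experts):
    """Phase 2 (sketch): top-1 gating over experts, one of which is a lookup."""
    probs = softmax(gate_logits)
    best = max(range(len(experts)), key=lambda i: probs[i])
    return experts[best](query), probs[best]


index = VectorIndex()
for doc in ["Paris is the capital of France", "Rust is a systems programming language"]:
    index.add(doc)

# Hypothetical experts: a placeholder for the weights and a "fuzzy search" expert.
parametric_expert = lambda q: "<answer generated from weights>"
retrieval_expert = lambda q: index.search(q, k=1)[0][1]
experts = [parametric_expert, retrieval_expert]

# Adding up-to-date knowledge later: just index it, no retraining.
index.add("Hypothetical news item: the widget API gained a search endpoint")
answer, weight = route("widget API search endpoint", [0.2, 1.5], experts)
print(answer)
```

Routing to a single expert keeps the lookup cost to one search per query; a soft mixture would need every expert's output to live in the same vector space so they could be blended.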