|
|
|
|
|
by intrasight
95 days ago
|
|
It's not so much a "minimally viable LLM" but rather an LLM that knows natural language well but knows nothing else. Like me - as an engineer who knows how to troubleshoot in general but doesn't know about a specific device like my furnace (recent example). And I don't think that LLM could just Google or check Wikipedia. But I do agree that this architecture makes a lot of sense. I assume it will become the norm to use such edge LLMs. |
|
While I understand some of the fundamental thoughts behind that comparison, it's slightly wonky... I'm not asking "compress wikipedia really well", but instead "can a 'model' reason its way through wikipedia" (and what does that reasoning look like?).
Theoretically with wikipedia-multi-lang you should be able to reasonably nail machine-translation, but if everyone is starting with "only wikipedia" then how well can they keep up with the wild-web-trained models on similar bar chart per task performance?
If your particular training technique (using only wikipedia) can go from 60% of SOTA to 80% of SOTA on "Explain why 6-degrees of Kevin Bacon is relevant for tensor operations" (which is interesting to plug into Google's AI => Dive Deeper...), then that's a clue that it's not just throwing piles of data at the problem, but instead getting closer to extracting the deeper meaning (and/or reasoning!) that the data enables.