Hacker News
by anigbrowl 1245 days ago
How much of this is the language vs the vast amount of passably accurate domain knowledge? ChatGPT etc. seem magic because they can answer questions about virtually anything with a high degree of plausibility. They often get specific facts wrong, but the general contours are correct. Many of us know a lot of trivia/specialist knowledge, but I don't think anyone is as broadly informed as ChatGPT appears to be. It's not clear where the language ends and the encyclopedic knowledge starts, but the latter must be taking up a very large amount of the space in the model.
2 comments

There have been attempts to separate factual knowledge from language knowledge - for example DeepMind's RETRO, which retrieves from a database of roughly 2T tokens. RETRO manages to reach GPT-3 performance on some tasks with a model about 25x smaller. I believe smaller models are more useful for extractive and classification tasks than for creative text generation.
> How much of this is the language vs the vast amount of passably accurate domain knowledge?

LLMs don’t have domain knowledge, it's all language.

That's what I meant by 'It's not clear where the language ends and the encyclopedic knowledge starts,' since the model (and perhaps our brains) make little distinction.

But the model seems to be storing an absolutely vast amount of information, beyond the capability of any individual person to accumulate and recall. This is clearly not a prerequisite for language, even if the information is represented linguistically. Put another way, at age 20 I had read maybe 10-20% of what I've read since, but I was capable of reading comprehension and conversation even though my levels of knowledge and insight were much lower. By 'comprehension' I mean in the sense of being able to read a piece of text and answer questions about it or rewrite it, without necessarily having any priors about the topic; the kind of task we expect to be able to assign to a high school graduate.

I'm wondering what the size of an 'ignorant' language model is, as a precursor to more curated/directed training. While the state of the art is very impressive, it's a bit like taking a feast for a thousand people and rendering it into a giant cube of spam. This strategy seems guaranteed to produce a succession of increasingly capable idiots savant but limits other avenues of exploration.

> at age 20 I had read maybe 10-20% of what I've read since, but I was capable of reading comprehension and conversation...

This is because human intelligence is not just language, but also a lot of indirect context, "software" inside the spinal cord (and other non-cortex parts of the brain), and even the human body itself.

But as far as I know, current LLMs work with plain, flat structures. At the moment, nobody has even tried to use neocortex-like structures, let alone considered an artificial spinal cord.

All this looks like trying to teach a table lamp, or something similarly smart.