I like this version much better because most people don't write books and AI is much better at writing than the average person, probably even a few standard deviations above the average.
Not to be too hippy-dippy, but it's only the average of all human knowledge expressible in language. There is plenty of knowledge not expressible this way - for example, the sequence of muscle contractions orchestrated by your brain that allows you to walk. Likewise with feelings like love, pride, etc. There are words for these things, but they're merely labels on an experience that almost all humans know but the specifics of which can't be written down as text.
Why does it need to be the average? It seems to me more like it models the manifold of human knowledge. We often query for the average, because that is often good enough and gives us quick results, but there is nothing fundamentally preventing us from sending AI into the deep end of under-explored territory and perhaps coming back with something new. It is ultimately the exploration vs. exploitation trade-off.
> but there is nothing fundamentally preventing us from sending AI into the deep end of under-explored territory and perhaps coming back with something new
What's stopping us is that AI works by manipulating tokens and language but has no connection to reality as it exists. Einstein famously conceived of special relativity by imagining what it would be like to fly alongside a light wave[1]. This is a process that integrates spatial reasoning and imagination informed by living in the real universe where you can see objects moving or waves propagating in a pond. The language only comes later as a means of communicating these intuitions to others.
I wonder how much knowledge can be decoupled from experience, if at all.
If I read thousands of books that explain the details of another civilization in another galaxy, very thoroughly and consistently, but it just happens to be all made up - did I gain knowledge? More importantly, does what I have in my brain now flip from being fiction to being knowledge if that civilization flipped from not existing to existing? How so, if nothing in my brain, or how I live out the rest of my life, changes in the least, if not a single atom in this galaxy changes (let's ignore that gravity has infinite reach and all that, for the sake of argument)?
If yes, how? What in your definition of knowledge makes that possible?
> If I read thousands of books that explain the details of another civilization in another galaxy, very thoroughly and consistently, but it just happens to be all made up - did I gain knowledge?
sounds a lot like math - made up entities that very thoroughly and consistently fit together.
It's an interesting analogy you're making because... this is the lived reality of a lot of people that are interested in fictional worldbuilding / stories. And it flips to being real in the film Galaxy Quest.
Besides "secret" knowledge like the know-how at jobs, there are things like unwritten social etiquette (especially as it varies from place to place) or interfacing with the physical world – reading about chopping tomatoes is different from the experience acquired by actually chopping tomatoes.
It isn't. I constantly have access to non-public information, like the lives of my peers and corporate secrets. Is it useful or essential or even desirable for LLM products? Hardly, but it exists.
Edit: for "not in the training data" yes, humans generally can't know what they can't know.
That is not true. AI can synthesize information, which is the essence of intelligence. And since it knows more than everyone, it is more intelligent too. What it lacks is the ability to create information.
Not quite. Much of the data going into these models has already been curated, otherwise you would get a tremendous number of wrong answers for even the most basic questions.
It still produces wrong answers regardless though, not because of the training data but because of just... intrinsics. The question is what an acceptable error rate is, how severe those errors are, and whether a human would make comparable errors.
But this debate has parallels with self-driving cars: even if the numbers say that self-driving cars are imperfect but safer than human drivers, anything short of perfection will be considered broken or outright illegal.