I think the way to swallow this bitter pill is to acknowledge they can "generalize" because all human knowledge is actually a relatively "small" finite distribution that models are now big enough to pattern match on.
Calling human knowledge small is hyperbole. I cannot get any LLM even close to giving accurate answers about the things I know. They simply do not know what I, a single human being, know. That's because I'm a subject matter expert on somewhat niche topics. There are easily hundreds of thousands of people like me out there.
There's simply no way an LLM can even train on all of that, because each bit of true expert knowledge is necessarily, comically underrepresented in any possible training set.
Maybe there's a way to reduce the dataset an LLM needs in order to learn to reason down to the smallest possible set, and then apply the vast knowledge of humankind on top of that?
I mean, if it can reason about and process the data as it ingests it?