by mjburgess
1099 days ago
There are an infinite number of sentences which describe what "being red" is; most of them have never been written. Redness is not in the structure of those sentences. And there will always be an infinity of sentences which are True but cannot be inferred by an LLM -- but can be so, trivially, by a person acquainted with redness. In any case, I'd need more time than I have at the moment to seriously state Hyp1 for your case -- but atm, I can say that because the data itself has the property, Hyp1 becomes much harder to state and the argument much subtler. After all, what is a "statistical distribution" of "ordinals" anyway? And how much memory is required to represent it? My sense is this distribution has highly redundant features which will be trivially compressible without learning any "sorting algorithm". At a quick glance at your article, it feels like you haven't formulated Hyp1 correctly -- P(CorrectSort | f(HistoricalCases)) is perhaps arbitrarily high if some statistical f() can be chosen well.
That is exactly how the set of sentences actually written encodes the idea of "redness". It's the "actually written" part that carries information about the real world.
> And there will always be an infinity of sentences which are True but cannot be inferred by an LLM -- but can be so, trivially, by a person acquainted with redness.
That's cheating, because "a person acquainted with redness" presumably learned it by sight, which LLMs can't do just yet (at least the widely accessible ones can't). Would you say that a person born blind also cannot infer those True sentences about redness? Because if they can, that means the concept of redness can be taught through language, and so there's no reason LLMs couldn't pick up on it too.