| An ordered list of strings is the training corpus. That's the data being modeled. But that data is more specific than the set of all possible ordered lists of strings: it's a specific representation of an example game written as a chronology of piece positions. GPT models every pattern it can find in the ordered list of tokens. GPT's model doesn't only infer the original data structure (the list of tokens). That structure isn't the only pattern present in the original data. There are also repeated tokens, and their relative positions in the list: GPT models them all. When the story was written in the first place, the game rules were followed. In doing so, the authors of the story laid out an implicit boundary. That boundary is what GPT models, and it is implicitly a close match for the game rules. When we look objectively at what GPT modeled, we can see that part of that model is the same shape and structure as an Othello game board. We call it a valid instance of an Othello game board. We. Not GPT. We. People who know the symbolic meaning of "Othello game board" make that assertion. GPT does not do that. As far as GPT is concerned, it's only a model. And that model can be found in any valid example of an Othello game played. Even if it is implicit, it is there. |
The board structure can be defined precisely using predicate logic as (X, d), i.e., it is strictly below natural language and does not require a human interpretation.
And by "reduction" I meant the word in the technical sense: there exists subset of ChatGPT that encodes the information (X, d). This also does not require a human.