by cgearhart 1177 days ago
It’s a fallacy to describe what the machine does as “thinking” just because that’s the only process you know for achieving the same outcome.

When you give the model some input and expect some particular correct output, that means there exists some completed sequence of tokens that is correct. If that weren’t true, you either wouldn’t ask or you wouldn’t blame the model for being wrong. Now imagine a machine that takes in your input and in one step produces the entire output of that correct answer. In all nontrivial cases there are many more _incorrect_ possible outputs than correct ones, so this appears to be a difficult task. But would you say such a machine is “thinking”? Would you still consider it thinking if we could describe the process mathematically as drawing a sample from the output space? That it draws the correct sample implies it has an accurate probability model of the output space conditioned on your input. Does this require “thought”?

GPT is just like this machine except that instead of producing everything in one step, the inference process is autoregressive, so tokens come out one at a time instead of all at once. (Note that BERT-style transformers _do_ spit out the whole answer at once.)
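To make the "same machine, different step size" point concrete, here is a minimal sketch in Python. The two-token vocabulary and the joint probabilities are made up for illustration; nothing here comes from a real model. It only checks the chain rule: multiplying next-token conditionals gives back the same probability the one-step machine would assign to the whole output.

```python
# Hypothetical joint distribution over every 2-token output for one fixed input.
# These numbers are invented purely for illustration.
JOINT = {
    ("a", "a"): 0.1,
    ("a", "b"): 0.6,
    ("b", "a"): 0.2,
    ("b", "b"): 0.1,
}


def prefix_prob(prefix):
    """Total probability of all full outputs that start with `prefix`."""
    return sum(p for seq, p in JOINT.items() if seq[:len(prefix)] == prefix)


def next_token_prob(prefix, token):
    """p(token | prefix): the quantity an autoregressive model emits per step."""
    return prefix_prob(prefix + (token,)) / prefix_prob(prefix)


def autoregressive_prob(seq):
    """Probability of `seq` when it is produced one token at a time."""
    prob = 1.0
    for t in range(len(seq)):
        prob *= next_token_prob(seq[:t], seq[t])
    return prob


def max_gap():
    """Largest difference between one-step and token-by-token probabilities."""
    return max(abs(autoregressive_prob(seq) - JOINT[seq]) for seq in JOINT)


if __name__ == "__main__":
    for seq in JOINT:
        print(seq, JOINT[seq], autoregressive_prob(seq))
```

The two columns agree up to floating-point rounding, which is all the argument above needs: going token by token changes how the sample is drawn, not which distribution it is drawn from.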

It’s possible that this is all that humans do. Perhaps we are mistaken about “thinking” altogether: perhaps the machine thinks (like a human), or perhaps humans do not think (like the machine). In either case I do feel confident that human and machine are not applying the same mechanism; the jury is still out on whether we’re applying the same process.


Now consider the case where you tell GPT to "think out loud" before giving you the answer - which, coincidentally, is a well-known trick that tends to significantly improve its ability to produce good results. Is that thinking?
Maybe. Mechanically we might also describe it as causing the model to condition more explicitly on specific tokens derived from the training data, rather than relying on the implicit conditioning that happens in the raw model parameters. This would tend to constrain the output space more tightly, making a smaller haystack in which to look for the needle, and it leverages the fact that “next token prediction” implies some consistency with preceding tokens.

It could be thinking, but I don’t think that’s strong evidence that it is thinking.

I would say that it's very strong evidence that it is thinking, if that "thinking out loud" output affects later outputs in ways that are consistent with logical reasoning from it. That is easy to test: edit the outputs before they're submitted back to the model and see how its behavior changes.
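A rough sketch of that editing test, in Python. Everything here is hypothetical: `model` is any callable that takes a prompt plus the "thinking out loud" text and returns a final answer, and the two toy stand-ins below are not real language models. They only mark the two extremes the test is meant to tell apart.

```python
def scratchpad_is_used(model, prompt, scratchpad, edited_scratchpad):
    """Return True if editing the scratchpad changes the final answer."""
    original_answer = model(prompt, scratchpad)
    edited_answer = model(prompt, edited_scratchpad)
    return original_answer != edited_answer


def reads_scratchpad(prompt, scratchpad):
    """Toy stand-in: takes its answer from the last line of the scratchpad."""
    last_line = scratchpad.strip().splitlines()[-1]
    return last_line.split("=")[-1].strip()


def ignores_scratchpad(prompt, scratchpad):
    """Toy stand-in: always gives the same answer, whatever the scratchpad says."""
    return "5"


PROMPT = "What is 2 + 3?"
SCRATCHPAD = "2 + 3 = 5"
EDITED_SCRATCHPAD = "2 + 3 = 6"  # deliberately corrupted intermediate step

if __name__ == "__main__":
    for toy in (reads_scratchpad, ignores_scratchpad):
        used = scratchpad_is_used(toy, PROMPT, SCRATCHPAD, EDITED_SCRATCHPAD)
        print(toy.__name__, used)
```

A changed answer only shows that the scratchpad is causally used. Whether the change is consistent with logical reasoning still means looking at how the answer moved, not just that it moved.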