by imtringued 1199 days ago
Let us construct IceCreamGPT. We take a corpus of text written by people who like ice cream and have provably demonstrated their joy while eating it. We then fine-tune GPT-3.5 on it, and the resulting model is called IceCreamGPT. Does IceCreamGPT like ice cream, or does it only seem to like ice cream? It obviously likes ice cream, since it shares the same intentionality as the humans responsible for the training data.

Now do the same with people who don't like ice cream but lie and write that they like it, and call the result IceCreamGPT2. The performance of the second model is identical to that of the first. Does this mean IceCreamGPT2 likes ice cream? Of course not: IceCreamGPT2 doesn't like ice cream, despite saying it likes ice cream! We know it doesn't like ice cream because it has the same intentionality as the humans responsible for its training data.

Now we have entered a magic world in which anything can mean anything.


No, this is just question-begging by treating GPT's access to the world as being external to it, but your own as being part of you.

If we fix this by treating your senses as external, then we can imagine a copy of you with its senses rewired so that artichokes* taste like ice cream and vice versa (plus, we lie to you about which is which). The resulting imtringued2 is identical to you, but doesn't like ice cream despite saying it likes ice cream. Just like IceCreamGPT2.

* Or some equally disgusting "food".