by anybodyz
1202 days ago
I tried Chomsky's counterfactual apple questions with ChatGPT. It was able to infer counterfactuals about apples, or any sort of physical object, both forwards and backwards in time. So I assume Chomsky has not spent much time with ChatGPT, or he would know that it is capable of making physical and counterfactual inferences like that. Not perfectly (it did make a few mistakes), but fairly reliably. The claim that ChatGPT regularly "makes stuff up" and that it sees no difference between the truth and falsehood of a statement also seems to be false. I asked ChatGPT to act as a "[Lie Detector]" and to rate the truth or falsehood of a variety of statements. I gave it about 40 statements, ranging from physical situations (heavy objects floating away into the air) to temporal questions (time travel, etc.) and logical questions, and it could very accurately determine whether each statement was "true" or "false" given physical or logical rules. Again, not perfect, but very accurate (38 out of 40 correct). With attention, ChatGPT is very obviously operating at a level above simple probabilistic prediction. It clearly seems to have some notion of the meaning of what is being said and to be making inferences based on that meaning. That those inferences were trained probabilistically is certainly true, but it also seems true that it was trained on the average human's understanding of those physical, temporal, or other constraints, and that this understanding is fairly accurate.
1) One instance first parses the chat and the last message to generate a response. Currently this is where things end, but we could keep this response private and do additional work.
2) A second instance, properly primed, can take the last prompt and response and "analyze" it, generating scores for things like factuality and usefulness, possibly adding commentary.
3) Pass the response and the analysis into a third instance, which again has the chat history, to rewrite the response, taking the feedback into account.
4) Optionally repeat #2 and #3 until the response passes some quality threshold.
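
For what it's worth, the four steps above could be sketched roughly like this. It is only a sketch under assumptions, not anyone's actual implementation: `Model` is a hypothetical stand-in for "one instance" (prompt text in, reply text out), the `factuality|usefulness|commentary` reply format for the reviewer is invented for illustration, and the names `refine`, `parse_review`, `threshold` and `max_rounds` are made up here.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for one model instance: prompt text in, reply text out.
Model = Callable[[str], str]


@dataclass
class Review:
    factuality: float  # assumed scale: 0.0 (bad) to 1.0 (good)
    usefulness: float  # assumed scale: 0.0 (bad) to 1.0 (good)
    commentary: str


def parse_review(text: str) -> Review:
    """Parse a reviewer reply in the assumed 'factuality|usefulness|commentary' form."""
    factuality, usefulness, commentary = text.split("|", 2)
    return Review(float(factuality), float(usefulness), commentary.strip())


def refine(
    history: List[str],
    message: str,
    responder: Model,
    reviewer: Model,
    rewriter: Model,
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> str:
    """Draft privately, then review and rewrite until the scores pass or rounds run out."""
    context = "\n".join(history + [message])
    draft = responder(context)  # step 1: private first response
    for _ in range(max_rounds):
        # Step 2: the second instance sees only the last prompt and the response.
        review = parse_review(reviewer(f"PROMPT:\n{message}\nRESPONSE:\n{draft}"))
        # Step 4: stop once both scores clear the quality threshold.
        if min(review.factuality, review.usefulness) >= threshold:
            break
        # Step 3: the third instance gets the chat history plus the feedback.
        draft = rewriter(
            f"{context}\nDRAFT:\n{draft}\nFEEDBACK:\n{review.commentary}"
        )
    return draft
```

The `max_rounds` cap is there because nothing guarantees the reviewer will ever be satisfied; without it, step 4 could loop indefinitely. Any callable that maps a prompt string to a reply string can be passed in as `responder`, `reviewer` or `rewriter`, so the three roles can be the same underlying model primed differently.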