|
|
|
|
|
by chaxor
1142 days ago
|
|
There's a decent working paper that has benchmarks on this, if you're interested. There are many types of reasoning, but GPT-4 gets 97% on casual discovery, and 92% on counterfactuals (only 6% off from human, btw) with 86% on actual causality benchmarks. I'm not sure yet if the question is correct, or even appropriate/achievable to what many may want to ask (i.e. what 'the public's is interested in is typically lost after it is defined in any given study); however this is one of the best works available to address this problem I've seen so far, so perhaps it can help. |
|
Remember that GPT is not trained on all possible text. It's trained on text that was written intentionally. What percentage of that text contains "correct" instances of causal discovery, counterfactuals, etc.?