Hacker News
by chaxor 1142 days ago
There's a decent working paper that has benchmarks on this, if you're interested.

There are many types of reasoning, but GPT-4 gets 97% on causal discovery and 92% on counterfactuals (only 6% off from human, btw), with 86% on actual causality benchmarks.

I'm not sure yet whether the question is correct, or even appropriate/achievable for what many may want to ask (i.e. what 'the public' is interested in is typically lost once it is defined in any given study); however, this is one of the best works I've seen so far addressing this problem, so perhaps it can help.

2 comments

Percent of what? Possible right or wrong answers to a test?

Remember that GPT is not trained on all possible text. It's trained on text that was written intentionally. What percentage of that text contains "correct" instances of causal discovery, counterfactuals, etc.?

So can we make an estimate of GPT-4's IQ?

EDIT: Seems so...

https://duckduckgo.com/?q=ESTIMATE+OF+GPT-4%27S+IQ&t=opera&i...

It shows articles with GPT IQ estimates from 114 to 130. Change is coming for humans.