It's trivial to get ChatGPT to reason about things:
Me: A room contains a blue box and a red box. Alice stands on the red box. Bob also stands on a box. No box has multiple people standing on it. Which box is Bob stood on?
ChatGPT: There are two boxes in the room: a blue box and a red box. Alice is standing on the red box, so Bob must be standing on the blue box. Therefore, Bob is standing on the blue box.
Me: Add a green box to the room. Which box is Bob standing on?
ChatGPT: If a green box is added to the room, then there are now three boxes: a blue box, a red box, and a green box. Since Alice is still standing on the red box, we know that Bob must be standing on either the blue box or the green box. We cannot determine which one Bob is standing on without additional information.
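To check the logic of this exchange mechanically, here is a minimal sketch in Python. It enumerates every way to put Alice and Bob on distinct boxes and keeps only the arrangements consistent with the stated facts. The function name `bob_options` and the dictionary encoding of the constraints are illustrative assumptions, not something from the conversation.

```python
from itertools import permutations


def bob_options(boxes, fixed):
    """Return the set of boxes Bob could be on (hypothetical helper for this puzzle).

    boxes: list of box colours in the room.
    fixed: known placements, e.g. {"Alice": "red"}.
    """
    people = ["Alice", "Bob"]
    options = set()
    # permutations() assigns each person a *distinct* box, which enforces
    # "No box has multiple people standing on it."
    for assignment in permutations(boxes, len(people)):
        placement = dict(zip(people, assignment))
        # Keep only arrangements that agree with the known facts.
        if all(placement[p] == box for p, box in fixed.items()):
            options.add(placement["Bob"])
    return options


print(sorted(bob_options(["blue", "red"], {"Alice": "red"})))           # ['blue']
print(sorted(bob_options(["blue", "red", "green"], {"Alice": "red"})))  # ['blue', 'green']
```

If exactly one arrangement survives, the answer is determined. If several survive, then, as ChatGPT said, you can't tell which box Bob is on without more information.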
There's a decent working paper that has benchmarks on this, if you're interested.
There are many types of reasoning, but GPT-4 gets 97% on causal discovery, 92% on counterfactuals (only 6% short of human performance, btw), and 86% on actual causality benchmarks.
I'm not sure yet whether the question is posed correctly, or even whether it is appropriate or achievable for what many people want to ask (i.e., what "the public" is interested in is typically lost once it is defined in any given study). However, this is one of the best works on this problem that I've seen so far, so perhaps it can help.
Percent of what? Possible right or wrong answers to a test?
Remember that GPT is not trained on all possible text. It's trained on text that was written intentionally. What percentage of that text contains "correct" instances of causal discovery, counterfactuals, etc.?
It doesn't matter what it is designed for. What matters is what it actually does. It's trivial to get GPT to do chain-of-thought reasoning and observe it.