It can reason better than most humans put into the same situation.
This problem doesn't result in a constant value; it results in a 3D probability distribution! Very, very few humans could work that out without tools. (I'm including pencil and paper in "tools" here.)
With only a tiny bit of coaxing, GPT-4 produced an animated video of the solution!
Try to guess what fraction of the general population could do that at all. Also try to estimate what fraction of general software developers could solve it in under an hour.
A human could get a valid end state most of the time; GPT-4 seems to get it wrong more often than it gets it right, based on the examples posted here. So to me it seems like GPT-4 is worse than humans.
GPT-4 with help from a competent human will of course do better than most humans, but that isn't what we are discussing.
I disagree. Don't assume "most humans" are anything like Silicon Valley startup developers. Most developers out there in the wild would definitely struggle to solve problems like this.
For example, a common criticism of AI-generated code is the risk of introducing vulnerabilities.
I just sat in a meeting for an hour, literally begging several developers to stop writing code vulnerable to SQL injection! They just couldn't understand what I was even talking about. They kept trying to use various ineffective, hacky workarounds ("silver bullets") because they just didn't grok the problem.
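For anyone who hasn't seen it spelled out, here's a minimal sketch of the difference, using Python's built-in `sqlite3` module. The `users` table, the function names, and the payload are all made up for illustration; this is not the code from that meeting, just the general shape of the problem and the standard fix (parameterized queries).

```python
import sqlite3


def make_demo_db():
    # Hypothetical in-memory database with a tiny users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_unsafe(conn, name):
    # Vulnerable: the input is spliced into the SQL text itself,
    # so a quote character in `name` can change the query's structure.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the SQL text is fixed and `name` is bound as data,
    # so it is only ever compared against the column as a string.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Illustrative injection payload: closes the string literal and adds
# a condition that is always true.
payload = "x' OR '1'='1"

conn = make_demo_db()
print(find_user_unsafe(conn, payload))  # matches every row in the table
print(find_user_safe(conn, payload))    # matches nothing
```

The "hacky workarounds" are usually attempts to sanitize or escape `name` before building the string. The parameterized version sidesteps that whole class of mistakes because the input never becomes part of the SQL text.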
>It can reason better than most humans put into the same situation.
On what basis do you allege this? People say the most unhinged stuff here about AI, and it so often goes completely unchallenged. This is a huge assertion that you are making.
The equivalent of what current-gen LLMs do is an oral examination. Picture standing in the middle of a room surrounded by subject matter experts grilling you for your knowledge of various random topics. You have no tools, no calculator, no pencil and paper.
You’re asked a question and you just have to spit out the answer. No option to backtrack, experiment, or self-correct.
“Translate this to Hebrew”.
“Is this a valid criticism of this passage from a Platonic perspective?”
“Explain counterfactual determinism in Quantum Mechanics.”
“What is the cube root of 74732?”
You would fail all of these. The AI gets 3 of 4 correct.
Tell me, who’s smarter?
You? Is that because of your preconceptions, or because of real superiority?