You can only get an 8 in the rightmost digit of the result by multiplying the rightmost digits, but 0 × 8 obviously gets you a 0, so it's fairly easy to see this is wrong.
The 8 solutions I got while clicking on regenerate:
3.33333333333
42, so the point your talking about is 3.3 (Accuracy is
3 Additionally, 3 coincided with John 3:16 , "$3
1
3.33333333333
42
42+1=3+1=4=42+1=43
2×5
Not so sure what I just did.
Results are copy-pasted as-is
I think they might be making a joke about how JavaScript can act surprisingly when the `+` operator is used with strings/arrays in combination with numbers
It is using an old, pre-hype version of GPT, so it is quite dishonest to use this one to prove a point. It may work as a joke, but the model the hype is about (GPT4) wouldn't perform that poorly.
So it is actually evidence of how big the gap is between the pre-hype and post-hype models.
This is the first time I have come across Calvin Liang, but I’m already a big fan. Their artist’s statement manages to be very funny while making a point. I like today.
I'm sorry but this falls flat for me. GPT4 routinely can answer impressive math questions for me (college-level):
- What diameter steel wire would I need to be rated for a weight of 500lbs?
- How many digits would an ID need to be (using 36 characters) to have a 1/10^20 chance of collision over 1 billion random IDs?
- If I have a list of a billion times (say durations of a web request) and they follow a normal distribution, and I take a sample of 1 million of those, how close would the average of my .1% sample be to the true average of the billion?
- Suppose in D&D I am told to roll 20 d6, but instead of rolling that many dice I want to roll just two (larger) dice and add a constant. Which standard D&D dice might give the closest variance and what is the constant?
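For what it's worth, the ID question in the list above is easy to sanity-check by hand. A minimal sketch, assuming the usual birthday approximation P(collision) ≈ n²/(2N) for n random IDs drawn uniformly from a space of N values; the function name `min_id_length` is made up for illustration:

```python
import math

def min_id_length(n_ids: int, max_collision_prob: float, alphabet_size: int = 36) -> int:
    """Smallest ID length whose approximate collision probability stays under the target."""
    # Birthday approximation (an assumption, valid for small probabilities):
    # P(collision) ~= n^2 / (2 * N)  =>  N >= n^2 / (2 * P)
    required_space = n_ids ** 2 / (2 * max_collision_prob)
    # An ID of length L over the alphabet covers alphabet_size ** L values.
    return math.ceil(math.log(required_space, alphabet_size))

length = min_id_length(10**9, 1e-20)
print(length)
```

Under that approximation the answer comes out to 25 characters.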
It is for sure just a funny hobby project, but your statement had me intrigued:
> Suppose in D&D I am told to roll 20 d6, but instead of rolling that many dice I want to roll just two (larger) dice and add a constant. Which standard D&D dice might give the closest variance and what is the constant?
Interestingly, ChatGPT 4o tells me to use 2d19 + 51, even after correcting it and asking for larger dice. Impressive math for sure but not worth much if it doesn't respect constraints. I guess I could try again until it stumbles upon the right answer, but it's all to say it's not quite there yet.
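The dice question can be checked directly. A rough sketch, assuming "standard D&D dice" means d4, d6, d8, d10, d12 and d20 (d100 left out), and using the fair-die formulas mean = (s + 1)/2 and variance = (s² − 1)/12:

```python
from fractions import Fraction

# Assumed set of "standard" dice; this list is my assumption, not from the thread.
STANDARD_DICE = [4, 6, 8, 10, 12, 20]

def die_mean(sides: int) -> Fraction:
    return Fraction(sides + 1, 2)

def die_variance(sides: int) -> Fraction:
    return Fraction(sides ** 2 - 1, 12)

# Target distribution: 20d6.
target_mean = 20 * die_mean(6)
target_var = 20 * die_variance(6)

# Pick the die whose two-dice variance is closest to the target variance.
best = min(STANDARD_DICE, key=lambda s: abs(2 * die_variance(s) - target_var))
# The constant makes up the difference in means.
constant = target_mean - 2 * die_mean(best)

print(f"2d{best} + {constant}")
```

Under those assumptions the closest match is 2d20 + 49 (variance 66.5 against roughly 58.3 for 20d6). Note that this matches only mean and variance; the shape of the distribution is quite different.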
When you ask ChatGPT straightforward math questions it usually solves them by generating and running a Python script which performs the actual calculation on its behalf. That's a reasonable workaround on paper since LLMs are better at doing codegen than math, but the heuristics which trigger that path are brittle, so if your query is more complicated it can regress to making the LLM itself try to solve it, with predictably poor results. You can tell when the Python solver is used because there will be a button next to the response which shows you the code it ran.
Wolfram Alpha:
Q: (706458021-342622146)*988457934
A: 359,636,457,317,582,250
ChatGPT:
Q: What is the solution to (706458021-342622146)*988457934
A: The solution to (706458021 - 342622146) x 988457934 is 359,636,457,317,582,250 [Python button]
ChatGPT:
Q: Write a press release announcing a momentous scientific discovery: the solution to (706458021-342622146)*988457934
A: [...] The equation, which involves the subtraction of two large integers followed by multiplication with another large integer, has been resolved to yield a precise result of 359,462,296,091,341,640. The computation was executed with utmost precision, demonstrating the profound capabilities of modern mathematical techniques and computational power. [...] [no Python button]
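For reference, this is roughly what the Python path buys you. I don't know what script ChatGPT actually generated, so treat this as a sketch; the variable names are mine. Python integers are arbitrary precision, so the product is exact:

```python
a, b, c = 706458021, 342622146, 988457934

# Exact integer arithmetic; no floating point involved.
exact = (a - b) * c

wolfram_answer = 359_636_457_317_582_250        # Wolfram Alpha and ChatGPT with the Python button
press_release_answer = 359_462_296_091_341_640  # ChatGPT without the Python button

print(exact == wolfram_answer)
print(exact == press_release_answer)
```

The first comparison is true and the second is false: the press-release figure has the right magnitude and the right leading digits, then drifts.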
This is neat, but most people are going to miss "GPT-3 (babbage-002)". Using a rudimentary, outdated model seems disingenuous when making any kind of point about AI.
Yeah, I would say it actually makes the contrary point. That pre-hype version of GPT is poor, and if you have to use it to prove a point, it probably means there is a huge jump between GPT3 and GPT4. Anybody who takes this for post-hype LLM output doesn't actually understand the performance of GPT4 or better.
Well, what if it just got better at covering up human-presentable cases?
See this comment [0] on this very post, showing how it makes quite problematic mistakes on larger numbers still.
It's still an improvement, but only in the way of imitation. It shows that while clever within their constraints, these models still don't have the capabilities to truly perform computation or "thought". Chain of thought can help, but there are some things you cannot split into atomic tasks; if the very world model isn't that stellar, no amount of elucidation will compensate for the inaccurate representations within. (i.e. "How would person X react to Y?" If your theory of mind is poor, no amount of further subtasks will help you give a better prediction.)
For larger numbers it just needs to execute code. Most people also can't calculate such numbers in their head.
It shouldn't have to be able to do things it knows how to use code for. E.g. dumb things like how many Rs are in "strawberry". It doesn't even see characters, so even if it was somehow possible, it couldn't count for sure.
It is like asking someone who only has ever seen hieroglyphs how many Rs are in a character by character version of strawberry.
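To make the "just use code" point concrete, here is the kind of one-liner a model can emit instead of trying to count from its own token-level view. A trivial sketch; the helper name `count_letter` is made up for illustration:

```python
def count_letter(word: str, letter: str) -> int:
    """Count occurrences of a single letter, ignoring case."""
    return word.lower().count(letter.lower())

r_count = count_letter("strawberry", "r")
print(r_count)  # 3
```

The code operates on individual characters, which is exactly the view of the word the model itself doesn't get.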
Still, let's not anthropomorphize computational processes. It is a function approximator, which we'd expect to pick up on simple patterns like intersections or base-10 arithmetic. When we see its predictions diverge from truth, that shouldn't be dismissed with a "just so" story; it is a sign we're pushing the architecture to its limits.
Not really; there is some asymmetry. One could at least hope (as many seemingly have) that natural language systems like LLMs could also cope with formal reasoning and calculation, but you’d be an idiot to think it goes the other way.
Seriously though, this is wonderful satire. I asked 88x10 and it returned an HTML meta tag.