Hacker News
by tptacek 58 days ago
I responded to your point empirically, with problems not conventionally understood to be solvable with "text generation", and your response was in effect that I must be wrong because I'm afraid you might be right. Not an especially strong debate move.

Can you refute the argument I made, or do you just want to claim LLMs are drinking all our water?


Well, I don't believe the LLM solved those problems. I believe the user did. The LLM aggregated large amounts of information statistically, then the user read that and realized there was something to it and fixed it. Those accounts don't mention the 1,000 other prompts the technical user ran that yielded garbage results, which the user was intelligent enough to disregard.
No, that's false, in every example I gave. But I appreciate you making clearer that I correctly ascertained your original claim, that you believe they literally are just random text generators, and that people are simply cherry picking the rare meaningful text out of them.

That's what I thought you meant by "statistical text generator", and is why I was moved to comment.

1) I never said random 2) I never said cherry picking RARE meaningful text 3) It is not false in every example you gave just because you say that it is 4) If I didn't know better, I might think you're confused about what statistical means (hint: it's not random)
No, it's false in each example because I'm either a first or secondhand party to it happening (except for the Erdos thing) and I know it's false.

You managed to include in your blanket and conclusory rebuttal "solving undergrad math problems instantaneously". That was one of my examples because (1) it pertains to the subthread, (2) I was talking about it upthread, and (3) I have direct firsthand knowledge.

As I said elsewhere: I've fed thousands of math problems through ChatGPT (starting with 4o and now with 5.5). They've all been randomized. They do not appear in textbooks. They cover all the ground from late high school trig to university calc III. I do this habitually, every time I work an "interesting" problem, to get critiques on my own work. GPT has been flawless, routinely spotting errors or missed opportunities. If I have any complaint, it's that GPT tends to be too much better than I am at any given point, using concepts from later courses to solve simpler problems.

Square that with the claim you're making.

I can do the same thing with vulnerability research (I've been a vuln researcher since 1996 and I use LLMs to find vulnerabilities). But this thread is about math, and it's even easier to show you're wrong in the context of math.

That's convenient. But I have a challenge for you if you're brave enough to face your delusions. Paste this into your LLM of choice and see what happens:

"A farmer has 17 sheep. 9 ran away. He then bought enough to double what he had. His neighbor, who had 4 dogs and 14 sheep, gave him one-third of her animals. The farmer sold 5 sheep on Monday and again the next day, which was Wednesday. Each sheep weighs about 150 lbs. How many sheep does the farmer have?"

17 sheep - 9 ran away = 8 sheep

He bought enough to double what he had: 8 more sheep, so 16 sheep

Neighbor has 4 dogs + 14 sheep = 18 animals

One-third of her animals = 6 animals

But the problem does not say all 6 were sheep. It says “animals.” So the exact sheep count depends on which animals she gave him.

Then:

16 + s sheep from neighbor - 5 - 5 = 6+s

where s is the number of sheep among the 6 animals she gave him.

So the answer is not uniquely determined.

Possible sheep count: 6 to 12 sheep, depending on whether the neighbor gave him 0 to 6 sheep.

(I clipped the GPT5 answer here, but will note additionally that even the LLM built into the Google search results page handles this question; both note the possible trick question with the days of the week.)
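
For anyone who wants to check the arithmetic in the quoted answer, here is a minimal Python sketch. The function and variable names are made up for illustration. It assumes both sales of 5 sheep happened (whatever day the second one fell on) and that the weight is irrelevant, as the quoted answer does. It also computes a second range under one extra assumption that is not in the quoted answer: the neighbor cannot hand over more dogs than the 4 she owns, so at least 2 of the 6 animals would have to be sheep.

```python
# Sketch of the sheep puzzle arithmetic; names are illustrative only.

NEIGHBOR_DOGS = 4
NEIGHBOR_SHEEP = 14


def sheep_after_all_steps(sheep_from_neighbor: int) -> int:
    """Return the farmer's sheep count given how many of the gifted animals were sheep."""
    sheep = 17 - 9                   # 9 ran away, 8 left
    sheep *= 2                       # bought enough to double: 16
    sheep += sheep_from_neighbor     # unknown share of the gift
    sheep -= 5                       # first sale
    sheep -= 5                       # second sale
    return sheep                     # equals 6 + sheep_from_neighbor


# One-third of the neighbor's 18 animals.
animals_given = (NEIGHBOR_DOGS + NEIGHBOR_SHEEP) // 3

# Range as stated in the quoted answer: 0 to 6 of the gifted animals are sheep.
stated = [sheep_after_all_steps(s) for s in range(0, animals_given + 1)]

# Assumption (not in the quoted answer): she owns only 4 dogs,
# so at least animals_given - 4 of the gifted animals must be sheep.
min_sheep_given = max(0, animals_given - NEIGHBOR_DOGS)
max_sheep_given = min(animals_given, NEIGHBOR_SHEEP)
feasible = [sheep_after_all_steps(s) for s in range(min_sheep_given, max_sheep_given + 1)]

print("stated range:  ", min(stated), "to", max(stated))
print("feasible range:", min(feasible), "to", max(feasible))
```

Either way the count is not uniquely determined, which is the point of the quoted answer. The dog-count constraint only narrows the lower bound.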