Hacker News new | ask | show | jobs
by NovemberWhiskey 982 days ago
Yeah, that's still pretty much nonsense isn't it?

2b) If you draw all white balls in these next 50 draws you would have 50 red balls, 50 blue balls, and 50 white balls out. So, you would end with 50 white balls and 50 red balls in the container.

... so after removing 100 balls, I've removed 150 balls? And the 150 balls that I've removed are red, white and blue despite the fact that I removed 50 blue balls initially and then 50 white ones.

1 comments

Just because it fails one test in a particular way doesn’t mean it lacks reasoning entirely. It clearly does have reasoning based on all the benchmarks it passses

You are really trying to make it not have reasoning for your own benefit

> You are really trying to make it not have reasoning for your own benefit

This whole thread really seems like it's the other way around. It's still very easy to make ChatGPT to spit out obviously wrong answers depending on the prompt. If it had actual ability to reason as opposed to just generating continuation to your prompt, the quality of the prompt wouldn't matter as much

Then why does it do so well on all the reasoning benchmarks?