by SJC_Hacker 436 days ago
That's probably the case 99% of the time.

But that 1% is pretty important.

For example, they are dismal at math problems that aren't just slight variations of problems they've seen before.

Here's one by blackpenredpen where ChatGPT insisted that its solution to a problem that could be solved by high school / talented middle school students was correct, even after attempts to convince it that it was wrong. https://youtu.be/V0jhP7giYVY?si=sDE2a4w7WpNwp6zU&t=837

Rewind to earlier in the video to see the real answer.

2 comments

> For example, they are dismal at math problems that aren't just slight variations of problems they've seen before.

I know plenty of teachers who would describe their students the exact same way. The difference is mostly one of magnitude (of delta in competence), not quality.

Also, I think it's important to note that by "could be solved by high school / talented middle school students" you mean "specifically designed to challenge the top ~1% of them". Because if you say "LLMs only manage to beat 99% of middle schoolers at math", the claim seems a whole lot different.

ChatGPT o1 pro mode solved it on the first try, after 8 minutes and 53 seconds of “thinking”:

https://chatgpt.com/share/67f40cd2-d088-8008-acd5-fe9a9784f3...

The problem is: how do you know that it's correct?

A human would probably say "I don't know how to solve the problem". But the free version of ChatGPT is confidently wrong.