Hacker News
by astrange 507 days ago
Coincidentally, I tried this yesterday on variants of the "surgeon can't operate on boy" puzzle. It didn't help; LLMs still can't reliably solve it.

(All current commercial LLMs are badly overfit on this puzzle, so if you try changing parts of it they'll get stuck and try to give the original answer in ways that don't make sense.)

1 comment

What do you mean by you tried it?
I had the LLM generate some Prolog programs, looked at them, and they were wrong.

Specifically, it usually decides it knows what the answer is (and gets it wrong), then optimizes out the part of the program that does anything.
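
To make that failure mode concrete, here is a rough sketch in Python rather than Prolog, so it is easy to run. It is not the actual generated output: the puzzle variant, the fact names, and both functions are hypothetical, invented for illustration. One solver derives the answer from the stated facts; the other is what "optimizing out the part that does anything" amounts to.

```python
# Hypothetical illustration only: names and facts are invented, not taken
# from any real LLM output.

CANDIDATES = ["mother", "father"]

# Original puzzle: the father died in the accident, the surgeon is a parent.
ORIGINAL = {"dead": {"father"}, "parents_of_boy": {"mother", "father"}}

# Assumed variant: nobody died, and the text says the surgeon is the father.
VARIANT = {
    "dead": set(),
    "parents_of_boy": {"mother", "father"},
    "surgeon_is": "father",
}


def solve_by_search(facts):
    """Return every candidate surgeon consistent with the stated facts."""
    return [
        person
        for person in CANDIDATES
        if person not in facts["dead"]
        and person in facts["parents_of_boy"]
        # If the puzzle names the surgeon outright, only that person fits.
        and facts.get("surgeon_is", person) == person
    ]


def solve_degenerate(facts):
    """The 'optimized out' version: the facts are never consulted."""
    return ["mother"]  # memorized answer to the original puzzle
```

Under these assumptions, solve_by_search(VARIANT) returns ["father"], while solve_degenerate(VARIANT) still returns ["mother"]. The two only agree on ORIGINAL, which is why the hardcoded version looks right until the puzzle changes.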