Hacker News
by invalidroot 411 days ago
Nice writeup! This is the second post I've seen in the genre of "I've had a secret, personal benchmark for LLMs where the 'solution' requires questioning the premises, and o4-mini-high beats it." The first post I saw was about a chessboard and the prompt "mate in one:" https://x.com/KelseyTuoc/status/1912945346126417940

(Edited to remove direct spoiler for the MU-puzzle, in case people want to try it.)