Hacker News new | ask | show | jobs
by meroes 73 days ago
Grammar seems like you’re talking about LLMs specifically. Well, isn’t Sudoku just math? LLMs suck at Sudoku last I checked. When told not to code a solver, its very first deduction was wrong.
1 comments

Generally when people talk about using LLMs to do mathematics research they’re not talking about the LLM alone, but the LLM + a harness for it to write and execute theorem provers such as Lean or Coq to validate their results.
I guess I just don’t have the experience or optimism that a harness around an LLM, which can’t make the first, bare deduction on its own, is a good use of compute.

I got out of RLHF, including games and puzzles, before agents took off and maybe I have outdated info. But we estimated RLHF’ing a single hard full sized sudoku was ~25 hours worth of work.