by jmmcd
467 days ago
These puzzles probably have more in common with "Zebra puzzles" (e.g. https://www.zebrapuzzles.com/) than with Cluedo (Clue in the USA) itself. I've been doing some one-off experiments with Zebra puzzles recently. All the reasoning models generate an enormous batch of text, trying out possibilities, backtracking, and sometimes getting confused. From what I can see (not rigorous): Claude 3.7 fails, ChatGPT with reasoning succeeds, DeepSeek with reasoning succeeds. But of course the best way for a model to solve a problem like this is to translate it into a constraint satisfaction problem and write Python code that calls a CSP solver.
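A rough sketch of what that translation could look like, on a made-up three-house puzzle. The clues, the attribute names and the `solve` function are all assumptions for illustration, not taken from any real puzzle. The exhaustive search over permutations is only a stdlib stand-in; a real setup would hand the same constraints to a proper CSP solver library.

```python
from itertools import permutations

COLORS = ("red", "green", "blue")
PETS = ("dog", "cat", "fish")
DRINKS = ("tea", "coffee", "milk")


def solve():
    """Return every assignment satisfying the (hypothetical) clues.

    Houses are numbered 0, 1, 2 from left to right; in each tuple,
    element i is the value belonging to house i.
    """
    solutions = []
    for colors in permutations(COLORS):
        # Clue 1: the green house is immediately right of the blue house.
        if colors.index("green") != colors.index("blue") + 1:
            continue
        for pets in permutations(PETS):
            # Clue 2: the dog lives in the red house.
            if pets.index("dog") != colors.index("red"):
                continue
            for drinks in permutations(DRINKS):
                # Clue 3: milk is drunk in the middle house.
                if drinks[1] != "milk":
                    continue
                # Clue 4: the cat's owner drinks milk.
                if pets.index("cat") != drinks.index("milk"):
                    continue
                # Clue 5: coffee is drunk in the blue house.
                if drinks.index("coffee") != colors.index("blue"):
                    continue
                solutions.append(list(zip(colors, pets, drinks)))
    return solutions


if __name__ == "__main__":
    for solution in solve():
        for house, (color, pet, drink) in enumerate(solution):
            print(house, color, pet, drink)
```

The point is that the model only has to get the encoding of the clues right; the search itself is mechanical, and checking that exactly one solution comes back is a cheap sanity check on the encoding.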
|
Which means that when you ask it (e.g.) whether A is better than B (using it as a Decision Support System), it should write a program to decide the question instead of "guessing" the answer from the network.
You are stating that, since the issue is general, LLMs should write programs that produce their answers, instead of emitting their usual direct output.