by gwern
1063 days ago
> The theoretical class of problems that Transformers can solve (without Chain-of-Thought style responses) is fairly limited. Which is irrelevant because how would a Transformer emit a complete Sudoku solution in a single forward-pass/token in the first place?

I think if we're getting specific to this particular Sudoku example, the CoT would probably involve a trace of all the filling-in and backtracking steps that a solver would perform.
My guess is that directly outputting the exact solution, even though it spans several tokens, wouldn't leave enough room to do the constraint resolution that Sudoku requires. You'd need the intermediate CoT "thinking out loud".
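
To make "a trace of the filling-in and backtracking steps" concrete, here's a rough sketch of what I have in mind: a naive backtracking solver that logs every placement and every undo. The function names, the trace format, and the example puzzle are all just my own illustration, not anything a model actually emits.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 0 marks an empty cell

# Example puzzle chosen for illustration only.
PUZZLE: Grid = [
    [5, 3, 0, 0, 7, 0, 0, 0, 0],
    [6, 0, 0, 1, 9, 5, 0, 0, 0],
    [0, 9, 8, 0, 0, 0, 0, 6, 0],
    [8, 0, 0, 0, 6, 0, 0, 0, 3],
    [4, 0, 0, 8, 0, 3, 0, 0, 1],
    [7, 0, 0, 0, 2, 0, 0, 0, 6],
    [0, 6, 0, 0, 0, 0, 2, 8, 0],
    [0, 0, 0, 4, 1, 9, 0, 0, 5],
    [0, 0, 0, 0, 8, 0, 0, 7, 9],
]


def allowed(grid: Grid, r: int, c: int, d: int) -> bool:
    """True if digit d can go at (r, c) without clashing in its row, column, or box."""
    if any(grid[r][j] == d for j in range(9)):
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d
               for i in range(br, br + 3)
               for j in range(bc, bc + 3))


def first_empty(grid: Grid) -> Optional[Tuple[int, int]]:
    """Return the first empty cell in row-major order, or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None


def solve_with_trace(grid: Grid, trace: List[str]) -> bool:
    """Solve in place, appending one line to trace per placement or undo."""
    cell = first_empty(grid)
    if cell is None:
        return True  # no empty cells left: solved
    r, c = cell
    for d in range(1, 10):
        if allowed(grid, r, c, d):
            grid[r][c] = d
            trace.append(f"place {d} at r{r + 1}c{c + 1}")
            if solve_with_trace(grid, trace):
                return True
            grid[r][c] = 0  # dead end further down: back out of this guess
            trace.append(f"undo {d} at r{r + 1}c{c + 1}")
    return False


grid = [row[:] for row in PUZZLE]  # work on a copy so PUZZLE stays intact
trace: List[str] = []
solved = solve_with_trace(grid, trace)
print(solved, len(trace))
print("\n".join(trace[:5]))
```

The final grid is only 81 digits, but the trace is longer than that because it also contains every wrong guess and its undo. That extra text is the part I'd expect a CoT to need.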