by shawntan
1063 days ago
I recommend reading the theoretical work on the computational capabilities of Transformers: https://twitter.com/lambdaviking/status/1630581475425828864
References to other work can probably be found in that article. Shameless plug for my own blog post about this: https://blog.wtf.sg/posts/2023-02-03-the-new-xor-problem/

TL;DR: The theoretical class of problems that Transformers can solve (without Chain-of-Thought style responses) is fairly limited. Generally, universal approximation proofs rely on infinite-precision assumptions, which do not hold in practice. Empirical results also show very limited capabilities when tested on certain formal languages.

In the Sudoku case, the problem length is bounded, so one could conceptually build a model large enough to memorise the solution to every permissible Sudoku board and then simply look up and read out the answer.
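
To make the memorisation point concrete, here is a minimal sketch in Python. It uses parity of bit strings as a small stand-in for Sudoku (an assumption made purely for brevity), and the names `parity`, `build_table`, and `max_len` are hypothetical. The point is only that a bounded input space can be covered by a finite lookup table, which says nothing about inputs beyond the bound.

```python
from itertools import product

def parity(bits):
    """Reference solver: 1 if the number of ones is odd, else 0."""
    return sum(bits) % 2

def build_table(max_len):
    """Memorise the answer for every bit string of length <= max_len."""
    table = {}
    for n in range(max_len + 1):
        for bits in product((0, 1), repeat=n):
            table[bits] = parity(bits)
    return table

table = build_table(max_len=8)
print(len(table))            # 2**9 - 1 memorised inputs
print(table[(1, 0, 1, 1)])   # answer is looked up, not computed
print((1,) * 9 in table)     # longer than max_len, so not covered
```

The table grows exponentially with `max_len`, which is why this only works "conceptually" for a fixed-size problem like a 9x9 board.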
|
Which is irrelevant because how would a Transformer emit a complete Sudoku solution in a single forward-pass/token in the first place?