by shawntan
1062 days ago
Right, I see your point. Since Sudoku is fixed-size, you can always construct a Transformer with the worst-case depth. That makes sense. I was assuming that, given a trained Transformer, you wouldn't know how many effective "steps of computation" it contained, and so would probably have to resort to CoT.