|
|
|
|
|
by shawntan
638 days ago
|
|
Theoretical results exist that try to quantify the number of CoT tokens needed to reach different levels of computational expressibility:
https://arxiv.org/pdf/2310.07923 TL;DR: Getting to Turing completeness can require polynomial CoT tokens, wrt the input problem size.
For a field that constantly harps on parallelism and compute efficiency, this requirement seems prohibitive. We really need to get away from constant depth architectures. |
|
So, as stated, this is impossible since it violates the Time Hierarchy Theorem.
The actual result of the paper is that any poly-time computable function can be computed with poly-many tokens. Which is... not a particularly impressive bound? Any non-trivial fixed neural network can, for instance, compute the NAND of two inputs. And any polynomial computable function can be computed with a polynomial number of NAND gates.