

by Isinlor 575 days ago

Transformers are very bad at counting because of how their internals work. But if you ask them to use an explicit counter, the problem disappears:

https://chatgpt.com/share/6775c9a6-8cec-8007-b709-3431e7a2b2...

Basically, a single feed-forward pass is not Turing complete, but autoregressive generation (feeding the previous output back into the model) is Turing complete.

1 comment

frikskit 575 days ago

This makes it worse IMO. I was starting to think it didn’t have a letter-by-letter representation of the tokens. It does. In which case, the fact that it didn’t decide to use it speaks even more to its lack of sophistication.

Regardless, I’d love it if you could explain a bit more about why the transformer internals make this problem so difficult.

Isinlor 575 days ago

When Can Transformers Count to n?

https://arxiv.org/html/2407.15160v2

The Expressive Power of Transformers with Chain of Thought

https://arxiv.org/html/2310.07923v5

A transformer needs to retrieve the letters of each token while being forced to keep its internal representation aligned in length with the base tokens (each token has a finite embedding, even though it is made up of multiple letters), and then it needs to count the letters within that misaligned representation.
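
To make the misalignment concrete, here is a small Python sketch. The split of "strawberry" into three pieces is hypothetical (a real tokenizer may segment the word differently); it only illustrates that the model works with fewer positions than there are letters to count.

```python
# Hypothetical subword split; a real tokenizer may segment the word differently.
tokens = ["str", "aw", "berry"]
word = "".join(tokens)

positions = len(tokens)  # sequence positions the model actually sees
letters = len(word)      # characters a letter count has to cover

# How many "r" characters are packed inside each token position.
r_per_token = [tok.count("r") for tok in tokens]
total_r = sum(r_per_token)

print(positions, letters, r_per_token, total_r)
```

Under this assumed split there are 3 positions for 10 letters, so the per-letter information has to be unpacked from each token's embedding before anything can be counted.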

Autoregressive mode completely alleviates the problem, as the model can align its internal representation with the letters and simply keep an explicit sequential count.
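
A rough sketch of what that explicit sequential count looks like, in Python. This is a toy stand-in for autoregressive decoding, not an actual transformer: `step` and `run` are names made up for illustration, and the trace format is an assumption. Each step only sees the input plus everything emitted so far, and appends one more line.

```python
from typing import List, Optional


def step(word: str, target: str, trace: List[str]) -> Optional[str]:
    """Produce the next trace line from the word and everything emitted so far."""
    i = len(trace)  # one line per letter, so this is the next letter's index
    if i == len(word):
        return None  # every letter has been processed: stop
    # Read the running count back from the last emitted line (no hidden state).
    prev = int(trace[-1].rsplit("=", 1)[1]) if trace else 0
    count = prev + (word[i] == target)
    return f"{word[i]}: count={count}"


def run(word: str, target: str) -> List[str]:
    trace: List[str] = []
    while (line := step(word, target, trace)) is not None:
        trace.append(line)  # feed the output back in as input to the next step
    return trace


trace = run("strawberry", "r")
print("\n".join(trace))
```

The trace has one line per letter, so the output is aligned with the letters rather than the tokens, and the last line carries the final count.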

BTW, humans also can't count without resorting to a sequential process.

frikskit 575 days ago

Thanks!