by aghilmort
185 days ago
there’s decent work on the computational reasoning power of transformers, SSMs, etc. some approximate snippets that come to mind: decoder-only transformers recognize AC^0 and think in TC^0, encoder-decoders are strictly more powerful than decoder-only, etc. iirc there’s a person with the last name Miller if you poke around on arXiv, plus a few others. it’s been a while since this was top of mind for me, so ymmv on the exact correctness of the above snippets