Hacker News
by logicchains 741 days ago
They can solve it if you keep adding layers to the transformer. It's just not efficient: you'd need exponentially more layers than a similarly sized RNN.