by logicchains 741 days ago
They can solve it if you keep adding layers to the transformer, but it's not efficient: you'd need exponentially more layers than a similarly sized RNN.