| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by omneity 184 days ago

Thanks, this was helpful! Reading the seminal paper[0] on Universal Transformers also gave some insights:

> UTs combine the parallelizability and global receptive field of feed-forward sequence models like the Transformer with the recurrent inductive bias of RNNs.

Very interesting, it seems to be an “old” architecture that is only now being leveraged to a promising extent. Curious what made it an active area (with the works of Samsung and Sapient and now this one), perhaps diminishing returns on regular transformers?

0: https://arxiv.org/abs/1807.03819