by omneity
184 days ago
Thanks, this was helpful! Reading the seminal paper[0] on Universal Transformers also gave me some insight:

> UTs combine the parallelizability and global receptive field of feed-forward sequence models like the Transformer with the recurrent inductive bias of RNNs.

Very interesting. It seems to be an “old” architecture that is only now being leveraged to a promising extent. I'm curious what made it an active area (with the work from Samsung and Sapient, and now this one). Perhaps diminishing returns on regular transformers?

[0] https://arxiv.org/abs/1807.03819