by shawntan
249 days ago
Right. There should really be a vanilla Transformer baseline. As for recurrence, the idea has been around for a while: https://arxiv.org/abs/1807.03819. There are reasons why it hasn't really been picked up at scale, and the method tends to do well on synthetic tasks.