Hacker News
by thegeomaster 1245 days ago
There's already research that tries to fix this problem with transformers in general, like Transformer-XL [1]. I'm a bit puzzled that I don't see much interest in releasing a pre-trained model that uses this architecture, since it seems to give good results.

[1]: https://arxiv.org/abs/1901.02860

1 comment

T5 uses relative positional encoding, which Transformer-XL also relies on.
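
For anyone unfamiliar with the idea, here is a rough sketch of what a relative position bias looks like: a scalar that depends only on the offset between query and key gets added to each attention logit. This is a simplification, not T5's actual code. It clips offsets to a fixed window, whereas T5 groups them into buckets, and the names `relative_offsets`, `add_relative_bias`, and `bias_table` are made up for illustration. The table values are toy numbers standing in for learned parameters.

```python
# Simplified sketch of a relative position bias (all names are hypothetical).
# Offsets are clipped to a fixed window instead of using T5's bucketing scheme.

def relative_offsets(seq_len, max_distance):
    """Return a seq_len x seq_len grid of (key - query) offsets, clipped."""
    return [
        [max(-max_distance, min(max_distance, k - q)) for k in range(seq_len)]
        for q in range(seq_len)
    ]


def add_relative_bias(logits, bias_table, max_distance):
    """Add one scalar bias per clipped offset to a square grid of logits."""
    seq_len = len(logits)
    offsets = relative_offsets(seq_len, max_distance)
    return [
        [
            logits[q][k] + bias_table[offsets[q][k] + max_distance]
            for k in range(seq_len)
        ]
        for q in range(seq_len)
    ]


# Toy values: a real model would learn this table; these are made up.
max_distance = 2
bias_table = [-0.2, -0.1, 0.0, 0.1, 0.2]  # index = offset + max_distance
logits = [[0.0] * 4 for _ in range(4)]
biased = add_relative_bias(logits, bias_table, max_distance)
```

Since the bias is indexed by offset rather than absolute position, `biased[0][1]` and `biased[2][3]` get the same value, so the same table works at any sequence length.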