

	by thegeomaster 1245 days ago
	There's already research that tries to fix this problem with transformers in general, like Transformer-XL [1]. I'm a bit puzzled that I don't see much interest in getting a pre-trained model out that uses this architecture; it seems to give good results.

	[1]: https://arxiv.org/abs/1901.02860

1 comment

T5 uses relative positional encoding.
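For anyone curious how that works, here is a rough sketch of my understanding of T5's scheme. Instead of adding position embeddings to the inputs, a learned scalar bias (one per head) is added to the attention logits. The bias is looked up by the bucketed offset between key and query. Small offsets each get their own bucket, larger ones share log-spaced buckets, and everything beyond `max_distance` falls into the last bucket. The defaults (32 buckets, max distance 128) follow the Hugging Face port as far as I know. The function names are my own, and the real code is vectorized in PyTorch/TF. `bucket_bias` is a hypothetical stand-in for the learned table.

```python
import math
from typing import List


def relative_position_bucket(relative_position: int, bidirectional: bool = True,
                             num_buckets: int = 32, max_distance: int = 128) -> int:
    """Map a (key position - query position) offset to a bucket index, T5-style."""
    bucket = 0
    n = relative_position
    if bidirectional:
        # Half the buckets for keys after the query, half for keys before it.
        num_buckets //= 2
        if n > 0:
            bucket += num_buckets
        n = abs(n)
    else:
        # Causal (decoder) case: all future offsets collapse into bucket 0.
        n = max(-n, 0)
    # Small offsets each get their own bucket...
    max_exact = num_buckets // 2
    if n < max_exact:
        return bucket + n
    # ...larger offsets share log-spaced buckets, capped at the last bucket.
    large = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    return bucket + min(large, num_buckets - 1)


def position_bias(query_len: int, key_len: int, bucket_bias: List[float],
                  bidirectional: bool = True) -> List[List[float]]:
    """Bias matrix to add to one head's attention logits.

    bucket_bias is a hypothetical per-bucket table (learned in a real model).
    """
    return [[bucket_bias[relative_position_bucket(k - q, bidirectional)]
             for k in range(key_len)]
            for q in range(query_len)]


if __name__ == "__main__":
    # Use the bucket index itself as the "bias" so we can see which bucket each pair hits.
    for row in position_bias(3, 3, list(range(32))):
        print(row)
    # [0, 17, 18]
    # [1, 0, 17]
    # [2, 1, 0]
```

Because the bias depends only on the offset, the same table works for any sequence length. That is the property people usually point to when comparing it with absolute position embeddings.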