HN Mirror

	by lolinder 1092 days ago
	The original paper [0] that laid the foundation for modern LLMs demonstrated its architecture on machine translation tasks. Translation is one of the primary use cases these architectures were designed for. What other types of models do you have in mind that outperform them?

	[0] "Attention Is All You Need" https://arxiv.org/pdf/1706.03762.pdf