Hacker News
by lolinder 1092 days ago
The original paper[0] that laid the foundation for modern LLMs demonstrated its architecture, the Transformer, on machine translation tasks (English-to-German and English-to-French). Translation is one of the primary use cases these architectures were designed for. What other types of models do you have in mind that outperform them?

[0] "Attention Is All You Need" https://arxiv.org/pdf/1706.03762.pdf