Hacker News
by raindear 799 days ago
But why do transformers perform better than older language models, including other neural language models?