by SemanticStrengh 1499 days ago
I mean that information is being lost. See XLNet (https://arxiv.org/abs/1906.08237) for the argument, or MPNet (https://www.microsoft.com/en-us/research/publication/mpnet-m...), which attempts to combine the best of both worlds information-wise but still finds that masked modeling is much less useful than autoregressive modeling.
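Here is a toy sketch of the information-loss point the XLNet paper makes about masked modeling. It assumes two masked tokens that are predicted from the same visible context. A masked-LM style objective scores each masked token on its own, so it cannot represent the dependency between them. An autoregressive factorization conditions the second token on the first and recovers the joint exactly. The distribution `JOINT` and all function names are hypothetical and only illustrate the idea. Neither paper implements its objective this way.

```python
# Hypothetical joint distribution over two masked positions (x1, x2),
# given the same visible context, e.g. "___ ___ is a city".
JOINT = {("New", "York"): 0.5, ("Los", "Angeles"): 0.5}


def marginal(joint, pos):
    """Marginal distribution of one masked position, summing out the other."""
    out = {}
    for pair, p in joint.items():
        out[pair[pos]] = out.get(pair[pos], 0.0) + p
    return out


def factorized_prob(joint, pair):
    """Masked-LM style: each masked token is predicted independently from the context,
    so the model's joint is the product of the per-position marginals."""
    m1, m2 = marginal(joint, 0), marginal(joint, 1)
    return m1.get(pair[0], 0.0) * m2.get(pair[1], 0.0)


def autoregressive_prob(joint, pair):
    """Chain rule: p(x1) * p(x2 | x1), which reproduces the true joint."""
    p_x1 = marginal(joint, 0).get(pair[0], 0.0)
    if p_x1 == 0.0:
        return 0.0
    return p_x1 * (joint.get(pair, 0.0) / p_x1)


def leaked_mass(joint):
    """Probability the factorized model puts on pairs that never occur together."""
    m1, m2 = marginal(joint, 0), marginal(joint, 1)
    return sum(
        factorized_prob(joint, (a, b))
        for a in m1
        for b in m2
        if (a, b) not in joint
    )


if __name__ == "__main__":
    print(factorized_prob(JOINT, ("New", "York")))      # 0.25
    print(autoregressive_prob(JOINT, ("New", "York")))  # 0.5
    print(leaked_mass(JOINT))                           # 0.5
```

The independent predictions give "New York" only 0.25 instead of 0.5. Half of the probability mass goes to mixtures like "New Angeles" that never occur. The dependency between the two masked tokens is the information the factorized objective throws away.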