by aeternum 1500 days ago
Humans also think about words in terms of subcomponents; languages make heavy use of prefixes and suffixes, for example.

This is not the same. The masks are randomized and lossy. That said, there is potential for a transformer specially trained to segment prefixes/affixes/suffixes; that might augment some of its encoding abilities. See e.g. SpanBERT for a related example of the opportunity.
What do you mean by "lossy"? What information is being lost? Or do you just mean that there isn't necessarily a unique way to encode a given string?
I mean that information is being lost. See XLNet for the rhetoric: https://arxiv.org/abs/1906.08237 Or MPNet, which attempts to combine the best of both worlds information-wise but still finds that masked modeling is much less useful than autoregressive: https://www.microsoft.com/en-us/research/publication/mpnet-m...