Hacker News new | ask | show | jobs
by SemanticStrengh 1500 days ago
This is masked token learning, which is used e.g by BERT. This is obscolete and alternatives such as XLNET are much superior but there is too much inertia in the industry and newer large models are still built with the same lossy encoding..