| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by SemanticStrengh 1500 days ago
	This is masked token learning, which is used e.g by BERT. This is obscolete and alternatives such as XLNET are much superior but there is too much inertia in the industry and newer large models are still built with the same lossy encoding..