|
|
|
|
|
by SemanticStrengh
1500 days ago
|
|
This is masked token learning, which is used e.g by BERT. This is obscolete and alternatives such as XLNET are much superior but there is too much inertia in the industry and newer large models are still built with the same lossy encoding.. |
|