|
|
|
|
|
by slashcom
2840 days ago
|
|
An easier way to understand it is in the context morphology: word prefixes and suffixes mean things, and words have common roots. For example, polymorphism could be decomposed into poly-morph-ism. Antidisestablishmentarianism, which is unlikely to appear much in the corpus, becomes anti-dis-establish-ment-arian-ism. Now the system can learn how to reuse "anti-" or "establish" from other examples more easily than trying to learn the full word's meaning from the one or two examples it might see in the corpus. BPE is a clever way to induce these sort of decompositions automatically without any linguistic annotation, making them useful in multilingual settings. Other languages are much more morphologically rich than English, and there it really benefits. |
|