by mci
2304 days ago
> There are also "agglutinative languages", like Polish, which can add many morphemes together to create very long "words" which include a lot of separate pieces of information. [1]

Polish does not work this way. Source: I am Polish. Perhaps jph00 meant Turkish. Issue filed.

[1] https://github.com/fastai/fastbook/blob/master/10_nlp.ipynb
For the book, I mentioned Polish because of this paper: https://arxiv.org/abs/1810.10222. But as you say, the word "agglutinative" isn't technically correct. I'm actually not sure what the right word is for languages that have lots of big compounds with no spaces (which is the key issue here, since that is why we need subword tokenization techniques).
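
To make the "big compounds with no spaces" point concrete, here is a rough sketch of why whitespace splitting falls short and what a subword splitter does instead. Everything in it is an assumption for illustration: the `segment` helper, the toy vocabulary, and the German compound are not from the book or the paper, and real subword tokenizers learn their vocabulary from a corpus rather than using a hand-written set.

```python
def segment(word, vocab):
    """Greedy longest-match split of `word` into pieces found in `vocab`.

    Falls back to a single character when no longer piece matches,
    so every input can be segmented.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Hypothetical toy vocabulary; a real one would be learned from data.
vocab = {"donau", "dampf", "schiff", "fahrt"}
compound = "donaudampfschifffahrt"

# Whitespace splitting sees one opaque token.
print(compound.split())

# Subword segmentation recovers the reusable pieces.
print(segment(compound, vocab))
```

With whitespace splitting, each new compound is a separate vocabulary entry no matter how many of its parts have been seen before. Splitting into subword units lets a model reuse what it knows about the parts.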