by eindiran
2404 days ago
Certainly English's morphosyntactic simplicity helped out NLP; your phrase "minimum viable model" hits the nail on the head. But over the last 5-10 years, I think there has been a lot of progress on techniques for handling morphological complexity. Some of the unsupervised tokenization methods that first saw use for English (e.g. Goldsmith's work) now see use for agglutinative languages: see here for example [0]. So it's not clear to me whether NLP in a non-Anglo culture would just use the same techniques (arriving at practical achievements a decade later) or whether there would be fundamentally different techniques that are totally unobvious to me now.

Re your point on language being a "[f]uzzy probabilistic mess": language is absolutely NOT a fuzzy probabilistic mess, and it's a damn shame that NLP based its success on black-box models, because it means no one bothers to realize that language isn't a mess at all. See Jelinek's law of speech recognizer accuracy [1]. Simply because we get results using messy black-box models doesn't mean that's how things work under the hood.

[0] https://www.researchgate.net/publication/221013038_Unsupervi...

[1] https://en.wikipedia.org/wiki/Frederick_Jelinek