by eindiran 2404 days ago
Certainly English's morphosyntactic simplicity helped out NLP; your phrase "minimum viable model" hits the nail on the head. But over the last 5-10 years, I think there has been a lot of progress on techniques for handling morphological complexity. Some of the unsupervised tokenization methods that first saw use for English (e.g. Goldsmith's work) now see play for agglutinative languages: see here for example [0]. So it's not clear to me whether NLP in a non-Anglo culture would just use the same techniques (arriving at practical achievements a decade later) or whether there would be fundamentally different techniques that are totally unobvious to me now.

Re your point on language being a "[f]uzzy probabilistic mess" -- language is absolutely NOT a fuzzy probabilistic mess, and it's a damn shame that NLP based its success on black-box models, because it means no one bothers to realize that language isn't a mess at all. See Jelinek's law of speech recognizer accuracy [1]. Just because we get results using messy black-box models doesn't mean that's how things work under the hood.

[0] https://www.researchgate.net/publication/221013038_Unsupervi...

[1] https://en.wikipedia.org/wiki/Frederick_Jelinek

1 comment

The first 40ish years of computing were dominated by machines with a paucity of online storage. It would be more than just a 10-year delay.