HN Mirror

	by rsmith49 2946 days ago
	There is definitely a feedback loop at play in the NLP community. Since most NLP research is based on English, most applications built on that research are also tailored to English. Our approach to classifying feedback in other languages is simply to run it through Google Translate's API and then classify the English translation. You are right, though, that many aspects of the original feedback are lost this way, and our accuracy is lower than it would be with a language-specific model for each language. That said, there is a promising new research paper from fast.ai (https://arxiv.org/abs/1801.06146) on using Wikipedia data to create a language model for a specific language, which can then be fine-tuned on the task you are trying to solve. If this is as effective as the authors state, NLP could see huge improvements for non-English languages that already have a large amount of Wikipedia data.
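
For anyone curious what the translate-then-classify approach looks like in code, here is a minimal sketch. It is not our actual pipeline: `classify_feedback`, `toy_translate`, and `toy_classify_english` are hypothetical names made up for illustration, and the toy functions stand in for a real translation API call and a real English classifier so the example runs offline.

```python
from typing import Callable

# Hypothetical signatures: in practice `translate` would wrap a machine
# translation service and `classify_english` an English-only model.
Translator = Callable[[str, str], str]  # (text, source_lang) -> English text
Classifier = Callable[[str], str]       # English text -> label


def classify_feedback(
    text: str,
    source_lang: str,
    translate: Translator,
    classify_english: Classifier,
) -> str:
    """Translate to English if needed, then classify the English text."""
    english = text if source_lang == "en" else translate(text, source_lang)
    return classify_english(english)


# Toy stand-ins (assumptions, not real services) so the sketch is runnable.
TOY_DICTIONARY = {("es", "la aplicación es lenta"): "the app is slow"}


def toy_translate(text: str, source_lang: str) -> str:
    # Falls back to the untranslated text when the phrase is unknown.
    return TOY_DICTIONARY.get((source_lang, text.lower()), text)


def toy_classify_english(text: str) -> str:
    return "performance" if "slow" in text.lower() else "other"


label = classify_feedback(
    "La aplicación es lenta", "es", toy_translate, toy_classify_english
)
print(label)
```

The classifier only ever sees the translated text, so anything the translation drops or distorts is invisible to it. That is the accuracy cost described above.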