| HN Mirror



	by onurcel 1439 days ago
	This is one of the examples we keep in mind, and it's also why we can't 100% trust public dataset labels. This motivated us to train a Language IDentification system for all the languages we wanted to handle, in order to build the monolingual dataset. More details in the paper ;) Or ask here, if you have questions.