| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by kelseyfrog 1466 days ago
	A massive and continually updated discriminative dataset used to train a language classification model. Build and update continually via mturk or similar and retrain daily. Audit raters (itr) periodically to ensure they're finding signal or at least not disagreeing.