Hacker News new | ask | show | jobs
by kelseyfrog 1466 days ago
A massive and continually updated discriminative dataset used to train a language classification model. Build and update continually via mturk or similar and retrain daily. Audit raters (itr) periodically to ensure they're finding signal or at least not disagreeing.