by avs733 2263 days ago
>This. Universities and online challenges provide clean labeled data, and score on model performance.

The first homework assignment in the stats class I teach is to clean data that the class itself generated, following directions they all perceived as clear. It's near about the most hated assignment I have ever given. Amazing how many ways there are to encode the gender of an experimental participant.

Male, M, m, male, Man, ...

1 comment

gender.lower().startswith('m')... done! :)
Except a real dataset will have its fair share of "nale", "amle", etc.
I would pay a student who figured that out $20
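
For what it's worth, here is a rough sketch of one way to catch typos like "nale" and "amle" that the startswith('m') check misses. It uses fuzzy matching from Python's standard-library difflib. The KNOWN_SPELLINGS table, the normalize_gender name, and the 0.7 cutoff are all assumptions made up for illustration, not anything from the class or a real dataset, and the cutoff would need tuning against actual entries.

```python
import difflib

# Hypothetical lookup table: known spellings mapped to a canonical label.
# A real dataset would need more entries and more categories.
KNOWN_SPELLINGS = {
    "male": "male", "m": "male", "man": "male",
    "female": "female", "f": "female", "woman": "female",
}

def normalize_gender(raw, cutoff=0.7):
    """Return a canonical label, or None if the entry needs manual review."""
    token = raw.strip().lower()
    # Exact hit on a known spelling (covers Male, M, m, male, Man).
    if token in KNOWN_SPELLINGS:
        return KNOWN_SPELLINGS[token]
    # Fuzzy-match only against multi-letter spellings; one-letter codes
    # are too short to compare meaningfully.
    candidates = [k for k in KNOWN_SPELLINGS if len(k) > 1]
    match = difflib.get_close_matches(token, candidates, n=1, cutoff=cutoff)
    return KNOWN_SPELLINGS[match[0]] if match else None

for entry in ["Male", "m", "nale", "amle", "femail", "xyz"]:
    print(repr(entry), "->", normalize_gender(entry))
```

Anything that comes back as None should go to a human rather than be guessed at, and near-ambiguous strings (something halfway between "male" and "female") are exactly where a similarity cutoff can pick wrong, so spot-checking the fuzzy matches is still part of the job.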