Hacker News
by sp332 890 days ago
Databricks had their employees write up 15,000 of them. https://www.databricks.com/blog/2023/04/12/dolly-first-open-...
3 comments

Favorite part of this piece:

> We were initially skeptical whether we would get to 10,000 results. But with nightly leaderboard gamification, we managed to break 15,000 results within a week. Out of fear of eating into our productivity, we closed the contest.

I've hosted a few of these corporate data labeling events. If they're sufficiently gamified and the UX is good enough, they can be surprisingly engaging. It helps a lot if you have a large employee base, though. Spreading the work across 5000 employees is far easier than across even 50; in practice, the difference is even bigger than the two orders of magnitude suggest.

I’ve worked at plenty of places where we did a ton of labeling by hand.

People concerned with data quality from LLMs should really see the inconsistencies we came up with!

Anybody have this downloaded and can paste a few examples?