HN Mirror



	by kandalf 3642 days ago
	To me this seems like the right approach. Either you start with a massive amount of data that isn't quite adapted to the problem (think Google), you start with full automation and basically have to write the decision tree yourself, or you generate appropriately labeled training data like this. However, there seem to be scale issues if you start upmarket the way Clara Labs has. I wonder if there would be a benefit to also having a cheaper, more mass-market version that could generate larger amounts of data and test algorithms better?


jasonlaska 3642 days ago

It's certainly possible. One advantage of our setup is that, rather than getting okay-to-noisy labels from customers, we have CRAs who understand the end goal of the application and generate pretty great data. We are also able to incentivize them to produce fewer errors.
