by hadsed 2254 days ago
One of the most important and underserved aspects of actually doing machine learning is data collection and error analysis (and in software they are essentially the same thing).

Prodigy by Explosion AI (creators of spaCy) is very good, with a great UX focused on making you extremely efficient. It's a paid product, but I'm happy to help fund such a talented and impactful team.

That said, we don't use any of the tools out there nearly as much as we could. One fundamental reason is that they don't cover all of our use cases. My team works with legal contracts, and oftentimes the flow has to be: scan through the entire document to find the region you're looking for, then drag your cursor over it to highlight. I haven't seen any annotation tool that works for that, so we built our own.

In a similar vein, doing error analysis for those predicted highlights on large documents is also painful. Scrolling through them is a chore. If you've seen LiquidText in action, you know what the solution looks like.

There's so much left to do in this world. The exciting part is that the better the models start to work, the more interesting the UX challenges become as you push efficiency even further. Nearly all ML projects have to look at the cost of building and analyzing their datasets, and making that cheaper with models in the loop and exceptional UX is super critical and super fun to think about.