by gharman 1977 days ago
This reminds me of Snorkel (though it’s unclear from the article whether they’re using Snorkel’s trick of aggregating many weak heuristics). It can be made to work even in the real world. The rub is that coming up with these programmatic labelers is easier said than done, especially for complex data.

It works well if a domain expert can state a rule without “cheating” by looking at the data first, e.g. “put a box around round red objects because those are always apples”. But in practice people tend to cheat and look at the data first, and you end up with humans trying to emulate ML, poorly.
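
For anyone who hasn't seen the idea, here is a rough sketch of what "aggregating many weak heuristics" can look like. This is plain Python with a simple majority vote, not Snorkel's API and not its label model (which, as I understand it, estimates how reliable each heuristic is rather than counting votes equally). The rules, field names (`shape`, `color`, `stem`), and labels are all made up for illustration.

```python
from collections import Counter

ABSTAIN = None  # a heuristic returns this when it has no opinion


def lf_round_and_red(obj):
    # Hypothetical expert rule: round red things are apples.
    if obj.get("shape") == "round" and obj.get("color") == "red":
        return "apple"
    return ABSTAIN


def lf_long_and_yellow(obj):
    # Hypothetical expert rule: long yellow things are bananas.
    if obj.get("shape") == "long" and obj.get("color") == "yellow":
        return "banana"
    return ABSTAIN


def lf_has_stem(obj):
    # A deliberately weak rule: a visible stem hints at "apple".
    return "apple" if obj.get("stem") else ABSTAIN


LABELERS = [lf_round_and_red, lf_long_and_yellow, lf_has_stem]


def aggregate(obj, labelers=LABELERS):
    """Majority vote over the heuristics that did not abstain."""
    votes = [v for v in (lf(obj) for lf in labelers) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # Abstain on ties instead of picking a winner arbitrarily.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]
```

Each rule alone is noisy, but an item only gets a label when the rules that fire mostly agree; items with no votes or a tie stay unlabeled and can go to a human.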

1 comments

Definitely easier said than done, but the process at least makes labelling interesting. Sometimes you run into roadblocks where you can't get past having a human do some element of the labelling, but once you have a few algorithmic strategies that work reasonably well on a representative sample of your data, you can usually scale them pretty effectively to the rest of your data.
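
As a rough illustration of the "check it on a representative sample first" step, something like the following can report how often a strategy fires (coverage) and how often it is right when it does (accuracy) against a small hand-labelled sample. The function names, the `lf_red_is_apple` rule, and the data layout are assumptions for the sketch, not anything from the article.

```python
def evaluate(labeler, sample):
    """Score one labelling strategy on hand-labelled data.

    sample: list of (item, true_label) pairs.
    Returns (coverage, accuracy); accuracy counts only items
    where the labeler did not abstain (returned None).
    """
    fired = 0
    correct = 0
    for item, truth in sample:
        guess = labeler(item)
        if guess is None:
            continue
        fired += 1
        if guess == truth:
            correct += 1
    coverage = fired / len(sample) if sample else 0.0
    accuracy = correct / fired if fired else 0.0
    return coverage, accuracy


def lf_red_is_apple(item):
    # Hypothetical strategy: anything red is an apple, otherwise abstain.
    return "apple" if item.get("color") == "red" else None


# Tiny hand-labelled sample (made-up data).
SAMPLE = [
    ({"color": "red"}, "apple"),
    ({"color": "red"}, "cherry"),
    ({"color": "yellow"}, "banana"),
    ({"color": "green"}, "apple"),
]

coverage, accuracy = evaluate(lf_red_is_apple, SAMPLE)
```

If the numbers hold up on the sample, that is some evidence the strategy is worth running over the rest of the data; if coverage is low, the remainder is where a human is still needed.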