by kingcai 1667 days ago
I've used Snorkel quite a bit at work, usually combined with transformers models.

It has worked quite well for us. The public Snorkel package is a bit out of date now, as I think they're building a SaaS solution and focusing more on that. But aside from that it's quite easy to use. Another downside is that lots of cool ideas are present in the papers but not fully implemented (not complaining though!). Also, thinking of a diverse set of heuristics can be hard.

We use Snorkel a lot for bootstrapping text classifiers. Our classification tasks don't require much domain expertise, as it's pretty easy to tell whether a text sample is classified correctly, so the main advantages are avoiding labeling costs and prototyping more quickly. We find that embedding similarity usually works well as a heuristic. I wrote up a little bit about this approach here if you're curious: https://cultivate.com/why-cultivate-uses-embeddings-for-rapi...
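
For anyone wondering what an embedding-similarity heuristic might look like, here is a minimal sketch in plain Python. It is not our production code and it doesn't use the Snorkel API: the toy 3-d vectors, the 0.8 threshold, and the names `make_similarity_lf` and `majority_vote` are all made up for illustration, and a simple majority vote stands in for Snorkel's label model.

```python
import math

# Label conventions (hypothetical; -1 means "no opinion").
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def make_similarity_lf(seed_embeddings, label, threshold=0.8):
    """Build a labeling function that votes `label` when the input is
    close enough to any seed embedding, and abstains otherwise."""
    def lf(embedding):
        best = max(cosine(embedding, seed) for seed in seed_embeddings)
        return label if best >= threshold else ABSTAIN
    return lf


def majority_vote(votes):
    """Combine labeling-function votes; abstain on ties or no votes."""
    counts = {}
    for vote in votes:
        if vote != ABSTAIN:
            counts[vote] = counts.get(vote, 0) + 1
    if not counts:
        return ABSTAIN
    top = max(counts.values())
    winners = [lab for lab, count in counts.items() if count == top]
    return winners[0] if len(winners) == 1 else ABSTAIN


# Toy embeddings standing in for real sentence embeddings (assumption).
positive_seeds = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]
negative_seeds = [[0.0, 0.1, 1.0]]

lfs = [
    make_similarity_lf(positive_seeds, POSITIVE),
    make_similarity_lf(negative_seeds, NEGATIVE),
]

unlabeled = {
    "a": [0.95, 0.15, 0.05],  # near the positive seeds
    "b": [0.05, 0.0, 0.9],    # near the negative seed
    "c": [0.0, 1.0, 0.0],     # near neither, so every function abstains
}

weak_labels = {
    name: majority_vote([lf(emb) for lf in lfs])
    for name, emb in unlabeled.items()
}
print(weak_labels)
```

In a setup like this, the samples that come back with a label can be used to train the actual classifier, and the ones where everything abstains are a hint that you need more heuristics or more seed examples.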

Happy to answer any additional questions you have too :)

1 comments

How do you experiment with the different labelling functions? Notebook type setup?

Thanks for the blog post!