|
|
|
|
|
by sriku
2167 days ago
|
|
Labels are knowledge about data. If you already know some rules that work reasonably well based on your domain experience, then Snorkel lets you capture those as "labeling functions" that may not cover the whole ground or can be "noisy". Snorkel can then build a model to label your data accounting for the "noise". Combining that with some "gold" labels (done by humans), you can use the generated labels on a large data set to build a higher quality model that generalizes better. This is similar to how you can take several low quality models and by virtue of them having expertise over different parts of the data, build an "ensemble" model that performs better than any of them. Imho, Snorkel kinds of tools ("weak supervision") are game changers for ML .. though the biggies get all the press. So I'm excited to see this end to end direction taken by the team. |
|