Hacker News
by edshiro 2167 days ago
I presume this does not apply to computer vision datasets? Frankly, I am still confused about what exactly Snorkel does.
1 comment

You have a dataset of images and you write code (labeling functions, or LFs) to label the images. Snorkel handles the pipeline but, more importantly, corrects for the conflicts and correlations between the LFs. The output is a supervised dataset with mutually exclusive labels, a la softmax classification.

The labels are noisy, but you get them in a quantity you could not get from human annotators, and at a faster, cheaper rate. The Snorkel authors provide analysis arguing that, for discriminative models, quantity CAN outweigh quality.

To your point, it's not typically used with image-only data. It's mostly used where there is some metadata attached.
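
To make the idea concrete, here is a rough sketch of labeling functions over image metadata. It is plain Python, not Snorkel's API. The record fields (`filename`, `caption`, `tags`), the CAT/DOG classes, and every function name are made up for illustration. The vote combiner is a simple majority vote, used here as a stand-in: as described above, Snorkel instead models the conflicts and correlations between LFs rather than counting votes equally.

```python
from collections import Counter

# Hypothetical label values for this sketch
ABSTAIN = -1
CAT, DOG = 0, 1


# Each labeling function (LF) looks only at metadata, never at pixels,
# and returns a class or ABSTAIN when it has no opinion.
def lf_caption_mentions_cat(record):
    return CAT if "cat" in record["caption"].lower() else ABSTAIN


def lf_caption_mentions_dog(record):
    return DOG if "dog" in record["caption"].lower() else ABSTAIN


def lf_tagged_dog(record):
    return DOG if "dog" in record["tags"] else ABSTAIN


def lf_filename_prefix(record):
    name = record["filename"].lower()
    if name.startswith("cat_"):
        return CAT
    if name.startswith("dog_"):
        return DOG
    return ABSTAIN


LFS = [
    lf_caption_mentions_cat,
    lf_caption_mentions_dog,
    lf_tagged_dog,
    lf_filename_prefix,
]


def majority_vote(record, lfs=LFS):
    """Combine LF outputs into one label; ABSTAIN on no votes or a tie.

    This is a simplification: it treats every LF as equally reliable.
    """
    votes = [v for v in (lf(record) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting LFs with equal support
    return ranked[0][0]


# Made-up example records
records = [
    {"filename": "cat_001.jpg", "caption": "My cat asleep on the sofa", "tags": ["pets"]},
    {"filename": "dog_417.jpg", "caption": "Cat-sized dog at the park", "tags": ["dog"]},
    {"filename": "img_9.jpg", "caption": "Sunset over the bay", "tags": ["landscape"]},
]

labels = [majority_vote(r) for r in records]
```

The second record shows a conflict: the caption LF votes CAT because of "Cat-sized", but three other LFs vote DOG, so the combined label is DOG. The third record gets no votes and stays unlabeled, which is the usual fate of image-only data with no useful metadata.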