| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by eggie5 2166 days ago

you have a dataset of images and you write code (labeling functions LF) to label the images. Snorkel handles the pipeline but more importantly corrects the conflicts/correlations between the LFs. The output is a supervised dataset w/ mutually exclusive labels a la softmax classification.

the labels are noisy, but you have a quantity that you could not get by humans, AND at a faster/cheaper rate. they provide analysis arguing that, for discriminative models, quantity CAN outweigh quality.

to your point it's not typically used w/ the image-only modality. It's mostly used where there is some meta-data attached.