

by ajratner 2206 days ago

First: Snorkel Flow absolutely does not generalize to every ML problem :). IMO, defining where different systems and approaches do and don't work best is one of the most important and most challenging problems in ML systems research. As noted, we've worked to give detail on this for Snorkel over the years. No perfect answers, but some notes below:

- As you imply, a lot depends on the available sources of input signal, whether labeling functions, 'transformation functions' for data augmentation, or other ops we've worked on. The input is obviously key.

- For data modalities like image, video, etc.: often the most successful approach is to (A) rely on some pre-processed features or "primitives" and write labeling functions over these (as my co-founder Paroma in particular has published about over the years), and/or (B) use metadata.

- External models are definitely expressible as labeling functions, and we've worked on exactly that problem of modeling (local) biases and correlations!
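
To make the bullets above concrete, here is a rough sketch in plain Python of labeling functions written over (A) pre-computed primitives, (B) metadata, and an external model wrapped as a labeling function. This is not Snorkel's API: the field names (`detected_objects`, `caption`), the threshold, and the helper names are all hypothetical, and the simple majority vote is only a stand-in for a label model that actually estimates the biases and correlations of the sources.

```python
from collections import Counter

# Hypothetical label scheme: each labeling function votes or abstains.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_detector_sees_person(x):
    # (A) Vote from a pre-computed "primitive" (assumed: a list of object
    # tags produced upstream by some detector), not from raw pixels.
    return POSITIVE if "person" in x["detected_objects"] else ABSTAIN


def lf_caption_mentions_landscape(x):
    # (B) Vote from metadata that travels with the image (assumed field).
    return NEGATIVE if "landscape" in x["caption"].lower() else ABSTAIN


def make_model_lf(model, threshold=0.8):
    # Wrap an external model (any callable returning P(positive)) as a
    # labeling function that abstains when the model is not confident.
    def lf(x):
        p = model(x)
        if p >= threshold:
            return POSITIVE
        if p <= 1 - threshold:
            return NEGATIVE
        return ABSTAIN
    return lf


def apply_lfs(lfs, x):
    # One vote per labeling function for a single data point.
    return [lf(x) for lf in lfs]


def majority_vote(votes):
    # Naive aggregation: ignore abstentions, abstain on ties or no votes.
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


example = {"detected_objects": ["person", "dog"], "caption": "A walk in the park"}
external_model = lambda x: 0.9  # placeholder for a real pre-trained model
lfs = [lf_detector_sees_person, lf_caption_mentions_landscape, make_model_lf(external_model)]
votes = apply_lfs(lfs, example)
label = majority_vote(votes)
```

Majority vote treats every source as equally reliable and independent, which is exactly the assumption a learned label model is meant to relax.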

1 comments

woeirua 2206 days ago

Does it work for semantic segmentation? That's really where I'm struggling to see how this could work.


ajratner 2205 days ago

More advanced structured prediction tasks are still definitely on the cutting edge, mainly, IMO, because it's hard to define the semantics of programmatic user input like labeling functions for these kinds of tasks. Some recent work (http://cs.brown.edu/people/sbach/files/safranchik-aaai20.pdf) has extended these semantics for sequence tagging, as an example, so there are some exciting moves in this direction!
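
As a toy illustration of what "semantics for sequence tagging" could mean, the sketch below has each rule emit one vote per token instead of one vote per example, and then aggregates per token. This is not the method from the linked paper, just the simplest per-token variant; the rule names, the tag set (`ENT`/`O`), and the word lists are made up for the example.

```python
from collections import Counter

ABSTAIN = None  # a rule may decline to vote on any individual token


def tf_capitalized(tokens):
    # Vote "ENT" for capitalized tokens that are not sentence-initial.
    return ["ENT" if i > 0 and t[:1].isupper() else ABSTAIN
            for i, t in enumerate(tokens)]


def tf_stopwords(tokens):
    # Vote "O" (outside any entity) for a few common function words.
    stop = {"the", "a", "in", "of", "is"}
    return ["O" if t.lower() in stop else ABSTAIN for t in tokens]


def tf_gazetteer(tokens):
    # Vote "ENT" for tokens found in a small, hypothetical name list.
    known = {"paris", "brown"}
    return ["ENT" if t.lower() in known else ABSTAIN for t in tokens]


def aggregate(tokens, tagging_fns):
    # Per-token majority vote; ties go to the tag that was voted first.
    votes_per_fn = [fn(tokens) for fn in tagging_fns]
    labels = []
    for token_votes in zip(*votes_per_fn):
        counts = Counter(v for v in token_votes if v is not ABSTAIN)
        labels.append(counts.most_common(1)[0][0] if counts else ABSTAIN)
    return labels


tokens = ["She", "studied", "in", "Paris"]
tags = aggregate(tokens, [tf_capitalized, tf_stopwords, tf_gazetteer])
```

Voting on each token independently ignores the dependencies between neighboring tags, which is part of why structured outputs are harder than plain classification here.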
