by ajratner 2159 days ago
First: Snorkel Flow absolutely does not generalize to every ML problem :). IMO defining where different systems and approaches do and don't work best is one of the most important and most challenging problems in ML systems research. As noted, we've worked to give detail on this for Snorkel over the years. No perfect answers, but some notes below:

- As you imply, a lot has to do with the available sources of input signal, whether labeling functions, 'transformation functions' for data augmentation, or other ops we've worked on. The input is obviously key.

- For data modalities like image, video, etc.: often the most successful approach is to (A) rely on some pre-processed features or "primitives" and write labeling functions over these, as my co-founder Paroma in particular has published about over the years, and/or (B) use metadata.

- External models are definitely expressible as labeling functions, and we've worked on exactly that problem of modeling (local) biases and correlations!
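To make the three notes above concrete, here is a minimal sketch in plain Python of labeling functions written over pre-computed primitives, over metadata, and around an external model's score. It is not Snorkel's API: the field names (`region_area`, `report_text`, `external_model_score`), the thresholds, and the example records are all made up for illustration, and the simple majority vote is only a stand-in for a label model that would actually estimate the biases and correlations of each source.

```python
from collections import Counter

# Label conventions assumed for this sketch.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1


def lf_large_region(x):
    """Rule over a pre-computed primitive (hypothetical 'region_area' feature)."""
    return POSITIVE if x["region_area"] > 0.5 else ABSTAIN


def lf_report_says_normal(x):
    """Rule over metadata (hypothetical free-text field attached to the image)."""
    return NEGATIVE if "normal" in x["report_text"].lower() else ABSTAIN


def lf_external_model(x):
    """External model wrapped as a labeling function: vote only when confident."""
    score = x["external_model_score"]  # hypothetical pre-computed score in [0, 1]
    if score >= 0.8:
        return POSITIVE
    if score <= 0.2:
        return NEGATIVE
    return ABSTAIN


LFS = [lf_large_region, lf_report_says_normal, lf_external_model]


def apply_lfs(examples, lfs=LFS):
    """Build the label matrix: one row per example, one column per function."""
    return [[lf(x) for lf in lfs] for x in examples]


def majority_vote(votes):
    """Combine one row of votes; abstain when nothing voted or the top two tie."""
    ranked = Counter(v for v in votes if v != ABSTAIN).most_common()
    if not ranked:
        return ABSTAIN
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Made-up records, purely for illustration.
examples = [
    {"region_area": 0.7, "report_text": "Opacity noted", "external_model_score": 0.9},
    {"region_area": 0.1, "report_text": "Normal study", "external_model_score": 0.1},
    {"region_area": 0.6, "report_text": "normal", "external_model_score": 0.5},
]

label_matrix = apply_lfs(examples)
labels = [majority_vote(row) for row in label_matrix]
```

Note that none of the functions touches raw pixels. Each one reads a primitive, a metadata field, or a model output, and abstains when it has nothing to say.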

1 comment

Does it work for semantic segmentation? That's really where I'm struggling to see how this could work.
More advanced structured prediction tasks are still definitely on the cutting edge, mainly IMO down to defining the semantics of programmatic user input, like labeling functions, for these kinds of tasks. Some recent work (http://cs.brown.edu/people/sbach/files/safranchik-aaai20.pdf) has extended these semantics for sequence tagging, as an example, so some exciting moves in this direction!
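As a rough picture of what "semantics for sequence tagging" can mean, here is a minimal sketch in plain Python where each rule votes on a tag per token (or abstains) instead of producing one label per example. It is not the method from the linked paper. The rules, the `ENT`/`O` tag names, the word lists, and the per-token majority vote are all assumptions made for illustration.

```python
from collections import Counter

ABSTAIN = None  # a rule returns None for tokens it has no opinion on


def rule_capitalized(tokens):
    """Vote ENT on capitalized tokens that are not sentence-initial."""
    return ["ENT" if i > 0 and t[:1].isupper() else ABSTAIN
            for i, t in enumerate(tokens)]


def rule_function_words(tokens):
    """Vote O (outside any entity) on a small, hypothetical function-word list."""
    function_words = {"the", "a", "of", "in", "at", "works"}
    return ["O" if t.lower() in function_words else ABSTAIN for t in tokens]


def rule_gazetteer(tokens):
    """Vote ENT on tokens found in a hypothetical list of known names."""
    names = {"brown", "snorkel"}
    return ["ENT" if t.lower() in names else ABSTAIN for t in tokens]


RULES = [rule_capitalized, rule_function_words, rule_gazetteer]


def vote_per_token(tokens, rules=RULES):
    """Combine rule outputs token by token; abstain on no votes or on a tie."""
    per_rule = [rule(tokens) for rule in rules]
    tags = []
    for position in range(len(tokens)):
        votes = [out[position] for out in per_rule if out[position] is not ABSTAIN]
        ranked = Counter(votes).most_common()
        if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
            tags.append(ABSTAIN)
        else:
            tags.append(ranked[0][0])
    return tags


tokens = ["She", "works", "at", "Brown", "University"]
tags = vote_per_token(tokens)
```

The same framing is what makes semantic segmentation hard: the unit being voted on becomes a pixel or region, and the votes on neighboring units are far from independent.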