by ajratner
2159 days ago
First: Snorkel Flow absolutely does not generalize to every ML problem :). IMO, defining where different systems and approaches do and don't work best is one of the most important and most challenging problems in ML systems research. As noted, we've worked to give detail on this for Snorkel over the years. No perfect answers, but some notes below:

- As you imply, a lot has to do with the available sources of input signal. Whether it's labeling functions, 'transformation functions' for data augmentation, or other ops we've worked on, the input is obviously key.
- For data modalities like image, video, etc.: often the most successful approach is to (A) rely on some pre-processed features or "primitives" and write labeling functions over these, as my co-founder Paroma in particular has published about over the years, and/or (B) use metadata.
- External models are definitely expressible as labeling functions, and we've worked on exactly that problem of modeling (local) biases and correlations!
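To make the three notes above concrete, here is a rough, library-free sketch in plain Python. It is not Snorkel's API, and everything in it is an assumption made for illustration: the field names (`region_area`, `caption`), the thresholds, and the stand-in external model are all hypothetical. It also combines votes with a simple majority vote, which is a deliberately naive substitute for a label model that accounts for biases and correlations between sources.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical record: precomputed primitives plus metadata for one image.
Example = Dict[str, Any]
LabelingFunction = Callable[[Example], int]


def lf_primitive_area(x: Example) -> int:
    """(A) Vote over a pre-processed primitive (hypothetical 'region_area')."""
    return POSITIVE if x["region_area"] > 0.5 else ABSTAIN


def lf_metadata_caption(x: Example) -> int:
    """(B) Vote over metadata (hypothetical free-text 'caption' field)."""
    return NEGATIVE if "normal" in x["caption"].lower() else ABSTAIN


def make_model_lf(model: Callable[[Example], float],
                  threshold: float = 0.8) -> LabelingFunction:
    """Wrap an external model's score as a labeling function.

    The wrapper abstains unless the score is confidently high or low.
    """
    def lf(x: Example) -> int:
        score = model(x)
        if score >= threshold:
            return POSITIVE
        if score <= 1.0 - threshold:
            return NEGATIVE
        return ABSTAIN
    return lf


def majority_vote(votes: List[int]) -> int:
    """Naive combiner: most common non-abstain vote, abstaining on ties."""
    counts = Counter(v for v in votes if v != ABSTAIN)
    if not counts:
        return ABSTAIN
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def apply_lfs(lfs: List[LabelingFunction], x: Example) -> List[int]:
    """Collect one vote per labeling function for a single example."""
    return [lf(x) for lf in lfs]
```

Usage would look roughly like `majority_vote(apply_lfs(lfs, example))`, where `lfs` mixes hand-written functions with `make_model_lf(some_model)`. The point of the wrapper is only that an external model becomes one more noisy voter; handling its local biases and its correlations with other sources is exactly what a simple majority vote does not do.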