by twsttest
2169 days ago
I'm not disagreeing with your basic idea, but it seems you're nitpicking and talking past Yann's point. A model's only link to the real world is the training data, so saying it's sufficient to "worry about the training data" captures all the concerns we may have about bias, because from the model's POV there is no other relevant interface with the real world. Saying "we need to do more" is devoid of meaning when by addressing the training data we are truly doing all we can as model builders and trainers.
A huge problem in the field is that we must keep using the previous benchmarks. How do you know whether the needle has moved if you constantly change your data?
So, in order to tackle this problem, someone with more resources than me needs to create training sets that are less biased. THEN, new academic papers need to be benchmarked against both the old biased sets and the new "less biased" sets (I don't think it's possible to ever get 0% bias, the world just isn't that clean). Eventually, progress needs to transition to being measured on the new less biased sets.
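To make the "report on both sets" idea concrete, here is a minimal sketch. Everything in it is made up for illustration: the toy `threshold_model`, the two tiny datasets, and the helper names are my own assumptions, not anything from the paper or from a real benchmark suite.

```python
from typing import Callable, Dict, List, Tuple

# One labeled example: (input feature, expected label). Purely illustrative.
Example = Tuple[float, int]


def accuracy(model: Callable[[float], int], dataset: List[Example]) -> float:
    """Fraction of examples in `dataset` that `model` labels correctly."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct = sum(1 for x, y in dataset if model(x) == y)
    return correct / len(dataset)


def report_on_benchmarks(
    model: Callable[[float], int],
    benchmarks: Dict[str, List[Example]],
) -> Dict[str, float]:
    """Score the same model on every benchmark so results stay comparable."""
    return {name: accuracy(model, data) for name, data in benchmarks.items()}


def threshold_model(x: float) -> int:
    """Hypothetical stand-in for a trained model."""
    return int(x >= 0.5)


# Hypothetical data: the legacy set only contains "easy" cases,
# the less biased set also covers cases the legacy set never sampled.
legacy_set = [(0.9, 1), (0.8, 1), (0.1, 0), (0.2, 0)]
less_biased_set = [(0.9, 1), (0.45, 1), (0.1, 0), (0.55, 0)]

scores = report_on_benchmarks(
    threshold_model,
    {"legacy": legacy_set, "less_biased": less_biased_set},
)
print(scores)
```

The point of keeping the legacy score in the report is that new results stay comparable with older papers, while the second number shows how much of that score survives on data the old set underrepresents.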
The upsampling algorithm was trained on pictures of celebrities, and the researchers put a blurb in their paper that basically said, "We know this is biased, but everyone uses it, so we must also." I feel like this is less useful science than an algorithm trained on more of a mix of actual real-world humans.
I admit it's quite challenging and probably impossible to do in some areas. I mean, how do you get a field whose end algorithmic goal is generalization to not use real-world data to generalize about people? But I think the issue can be worked on, and the reliance on celebrity photos for training is a good place to start.