by kevingadd
1694 days ago
If you're publishing a dataset in the terabytes, it does make sense to at least do a pass over it and make sure the data isn't skewed in some undesirable way that would cause problems down the road. For example, if you're releasing 5 TB of face photos for training facial recognition nets, it would certainly be a problem if all the faces were white women or Asian men: a model trained on it would probably overfit to those groups and perform worse for people in other categories. It would be correct to call that a diversity/inclusion issue. Privacy and accessibility reviews serve similar purposes. You're reducing risk by checking for these various problems, and ideally the reviews also spot ways to improve the quality of your outcomes.

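As a rough sketch of what such a pass could look like, assuming each photo comes with a group label in its metadata (the labels, the `find_underrepresented` helper, and the 5% threshold are all made up for illustration):

```python
from collections import Counter

def find_underrepresented(labels, min_share=0.05):
    """Return {group: share} for groups whose share of the data is below min_share."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {
        group: n / total
        for group, n in counts.items()
        if n / total < min_share
    }

# Hypothetical per-image metadata labels, not real data.
labels = ["a"] * 90 + ["b"] * 8 + ["c"] * 2
print(find_underrepresented(labels))
```

A count like this only flags groups that appear at least once. A group that is missing from the dataset entirely would need a separate check against a list of groups you expect to see.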
To clarify, I think it's good that this is a practice.