by notahacker
973 days ago
Sure, curing the disease is more important than curing the symptoms, though the two aren't entirely unlinked. What the systems describe isn't reality, though. Mexicans invariably wearing sombreros doesn't reflect Mexican fashion; it reflects whether people have bothered to tag the image with "Mexican" or not. If you can tag reality in ways in which US frat boys' fancy dress preferences are somewhat representative of the label "Mexican" and famous Mexicans in Mexico City usually aren't, then it certainly isn't necessary for job title tags to be highly correlated with ethnicity (posed stock photos have tended to push back against this for years). And whilst it's true that certain occupations are dominated by white males in the West, they're certainly not the world's "default" people; that's more a reflection of the sort of English-speaking internet power users whose content gets hoovered up by the dataset. And that is definitely a bias, even if it's a completely unintentional one. In general it's "reality as seen through the narrow lens of people uploading and tagging photos, often not even with the intention of conveying useful information to an image generation algorithm". That reality includes a lot of biases, some of them more accurate than others and some of them more benign than others.
|
Of course, as you say, the problem (if there is one) is in the dataset and not in the program. But if we decide this should be corrected after the fact, then at that point we are sure to introduce an actual bias.
On what basis? Who decides what bias should be applied, and the appropriate amount?