I worked for another computer vision company, Clarifai that had the same issue. One of the employees noticed it and we retrained the model before it became public.
This is what amazes me. Given this exact thing has happened in the past and resulted in public humiliation of the companies involved, how did they not notice this? Why didn’t they check for it?
What’s being reported is that there is a single video which is mislabeled. For all we know, they did test for this, and believed there was no issue.
AI models are deterministic in a purely technical sense, but practically speaking, they are non-deterministic black boxes. It’s not as if you can write a unit test which generates all possible videos of black people and makes sure it never outputs “gorilla”.