Based on the redactions in the images in the article, those searches are returning ~75% cp. I guess that could still be a small portion of what's out there, but that's horrifying.
Porn accounts for 1% of images. Child porn accounts for a one-in-a-million fraction of that. So it's 10 per 1 billion images.
If you figure the filters exclude 90% of child porn images, that leaves 1 per billion that will show up in search results.
I can’t find a good estimate of total number of images, but YouTube shows 5 billion videos a day and gets 1800 minutes of video per minute.
So if we estimate a trillion photos, then we’d expect around a thousand child porn images to make it past the filters, and Bing to be able to return a few pages of 75% child porn when we accidentally stumble on a term in that category.
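For anyone who wants to poke at the numbers, here is a rough sketch of the back-of-envelope estimate above. Every input is one of the guesses from this comment (1% porn, one in a million of that, a 90% filter, a trillion images), not a measured figure, and the function and parameter names are made up for illustration.

```python
def expected_unfiltered(total_images,
                        porn_fraction=0.01,          # guess: 1% of all images
                        illegal_share_of_porn=1e-6,  # guess: one in a million of those
                        filter_recall=0.90):         # guess: filters catch 90%
    """Return (matching images before filtering, images slipping past the filter)."""
    per_image_rate = porn_fraction * illegal_share_of_porn  # about 10 per billion
    before = total_images * per_image_rate
    after = before * (1.0 - filter_recall)
    return before, after


# With the assumed trillion indexed images this comes out to roughly
# ten thousand before filtering and roughly a thousand after.
before, after = expected_unfiltered(1e12)
print(f"before filter: {before:.0f}, after filter: {after:.0f}")
```

Changing any one guess by a factor of ten moves the result by the same factor, so the output is only as good as the weakest input.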
There's a slightly bizarre situation here where search algorithms are working against themselves, yeah. Presumably anything spiked by PhotoDNA isn't returned at all - which frees up those spots for the next-most-relevant result. And the more effective Bing's indexing is, well, the worse that result will be...
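A toy sketch of that backfill effect, assuming the blocklist is applied as a plain filter before the top-k cut. The item ids, scores, and function name are hypothetical, and nothing here reflects how Bing or PhotoDNA actually work.

```python
def top_k_after_blocklist(ranked, blocked, k):
    """ranked: list of (item_id, relevance) pairs, already sorted by descending relevance.

    Items whose id is in `blocked` are dropped, and the next-most-relevant
    items slide up to fill the vacated slots.
    """
    visible = [item for item in ranked if item[0] not in blocked]
    return visible[:k]


ranked = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6)]
# Blocking the two best matches promotes "c" and "d" into the top two.
print(top_k_after_blocklist(ranked, blocked={"a", "b"}, k=2))
```

The page stays full either way. Removing known items only changes which items fill it.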
Since PhotoDNA is basically a known-bad tool, it presumably can't win the battle unless turnover is fairly low. Penalizing or hiding sites with many PhotoDNA hits (or perhaps a high percentage of hits) might do better by targeting concentrators, but that would depend on what sort of sites are serving this stuff. I assume they have to be fairly small/scattered to stay operational, which in turn makes it harder to predict what sort of content they have.
(And despite the article, it doesn't seem clear that Google has solved this problem, so much as bypassed it with a whitelist approach to nudity in general.)