Hacker News
by Bobby_Tables 5271 days ago
What annoys me to no end is that the counter-argument to the "Google is smart enough to build it" canard is legal, not technological. Child porn is illegal in this country, full stop. If it exists, it's prosecutable. An MP3 of a Justin Bieber song isn't necessarily illegal, and determining whether a particular copy is legal requires information that is not readily available for financial security reasons. So nobody can build a detector, regardless of how smart they are.

Which is exactly the point of SOPA...since we can't stop the flow of copyrighted media on the internet, we can't have the internet.

This is just a friendly suggestion to edit that second sentence - it threw me for a bit.
GOOD CALL. This is why I should not post on HN while waiting for my tests to run.
Except that, like someone pointed out, porn is not necessarily illegal and determining whether it is requires knowing the age of those appearing in it. This information is also not readily available. (Of course, just like it's pretty obvious in certain instances, it's pretty obvious that a full movie available for download is illegal, too.) So the cases aren't that different.
The information about how old a person looks exists in the image itself. You can compare the size and proportions of a person's features to known examples. No one is going to mistake a four-year-old for a grandmother. The information about whether or not someone has permission to copy something does not exist in the file: the infringing copy and the legitimate copy are bit-for-bit identical. You have to get the data on who has permission to do what from somewhere else. The Viacom v. YouTube case proved that even expensive lawyers spending hours doing research can't manage to get even 99% accuracy. They had to remove things that Viacom itself had uploaded (or authorized to be uploaded) from that case. Twice.

Now suppose we have a magic system that's better than expensive copyright lawyers, one that manages 99% accuracy (which they didn't), and that we run it at internet scale on the roughly 60 billion web pages Google indexes. Then assume that no less than 20% of the internet is pirated material. Does that sound too high? Good. Your numbers will look even worse if it's lower. Now we do a little math:

  0.2 * 0.99 = 19.8% are pirate pages correctly blocked
  0.8 * 0.99 = 79.2% are innocent pages correctly allowed
  0.2 * 0.01 =  0.2% are pirate pages mistakenly allowed
  0.8 * 0.01 =  0.8% are innocent pages mistakenly blocked
So we've mistakenly blocked four innocent pages for every pirate page that slipped through: 480,000,000 innocent pages (0.8% of 60 billion) are censored, and 120,000,000 pirate pages (0.2% of 60 billion) are still out there unblocked. Feel free to work out the math, but if you want to say that 20% is too high a proportion, you'll have even more false positives, so even more innocent pages are censored for every pirate blocked.
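
If you'd rather not do it by hand, here is a minimal Python sketch of the same arithmetic. The function name `filter_outcomes` and its parameters are made up for illustration, and it assumes the filter is equally accurate on pirate and innocent pages, like the 99% figure above.

```python
def filter_outcomes(total_pages, pirate_share, accuracy):
    """Split pages into the four outcomes of a hypothetical filter.

    Assumes the same accuracy on pirate and innocent pages.
    """
    pirate = total_pages * pirate_share
    innocent = total_pages - pirate
    return {
        "pirate_blocked": pirate * accuracy,           # correct
        "innocent_allowed": innocent * accuracy,       # correct
        "pirate_allowed": pirate * (1 - accuracy),     # missed
        "innocent_blocked": innocent * (1 - accuracy), # censored by mistake
    }


# The numbers from the comment: 60 billion pages, 20% pirated, 99% accuracy.
outcomes = filter_outcomes(60_000_000_000, 0.20, 0.99)
for name, count in outcomes.items():
    print(f"{name:>17}: {count:,.0f}")
```

With those inputs it comes out to roughly 480 million innocent pages blocked and 120 million pirate pages allowed. Lower `pirate_share` and the innocent-blocked count goes up while the pirate-blocked count goes down.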

And this assumes that we have something better than ~$300/hr copyright lawyers screening all 60 billion web pages. Any computer program we make won't even be this good.

You are doing a selective comparison.

> No one is going to mistake a four-year-old for a grandmother.

True. But plenty of people will mistake a 14, 15, 16, or 17-year old for an 18-year old.

> The Viacom v. YouTube case proved that even expensive lawyers spending hours doing research can't manage to get even 99% accuracy.

Again true, but the odds that a randomly selected uploaded full-length movie was uploaded illegally are, I bet, significantly higher than the odds that a randomly selected explicit movie features underage people. So the prior, meaning the fraction of available content that is illegal under each standard, would likely lead to a larger false positive rate for pornography.
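
To make the point about priors concrete, here is a rough sketch using Bayes' rule. The symbols are my own assumptions: $p$ is the fraction of content that is actually illegal, and $a$ is the filter's accuracy, assumed to be the same on legal and illegal content (as in the 99% example upthread).

```latex
P(\text{legal} \mid \text{blocked})
  = \frac{(1 - a)(1 - p)}{a\,p + (1 - a)(1 - p)}
```

With $a = 0.99$, a prior of $p = 0.2$ means about 3.9% of blocked items are legal, while $p = 0.01$ means 50% are. The smaller the illegal fraction, the larger the share of blocks that hit legal content.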

Saying that the porn filters don't work well only damages the case for copyright filters further. If the filters don't work well when people want them to, why would they work any better when people start trying to undermine them?