by makeitdouble 1091 days ago
At the same time, Google is the only entity that could provide the numbers you want to see.

Who else could get actual numbers of scams they run vs. legit ads? No other entity sees every single one of the ads they serve, so we'd need a trustworthy insider leak to meet your standard.

There is a concept from polling called a "random sample" which would apply here. If you take 10,000 clicks and sort them into legitimate and illegitimate, as long as you were able to rule out any way that your sample would be significantly more or less fraudulent than the whole pool, you would understand (roughly) the ratio in the greater pool.

It's like taking a sample of a swimming pool, to check the chlorine levels.
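
To put rough numbers on the sampling idea, here is a minimal sketch. It assumes the sample really is random, and the count of 1,200 bad clicks is invented purely for illustration. It uses the textbook normal approximation for a proportion, nothing specific to Google's ad data; `estimate_bad_share` is a hypothetical helper name.

```python
import math

def estimate_bad_share(bad_clicks, sample_size, z=1.96):
    """Estimate the share of bad clicks with an approximate 95% interval.

    Uses the normal approximation, which is reasonable for a large random
    sample when the share is not extremely close to 0 or 1.
    """
    share = bad_clicks / sample_size
    margin = z * math.sqrt(share * (1 - share) / sample_size)
    return share, max(0.0, share - margin), min(1.0, share + margin)

# Hypothetical numbers: 1,200 of 10,000 sampled clicks judged illegitimate.
share, low, high = estimate_bad_share(1200, 10_000)
print(f"estimated share: {share:.3f} (roughly {low:.3f} to {high:.3f})")
```

With 10,000 clicks the interval is only about one percentage point wide, which is why the hard part is the randomness of the sample, not its size.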

I’m unfamiliar with how Google Video Partners works, but how could anyone go about randomly sampling the sites their ads are running on? According to the article, Google won’t tell you where your ads are running. And just seeking out shady video websites is not an unbiased method either.

Also just to be clear, clicks are not likely to be randomly distributed between legitimate and spam sites. If advertisers are paying for plays, they need to be able to see that information as well.

If I were spending money on video ads right now, I would be somewhere between cancelling my ad spend and calling a lawyer.

This would be a good measure for your own site, and you could get back to Google to say your results are garbage.

But you would have nothing to retort if they tell you that you're an outlier, or that your specific niche has bad actors who skew the results and they'll look into banning one or two of them. (And when those get replaced and we come back again, they'll again clear out a few token users.)

>There is a concept from polling called a "random sample"... It's like taking a sample of a swimming pool, to check the chlorine levels.

And maybe the edge of the pool closest to the person taking the sample is the deep end where mostly adults swim, but if they walked a bit further and took the sample from the shallow side with the babies and toddlers they'd find out the water is actually 50% piss with more than trace amounts of feces mixed in as well.
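
A toy simulation of that objection, with every number invented for illustration: a pool where most clicks come from the "shallow end" (mostly bad) and a minority from the "deep end" (mostly clean). Sampling only the convenient end gives a very different answer than sampling the whole pool at random.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical pool: 90,000 "shallow end" clicks (about 50% bad) and
# 10,000 "deep end" clicks (about 2% bad). These rates are made up.
pool = (
    [("shallow", random.random() < 0.50) for _ in range(90_000)]
    + [("deep", random.random() < 0.02) for _ in range(10_000)]
)

def bad_share(clicks):
    """Fraction of clicks in the list that are flagged as bad."""
    return sum(is_bad for _, is_bad in clicks) / len(clicks)

true_share = bad_share(pool)

# Random sample drawn from the whole pool.
random_sample = random.sample(pool, 10_000)

# Convenience sample drawn only from the end that is easy to reach.
deep_end = [click for click in pool if click[0] == "deep"]
convenience_sample = random.sample(deep_end, 5_000)

print(f"whole pool:         {true_share:.3f}")
print(f"random sample:      {bad_share(random_sample):.3f}")
print(f"convenience sample: {bad_share(convenience_sample):.3f}")
```

The random sample lands close to the whole-pool figure, while the convenience sample mostly reflects the clean end. Neither tells you anything unless you can actually draw from the whole pool, which is the disputed part.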

Strawman. Parent was talking about [Cl⁻].

If you care about excrement content, that should be another study.

Bottom line: Representative sample https://en.m.wikipedia.org/wiki/Sampling_(statistics)

I think the point was that you need to ensure a representative sample, and that requires some idea of the scale of the fraud taking place to be able to measure its impact. I have no idea whether that's possible without insider data, but I suspect that was the point of the argument, and not which substance was being tested for in the imaginary body of water being used as an example.

Yes, I am very well aware of what a representative sample means, and I made my comment specifically with that knowledge in mind when I responded.