Hacker News
by Alupis 4248 days ago
> An ad network cannot tell the difference between a real click and a fake click based on the HTTP request itself.

No, they can't. However, they can tell what is a real user and what is not. Real users don't click every single ad presented to them on every single page. Real users don't click ads as soon as a page loads. Real users don't click on all ads at the same or nearly the same time. (If this worked without getting flagged or blacklisted, site operators would have built bots long ago to click their own ads, as there is a lot of money to be made that way.)
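The three tells listed above can be written as simple per-page-view rules. This is a minimal sketch only: the `PageView` record, the rule names, and both thresholds are hypothetical, and real ad networks use richer statistical models that they do not publish.

```python
from dataclasses import dataclass, field

@dataclass
class PageView:
    """One page view: how many ads were shown and when each click happened."""
    ads_shown: int
    click_times: list[float] = field(default_factory=list)  # seconds after page load

# Hypothetical thresholds, chosen only for illustration.
MIN_SECONDS_AFTER_LOAD = 2.0
MIN_SECONDS_BETWEEN_CLICKS = 1.0

def suspicious_reasons(view: PageView) -> list[str]:
    """Return the heuristics this page view trips (an empty list looks human)."""
    reasons = []
    clicks = sorted(view.click_times)
    # Tell 1: every ad on the page was clicked.
    if view.ads_shown > 0 and len(clicks) >= view.ads_shown:
        reasons.append("clicked every ad")
    # Tell 2: the first click came almost immediately after the page loaded.
    if clicks and clicks[0] < MIN_SECONDS_AFTER_LOAD:
        reasons.append("clicked right after load")
    # Tell 3: several clicks, all bunched together in time.
    gaps = [later - earlier for earlier, later in zip(clicks, clicks[1:])]
    if gaps and max(gaps) < MIN_SECONDS_BETWEEN_CLICKS:
        reasons.append("clicked all ads at nearly the same time")
    return reasons

print(suspicious_reasons(PageView(3, [0.4, 0.5, 0.6])))
# ['clicked every ad', 'clicked right after load', 'clicked all ads at nearly the same time']
print(suspicious_reasons(PageView(3, [45.0])))  # []
```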

You absolutely will harm site operators. Ad networks do indeed blacklist sites that get a high volume of perceived "fake clicks", whether they are fake or not. You will only harm the sites you like the most and frequent the most.

This is a very naive view of how ad networks operate, and a very naive approach to "solving this problem" (likely built by someone who has not worked with ad networks or operated an ad-driven site, i.e., someone with little to no experience in the domain where they are trying to solve a perceived problem).


You really can't tell the difference between a real person who decides to click one ad per page view and a script that does the same thing. Whatever criteria you use to differentiate between fake and real can be reverse engineered and fed back into the robot to look more human.
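To make the reverse-engineering argument concrete: once a detector's rules are known or guessed, a bot can be written to sit just inside them. In this sketch the detector (at most one click per page view, none within two seconds of load) and the bot's delay range are hypothetical stand-ins, not any real ad network's rules.

```python
import random

# Hypothetical detector parameters that the bot author has learned or guessed.
MIN_DELAY = 2.0          # clicks sooner than this after load look robotic
MAX_CLICKS_PER_VIEW = 1  # more clicks than this per page view look robotic

def looks_human(click_times: list[float]) -> bool:
    """Toy detector: few clicks per page view, none right after load."""
    return (len(click_times) <= MAX_CLICKS_PER_VIEW
            and all(t >= MIN_DELAY for t in click_times))

def evasive_bot(n_ads: int, rng: random.Random) -> list[float]:
    """Click at most the allowed number of ads, each after a randomized delay
    deliberately placed inside the range the detector accepts."""
    n_clicks = min(n_ads, MAX_CLICKS_PER_VIEW)
    return [MIN_DELAY + rng.uniform(3.0, 40.0) for _ in range(n_clicks)]

rng = random.Random(42)
views = [evasive_bot(n_ads=4, rng=rng) for _ in range(1000)]
print(all(looks_human(v) for v in views))  # True: every bot page view passes
```

Tightening the detector only moves the target; the bot reads the new parameters and adjusts.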

Never mind cyborgs, or script-enhanced humans, which is what users of this add-on will become. You can't even tell if a script was launched by a human or by another script.

It's the Iocaine Powder of ad-serving. The only way to win is to be immune to the effects of playing.

In this case, only ad-serving networks that do not change their visible behavior in response to clicks can win: no site-bans in response to visitor behavior, and no click-through bonuses or payments per impression. And that is the sort of ad network I find most tolerable.

Pay the site operator based on sound judgment of what the ads on those pages are worth, and toss the site traffic analysis in the trash. You need an actual human to judge how popular a site is likely to be, because an automated script will never be able to differentiate between a human and another automated script that knows (or can guess at) the first script's algorithms. Do it correctly, and you won't need to compensate for temporary spikes from HN, or Slashdot, or SomethingAwful, or a Chan, or an SEO firm, or anyone else. The ad campaign pays out according to the agreement, and if the site becomes permanently more popular, the operator and the salesperson renegotiate the rate afterward.

That involves actual ad sales employees with some familiarity with the subject matter. If you purely fight bots with bots, the programmer with the most knowledge of the other side's program wins, and in this case that favors the attacker.

You seem to fundamentally not understand how ad networks work.
Do you know the difference between in-band and out-of-band signaling?

The ad networks are using an automated Turing Test based on statistical models to differentiate between "real" and "fake" requests. Until someone commits real dollars to make a purchase, there is no out-of-band verification of the requester's humanness. When you click the ad, your tamper-proof mouse does not take a tiny blood sample to verify that you are a real person, and communicate that via magical ansible to the ad network servers. Until the check clears on a purchase, the only data the ad networks have come through the HTTP requests, as in-band signals.

In-band signals can always be faked. Ask anyone who has ever blown a modified whistle from a cereal box into a phone handset, or modified a Radio Shack tone-dialer to produce the old payphones' "quarter inserted" tone.

So any script writer who either knows or can guess at the algorithms used to automatically sort "fake" from "real" can produce automated behavior that fools the automated sorter. What's more, those models are brittle. If the real behaviors of real humans change, for example through ad blocking or other response-modifying scripts, the models become less and less accurate classifiers.
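A toy illustration of that brittleness: a fixed rule tuned on how humans used to behave starts misclassifying them once their behavior shifts. The threshold and every delay value below are invented for illustration; the only point is that when some real humans start acting like the bots (here, through a script that clicks immediately), the same rule loses accuracy without any change on the ad network's side.

```python
THRESHOLD = 3.0  # hypothetical: first clicks sooner than this are labelled bots

def classify(delay: float) -> str:
    """Label a first-click delay (seconds after page load) as bot or human."""
    return "bot" if delay < THRESHOLD else "human"

def accuracy(labelled: list[tuple[float, str]]) -> float:
    """Fraction of (delay, true_label) pairs the rule gets right."""
    correct = sum(classify(delay) == label for delay, label in labelled)
    return correct / len(labelled)

bots = [(0.2, "bot"), (0.5, "bot"), (0.8, "bot"), (1.0, "bot")]
# Made-up delays for humans before any behavior change.
humans_before = [(d, "human") for d in (8, 12, 15, 20, 30, 45)]
# The same population after two users install a click-immediately script.
humans_after = [(d, "human") for d in (0.3, 0.4, 12, 15, 20, 30)]

print(accuracy(bots + humans_before))  # 1.0
print(accuracy(bots + humans_after))   # 0.8
```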

A script that blindly clicks all blocked ads on a page is the tip of the iceberg. You can replace the "click everything" strategy with a "click like a woman pregnant for the first time" strategy, or a "click like a male gamer, aged 17-25" strategy.
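One way to picture a persona-driven strategy is as a parameter set that the click script consults on each page view. Everything here is hypothetical: the `Persona` fields, the ad categories, and the rates and delay ranges are invented numbers, not measured demographic behavior.

```python
import random
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Persona:
    """A hypothetical behavioral profile that a click script could imitate."""
    name: str
    click_rate: float                 # chance of clicking one ad on a page view
    delay_range: tuple[float, float]  # seconds after load before that click
    interests: frozenset[str]         # ad categories this persona clicks on

# Invented numbers for illustration only.
FIRST_PREGNANCY = Persona("first pregnancy", 0.08, (10.0, 90.0),
                          frozenset({"baby", "maternity", "health"}))
YOUNG_MALE_GAMER = Persona("male gamer, 17-25", 0.05, (3.0, 30.0),
                           frozenset({"games", "hardware"}))

def choose_click(persona: Persona, ads: list[tuple[str, str]],
                 rng: random.Random) -> Optional[tuple[tuple[str, str], float]]:
    """Pick at most one (ad_id, category) the persona would plausibly click,
    plus a delay; return None when this page view gets no click."""
    favored = [ad for ad in ads if ad[1] in persona.interests]
    if not favored or rng.random() >= persona.click_rate:
        return None
    return rng.choice(favored), rng.uniform(*persona.delay_range)

rng = random.Random(7)
ads = [("a1", "baby"), ("a2", "games"), ("a3", "insurance")]
clicks = [choose_click(YOUNG_MALE_GAMER, ads, rng) for _ in range(1000)]
made = [c for c in clicks if c is not None]
```

Because the persona is just data, switching from one impersonation to another is a one-argument change.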

If web traffic ever includes a significant number of browsers using scripts to impersonate the browsing behaviors of other types of people, ad networks can't trust any of their traffic, because they can't tell "real" from "fake". That is an intractable problem for them.

You have to be able to verify a statistically significant portion of traffic as real humans before your models will work. And that is what Nielsen does with its consumer tracker devices.
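If "statistically significant portion" is read as "a verified sample large enough to estimate some traffic proportion (say, the share of clicks that come from real humans) to within a chosen margin", the textbook simple-random-sample formula n = z² · p(1 − p) / e² gives a rough floor. This mapping is an assumption for illustration; it says nothing about how Nielsen actually sizes its panels, and it ignores the harder problem of keeping the verified sample representative.

```python
import math

def panel_size(margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Verified sample size needed to estimate a proportion p to within
    +/- margin at the confidence level implied by z (1.96 is about 95%).
    p = 0.5 is the worst case, giving the largest required sample."""
    return math.ceil(z * z * p * (1 - p) / margin ** 2)

print(panel_size(0.05))  # 385
print(panel_size(0.03))  # 1068
```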