by britneybitch 1251 days ago
The article makes a big assumption that ads cannot exist without privacy violations, but that's simply not true. Print ads have existed for longer than the internet, and they're chosen based on the content, not the reader. If I'm reading an article about guitars, show me an ad for a guitar or pedal or amp. Easy. That would already be better than 99% of today's "targeted" ads.

There's a decent amount that can still be done cleanly. You can count how many impressions were served vs how many clicked through. You can use campaign slugs to measure which sites sent more traffic, or how effective different ads were. You can serve different ads to different geographic regions, and as you said - different pages.

That covers almost everything we currently decide when you visit a page. The only real concession is that it's scoped to the current page, rather than every page that we know you've visited.
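To make the "impressions vs clicks, split by campaign slug" idea concrete, here is a minimal sketch. It assumes the slug travels in a `utm_campaign` query parameter on the ad's landing URL (a common convention, but an assumption here), and the function names are made up for illustration. Nothing in it is keyed on a user: the counters only know which ad ran on which page.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Aggregate counters keyed by (campaign slug, page). No user identifier is stored.
impressions = Counter()
clicks = Counter()

def record_impression(slug: str, page: str) -> None:
    """Count one ad view for this campaign on this page."""
    impressions[(slug, page)] += 1

def record_click(landing_url: str, page: str) -> None:
    """Count one click, reading the slug from the (assumed) utm_campaign parameter."""
    query = parse_qs(urlparse(landing_url).query)
    slug = query.get("utm_campaign", ["unknown"])[0]
    clicks[(slug, page)] += 1

def click_through_rate(slug: str, page: str) -> float:
    """Clicks divided by impressions for one campaign on one page."""
    shown = impressions[(slug, page)]
    return clicks[(slug, page)] / shown if shown else 0.0

for _ in range(4):
    record_impression("spring-pedals", "/guitar-review")
record_click("https://shop.example/pedals?utm_campaign=spring-pedals", "/guitar-review")
print(click_through_rate("spring-pedals", "/guitar-review"))
```

Adding a coarse region to the key would cover the geographic split too. Note that this says nothing about whether the clicks came from people or bots, which is the objection raised further down the thread.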

I'm honestly not convinced our current idea of targeted ads actually works. Continually showing me links to a product I bought someone for Christmas a month ago isn't remotely clever, despite being precisely targeted. "People shopping for guitars on Saturdays are more likely to convert to sales, than people shopping while they should be working" does not require storing PII, and is (hypothetically) more useful than "If you bought a guitar last week, surely you're more likely to buy a guitar this week".

My friend has been at Facebook forever - apparently new hires often show up and say "I just bought a vacuum cleaner, why do you think I need a new one?" and the strategy of not showing you that ad gets tried over and over, and never actually improves clicks. Friend also doesn't have an explanation, apparently there are just people out there, who look just like you and me, but buy their vacuum cleaners two at a time a week apart. Mystery.
That's what happens when you are so bad at predicting a thing that any noise overwhelms your signal.

People who send back what they bought and buy something else do exist, as do people buying for their friends. But if you had an actually good prediction of their behavior, you wouldn't need to proxy it by "bought a vacuum recently".

Yet, companies insist that failure is a feature, and that their local maximum is the best possible world. And will keep harming advertisers, viewers and society to keep their position at that peak.

If not showing repeat vacuums does not improve clicks, I'd be very curious if it reduces clicks either.

It seems like what you posit is only a mystery if we take it on faith that targeting works. Once that faith is lost, you can also approach it from the angle that if showing them something we think they want is just as effective as something we know they don't want, you're actually proving that "what we think they want" isn't working either.

I mean, given "we know you've bought a vacuum cleaner" and "we think you'd like a kettle" - if ads for either are equally effective, either there's a lot more people collecting vacuums than either of us would have expected - or we're entirely wrong about the kettle. And it seems to me that our blind faith in targeted advertising leads us to wonder why people want so many vacuums.

I don't think it's a mystery. Ad targeting usually doesn't know which people have just bought a vacuum, just that someone has been looking at vacuums recently. People who have been looking at vacuums are far more likely than the typical person to buy one, so it's not surprising that it would be worth showing them an ad reminding them that they can buy one from you. Especially with higher margin products like mattresses or cars.

But how do we explain advertisers like Amazon, who do know that you just bought a vacuum from them and still show you more ads for it? Since Amazon is in a position to run principled A/B tests on whether showing these ads leads to sales that otherwise wouldn't have happened, and they are the kind of organization I'd expect to get this right, this part I am willing to accept without external evidence. It's probably that the likelihood of additional purchases of the same item, for yourself or others, is high enough, and the cost of advertising to you is low enough.

I agree. There is a lot of folklore amongst advertisers. Also, the companies serving the ads do not care whether the advertisement achieves its purpose; they only care about the goal they get paid for. Next, there is the rat race between advertisers, outbidding each other on certain users. The house always wins.

Removing PII will hopefully hurt the greedy ad serving companies most. Though I am scared they will find workarounds.

My website https://officesnapshots.com does this.

Our niche is office design content and we sell advertising primarily to office furniture manufacturers. The ads are hosted by us and are sold directly.

This approach definitely works, and I agree you can do it without any tracking. All that's needed is that the furniture manufacturers can tell how much of their purchases are coming via your site.

But this is also not a model that can support a very large fraction of the existing web. Most sites aren't built from the ground up to have this kind of highly commercial tie-in.

Monetizing a general news website would definitely be quite hard with this model.

*edit: that said there is a local news website in my city that has local ads for local services and businesses.

Our specific case is weird in that contract furniture is mostly purchased through dealers so tracking sales isn’t as straightforward.

We partially do something similar. I wrote an adserver that mainly serves contextual ads. We also serve consent-requiring ads if consent is given, but with an adblocker (a bespoke adserver for a smallish site rarely gets blocked) or no consent, we serve HTML ads based on context.
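For anyone wondering what "ads based on context" can look like at its simplest, here is a rough sketch. It is not the adserver described above; the ad inventory, the keyword sets and the `pick_contextual_ad` name are all invented for illustration. The point is only that the input is the page text, not anything about the reader.

```python
import re

# Hypothetical inventory: ad id -> keywords the advertiser wants to appear next to.
ADS = {
    "guitar-pedals": {"guitar", "pedal", "amp"},
    "office-chairs": {"office", "desk", "chair", "furniture"},
}
FALLBACK_AD = "house-ad"  # shown when nothing matches the page

def pick_contextual_ad(page_text: str) -> str:
    """Return the ad whose keywords overlap most with the words on the page."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    best_ad, best_score = FALLBACK_AD, 0
    for ad_id, keywords in ADS.items():
        score = len(words & keywords)
        if score > best_score:
            best_ad, best_score = ad_id, score
    return best_ad

print(pick_contextual_ad("Review: a new guitar amp and pedal"))
print(pick_contextual_ad("Dota 2 counter picks"))
```

The second call falls through to the fallback, which is the weak spot of pure context matching: some pages have no ad that fits.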
A large factor is whether or not there are any contextual ads at all for the content being viewed. Sure, random sites with amazon product lists might benefit from letting Google Ads scrape their page then show ads related to the products, but random forums and game sites might not have any contextual ads that would sell.

For example, dotapicker.com (which allows you to see counter-pick viability in Dota 2) runs ads, but they're either ads for web-based video games that show a lot of skin and have very vague wording to skirt Google Ads guidelines (about a quarter of the time), or random products and services related to my overall browsing history, e.g. Amazon/Etsy/Ebay/etc or B2B services like Monday.com.

Contextual ads definitely work where they can, which is why google.com ads are all contextual to your search, but it expands available ad real-estate to run targeted ads in places that otherwise would get no clicks.

I wonder if you read the part about fraud detection?
That argument is somewhat flimsy though. It is only a problem if you want to pay per view. If you just pay for "show it for one week" as in print then there is no issue.
Yes. But even if you pay for a month placement on a given site, the ad buyer is still going to want an estimate of how many legitimate viewers see/saw your ad.

It probably works better on things like podcasts but the bottom line is that you can't really trust me to tell you how many viewers my site had without some fraud detection mechanisms in place. (And even then fraud is apparently pretty rampant.)

>But even if you pay for a month placement on a given site, the ad buyer is still going to want an estimate of how many legitimate viewers see/saw your ad.

yes, you might WANT that. but you do not NEED that. Remember that an ad isn't only effective when someone clicks on it, but also by spreading visibility of your offer/product.

you could easily use the site's visitor count as an estimate for monthly pricing. And frankly without all the tracking there would be no incentive to fake clicks on ads at all, so you could do your own tracking on your own side.

If I can't tell with some degree of assurance whether you have 1K viewers per month or 1M viewers per month (never mind uniques), I'm probably not going to pay for an ad on your site.
I may be missing something, but why? Wouldn't the buyer just look at how many people arrived at their website from the ad?
Bots can click through like any other visitors. They won't buy stuff, though, so if you're placing ads for people to directly take an expensive action ("performance advertising") you're mostly ok. Brand advertising, though, is very dependent on fraud detection because it doesn't have this clear connection.

There's more about this near the end of the post.

Do you think it’s possible to measure how many people are coming from an ad on a specific site and buying something, without also resorting to illegal tracking?
How do you know those clicks are people? (You can measure just conversions but that's a much higher bar for measuring an ad and a direct conversion may not even be your objective.)
Advertisers would want to know something akin to circulation numbers before paying for placement though. And they would want to be reasonably confident those numbers aren't pumped by fraud.

Maybe the web needs something like Nielsen ratings, where Verified real users voluntarily submit their browsing to help advertisers determine watch time.

I talk about this at the end of the post as well. I think a ratings panel approach doesn't apply well to the web because the web is so fragmented. (In a good way! I like that there are lots of independent sites!)

Unless you had something like 10% of real users or a super representative 1% I think you'd have a big problem with getting realistic numbers outside huge sites.
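A back-of-the-envelope way to see the small-site problem, as a sketch under a generous assumption: the panel is a truly random sample of real users, each included independently with probability `panel_fraction`. Under that assumption the number of panelists visiting a site is binomial, and the relative standard error of the scaled-up estimate is sqrt((1 - f) / (f * v)) for panel fraction f and v real visitors. The function name is made up for illustration.

```python
import math

def panel_estimate_error(real_visitors: int, panel_fraction: float) -> tuple[float, float]:
    """Return (expected panelists seen on the site, relative standard error of the estimate).

    Assumes each real visitor is in the panel independently with probability
    panel_fraction, so the panelist count is Binomial(real_visitors, panel_fraction).
    """
    expected_panel_visitors = real_visitors * panel_fraction
    relative_std_error = math.sqrt((1 - panel_fraction) / expected_panel_visitors)
    return expected_panel_visitors, relative_std_error

for visitors in (1_000, 1_000_000):
    seen, error = panel_estimate_error(visitors, panel_fraction=0.01)
    print(f"{visitors} real visitors: about {seen:.0f} panelists, relative error {error:.1%}")
```

With a 1% panel, a site with a thousand real visitors is estimated from roughly ten panelists, so the error is on the order of 30%, while a million-visitor site comes out at about 1%. A real panel would also have selection bias, which this ignores.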

But in print it is knowable how many copies are going out, isn't it?
Even that can be fiddled.

USAToday used to do it via deals with hotels to put one outside your room. Boom, extra copies!

Yeah although ad buyers of any sophistication knew going in that a lot of certain newspapers were hotel copies (though arguably a lot of people read those), trade rags vs. consumer subscriptions, general vs. niche, etc. and took that into account.

They still didn't have a great idea of how effective they were a lot of the time of course.

Printing a newspaper at least costs a bit more money and there's a decent chance someone will read the copy of the paper in their hotel room. Fake Web traffic could easily dwarf real traffic.
Print can charge for ad placement, not impressions. But that's only possible because the number of subscribers, the size and number of print runs, and circulation are well-known, hard-to-fake stats.

The web cannot do that without tracking...

Depends what you call tracking. Just like a newspaper company knows how many copies they distribute a website would know how many copies they've sent over the line.

You might not trust them but I don't see why you'd think a newspaper would have a harder time claiming they've made more copies than they have.

Sending out a newspaper is much more expensive than sending out a pageview. If a million bots visit my site everyday I can point at server logs showing very high numbers of pageviews, but you wouldn't want to use that to pay me!

Now, you could say don't do business with people who are trying to defraud you, but one of the more impressive things about the ad ecosystem is that it works without advertisers and publishers trusting each other. I can sell the space on my site and advertisers don't need to figure out how much to trust me in particular.

It's easy to have ad fraud that's plausibly deniable and maybe even not on purpose. Let's say you're a publisher and you want more people to come to your site. You look around and you find someone who says they run a newsletter and would be willing to include links to your stories for a small fee. When you multiply out the cost per visitor this looks like a pretty good deal; you say yes. This traffic turns out to be entirely bots, but you can't tell because we got rid of ad fraud detection.

> but one of the more impressive things about the ad ecosystem is that it works without advertisers and publishers trusting each other. I can sell the space on my site and advertisers don't need to figure out how much to trust me in particular.

For a person who worked in ads you are surprisingly oblivious to how both Facebook and Google defrauded advertisers and publishers.

- Facebook Lied About Video Metrics and It Killed Profitable Businesses https://www.ccn.com/facebook-lied-about-video-metrics/

- Google Hit With $268 Million Fine Over Unfair Ad Practices https://gizmodo.com/google-hit-with-268-million-fine-over-un...

Ad industry should burn in the fires of hell.

There was a second flaw in the premise between print publishers and websites. Some print publishers inflate circulation numbers when attempting to sell ad space in an attempt to make the ad space appear more valuable.

Similar double-selling of television ads takes place as well. A network ad may run in the network feed and will appear in the network affidavit logs as having run. Local affiliates supersede network feeds all the time, and in some of those cases they do so during a network ad pod and run station-sold ads. The consumer sees only one of the TV ads but both get reported as run.

Fraud in advertising isn't something new or isolated to the online medium.

Ignore the headline which is completely unsupported by anything in the story but circulation numbers are audited, e.g. by an organization called BPA. https://www.photonics.com/Articles/The_Magazine_Industrys_Di...

Doesn't mean there isn't fraud but someone can't really make up circulation numbers out of whole cloth.

Sure, but if anything that shows that newspaper companies can't be trusted to report accurate numbers if there is no oversight whatsoever.

All I'm saying is that the same logic holds for websites and that tracking people is a needlessly paranoid and harmful solution to the problem when it can be solved with trust, contracts and auditing.

>it can be solved with trust, contracts and auditing

I actually agree with you. And large businesses in particular are both audited and do internal audits all the time. It's not that they and their employees are all untrustworthy but audits can both catch mistakes and send a signal that people are watching.

There are probably other mechanisms to do fraud detection. But, as your comment suggests, they may be more heavyweight and therefore might exclude most smaller sites.

Big sites (newspapers) could do it the same way - maybe independent reach surveys could be the data that they use to back up their web ad pricing.
I think that's one of the points though. Big sites can back up their digital reach claims such as by using third parties. But if vetting yourself for advertisers was to become a requirement, advertisers just might not bother with small sites. Though admittedly Google makes money off small sites--but I bet most of those small sites don't make enough to make note of.
Google's vetting of small sites is pretty minimal. Instead they use fraud detection: vetting the traffic itself.
Right. Which is the point both of us are making I think. If you can't do some minimum floor of fraud detection more or less transparently and cheaply, you fall back on more heavyweight mechanisms and probably screen out any site that isn't "big."
Yeah it's easy and is exactly how ads were first implemented. Go do some research on the history of Google Adsense

Well big surprise: They don't perform as well. Don't you think there is a reason the ad industry went in the direction it did?

Now that they’re not allowed to go that direction, though, the old way looks more appealing.
Strange way of saying they have no other option
“No ads” is another option. See the title of the article up top of this discussion.
Surely no CEO would actually struggle to choose between legal and poorly-performing vs illegal and better-performing?
Who says personalised ads are better performing?

The companies who make the real money out of ads are ad tech vendors, with publishers often picking up the scraps

It’s not surprising that people who’ve worked in personalised ads often promote them as the only way to fund the internet

> Who says personalised ads are better performing?

I deliberately made my comment more generic, this isn't just about ads.

Any senior manager worth his or her salt knows that staying on the right side of the law is significantly more important than supporting any dubious "product improvement" which is on the wrong side of the law.

> Who says personalised ads are better performing?

https://twitter.com/garjoh_canuck/status/1318989360407236609 summarizes the studies on this, and this comment [1] gives additional context.

[1] https://www.lesswrong.com/posts/dFgfQTo4DRZG5t8Ap/can-ads-be...

Garrett's thread and the studies in it are often rolled out as evidence that personalised ads perform better and so losing the ability to harvest data that supports personalised ads would be bad for the web

Looking at how the data in the studies was collected and where it comes from, Google, FB and other adtech vendors feature strongly (and they can't be considered neutral sources)

In the ad market there are three participants - advertiser, publishers, ad networks.

The last group keep telling us we need them for a healthy internet, while being completely opaque but making billions creaming off their cut of advertising revenue (there don't seem to be any reliable figures on how much they actually take but an old Guardian study found that sometimes they were getting less than 10% of the revenue the advertiser actually spent - Guardian have brought ad sales in house since)

Alternatives such as category based advertising are often not discussed other than to say 'if personal ads go then the spend is likely to be re-directed towards category based ads'

Even Google while claiming personalised ads are better seems to rely on category based ads for the revenue from search ads (see the CMA report)

IME working alongside various publishers it's often the case that directly sold category based ads are way more profitable for the publishers

Given threads like https://twitter.com/nandoodles/status/1582434737813348352 (and others) I'm deeply skeptical of claims ad tech companies make

I also wonder if these really targeted ads are cost effective. If ads were run as they used to be positioned in Newspapers and Magazines, a simple area of the screen that was set aside for an advertiser, with no targeting, no tracking, and performance measured simply by "impressions" and "clickthroughs" would they work at all? It would be much better for the users. (I'm always amazed at how unusable the websites of my local TV stations are with the adblocker off. Popups and blinking things everywhere, and the page just keeps changing out from under me.)
It's not a question of whether or not they can exist. It's a question of whether or not they can be effective enough that the advertisers will pay enough for them to provide enough revenue to keep the site afloat.

Someone reading a print article is probably a much stronger signal that they are interested in things related to the article or the topic of the publication than is someone reading an article on a website.

If some random site runs an article about, say, a meteor shower you probably can't infer that the reader has more than a passing interest in astronomy. Even if a site that is focused on astronomy runs such an article they are still likely to get a lot of casual readers, such as people who heard on the news about an upcoming meteor shower and Googled to find more information.

If on the other hand Sky & Telescope runs an article about a meteor shower in their printed magazine it is probably a safe bet that most people reading that have a fairly serious interest in astronomy.

If I were trying to sell astronomical equipment I'd probably be willing to pay a lot more to run an ad in Sky & Telescope than I'd pay to run the same ad on a "free but with ads" astronomy website and I'd pay even less to run it on some non-astronomy site that happens to be running an astronomy article.

I think this largely invalidates most of the "it worked for print so it will work on the web" arguments I've seen except maybe for websites with well enforced paywalls.

You're trading one evil for another. If your primary revenue source is guitar manufacturers, how likely are you to write an honest guitar review?
The argument is specific to electronic ads.
It isn't.

What the ad industry refers to as "fraud detection" is just abuse of a monitoring system they didn't have access to in print (pay per view/click) - print ads were still deemed effective even without assurance of per-individual revenue. There's no reason electronic ads couldn't've been treated the same - the only difference is advertisers have been allowed to develop surveillance to gain granular revenue insights (and anything abusing that system of surveillance is labelled "fraud")

Ad fraud is only legitimately "fraud" if you accept the pervasive surveillance it's "defrauding" is legitimate to begin with.

I don't see how you figure. If I pay to put an ad in the New York Times, the circulation is not a secret. If I pay to put an ad in some Web site (or, worse, thousands of Web sites I don't know about in advance) then I have no insight whatsoever into how many impressions I'm getting if we decide it's illegitimate to try and measure traffic.
You can measure traffic anonymously or pseudonymously without violating GDPR. Monitoring of traffic for a website owner is inarguably legitimate interest if even just for DOS protection. The tracking discussed in this article is not about traffic measurement, it's much deeper individual tracking.

Also...

> the circulation is not a secret

Isn't it? Like yes, there's a published figure, but is it verifiable?

If we're discussing potential for "fraud" here, I don't really see how there's any difference between online and print circulation.

For print circulation, there are two choices: the publisher can report their actual numbers or they can participate in fraud (by lying about it). The latter has real legal consequences attached to it. They might bet on never being found out; I am guessing that most do not.

For online "circulation", there are three choices: the two given above, plus the possibility that the "actual numbers" (e.g. generated from server logs) do not reflect what they appear to (i.e. bot visits). This is the "problem" that tracking seeks to fix, by avoiding a "circulation data source" (page visits) that isn't (and cannot be) reliable.

Bot visits need incentives. There's two:

1. the current incentive where individual brands can defraud ad exchanges' pay-per-x systems. Without pay-per-x this incentive disappears.

2. publishers defrauding advertisers. This has similar cost & risk ratios online as either misreporting numbers or bulk-buying papers does in real life. There's also very little tangible difference between the ability of authorities or legal agents to enforce honest reporting of numbers online and in print. The two scenarios are eminently comparable.

Ultimately, removing pay-per-x brings online ads and the ability to defraud advertisers down to a level of equivalence with print.

Not only that. I’m potentially publishing across many sites I haven’t verified. A well known newspaper seems unlikely to engage in outright fraud, but someone I don’t even know, why should I trust them?
> Monitoring of traffic for a website owner is inarguably legitimate interest if even just for DOS protection.

Even in cases where the GDPR allows data collection for one purpose, that does not mean you can apply your collected data or analysis for a different purpose.

IANAL but I don't think that's what's happening here: the gp was referring to circulation figures. DOS-protective measures need insight on individual bad actors but only derived aggregate figures are needed for circulation. That's not something covered by GDPR in any way - it's extremely explicit in defining what types of data points relating to "natural persons" it covers.
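As a rough illustration of "only derived aggregate figures are needed for circulation", here is a minimal sketch of a counter that keeps a per-day, per-page total and never stores an IP address, cookie or user agent. The names `record_pageview` and `circulation_report` are invented for this example, and whether any particular setup actually falls outside GDPR is not something a code sketch can settle.

```python
from collections import Counter
from datetime import date

# Only (day, path) -> count is retained; no per-visitor data is kept.
daily_pageviews = Counter()

def record_pageview(path: str, day: date) -> None:
    """Increment the aggregate count for one page on one day."""
    daily_pageviews[(day.isoformat(), path)] += 1

def circulation_report(day: date) -> dict[str, int]:
    """Return the per-page totals for a single day."""
    key_day = day.isoformat()
    return {path: n for (d, path), n in daily_pageviews.items() if d == key_day}

today = date(2023, 1, 15)
record_pageview("/guitar-review", today)
record_pageview("/guitar-review", today)
record_pageview("/amp-roundup", today)
print(circulation_report(today))
```

Totals like these are the web equivalent of a print-run figure. They do not separate bots from people, so they would still need the kind of auditing or oversight argued for elsewhere in the thread.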