Hacker News
by ajmurmann 1384 days ago
That's why I keep wanting to see what the results would look like from a search engine that heavily punishes the presence of advertisements on a result. All the SEO spam pages are ad-driven, so cutting out anything that follows that incentive should remove all the pages that follow the terrible SEO spam pattern that ruins search results.

Punishment/vengeance is a popular idea around here, but you have to also remember that a search engine is supposed to bring you the most relevant results.

Filtering out, say, Stack Overflow or Reddit because it has ads doesn't help you when it answers your question and is perhaps the only thing on the internet that truly does.

People seem to think there's this ad-less replica of the internet, sitting right behind our ad-riddled internet, where everything they want exists for free, it's just hidden. In reality, the websites making money are the ones providing the vast majority of things people are looking for.

Use https://search.marginalia.nu if you want to severely punish ads.

Maybe instead of heavily punishing websites with ads, a search engine could punish heavily ad-driven websites. A lot of the SEO-exploiting blog mills are filled to the brim with ads, where the goal is to get you to view as many ads as possible, not to provide good content that's funded by ads.
Some sort of ratio algorithm would be nice.

Does this site (in general, not just this page) have more than 5 advertisements per page? Between 2 and 4?

Does this site attempt to load 12 trackers? "Only" 4 trackers? Just 1?

Does an AI text analysis of the first few paragraphs match this nonsense?

> Fixing your gadget is important. Many people find that their gadget sometimes breaks. Gadget helps us do action easier, and improves our lives. We all hate it when our Gadget doesn't work the way we expect it to. It can be frustrating. Read below for tips on how to fix your gadget. (Followed by 3 more paragraphs of filler before getting to regurgitated gems like "reboot it".)

I'm sure we have the AI tech now to semantically see this bullshit and downrank it. Right? (Ok, maybe I overestimate how easy this would be. Forgive me, I'm just ranting here)
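The first two questions (ads per page across the whole site, number of trackers loaded) could be turned into a ranking multiplier along these lines. This is a minimal sketch, not any search engine's actual method: `SiteStats`, `ad_penalty`, `rerank`, and every bucket threshold and factor are hypothetical, picked to mirror the buckets in the comment.

```python
from dataclasses import dataclass


@dataclass
class SiteStats:
    # Hypothetical crawl statistics, averaged over the whole site, not one page.
    avg_ads_per_page: float
    trackers_loaded: int


def ad_penalty(stats: SiteStats) -> float:
    """Return a multiplier in (0, 1] to apply to a page's relevance score.

    The buckets and factors are illustrative assumptions, not tuned values.
    """
    if stats.avg_ads_per_page > 5:
        ads_factor = 0.3       # "more than 5 advertisements per page"
    elif stats.avg_ads_per_page >= 2:
        ads_factor = 0.7       # 2 to 5 ads per page
    else:
        ads_factor = 1.0

    if stats.trackers_loaded >= 12:
        tracker_factor = 0.4   # "12 trackers"
    elif stats.trackers_loaded >= 4:
        tracker_factor = 0.7   # "only" 4 or more
    elif stats.trackers_loaded >= 1:
        tracker_factor = 0.9   # "just 1" to 3
    else:
        tracker_factor = 1.0

    return ads_factor * tracker_factor


def rerank(results):
    """Sort (url, relevance, SiteStats) tuples by penalized relevance, best first."""
    return sorted(results, key=lambda r: r[1] * ad_penalty(r[2]), reverse=True)
```

With these made-up numbers, a blog mill averaging 8 ads and 12 trackers keeps 0.3 × 0.4 = 0.12 of its relevance, so a somewhat less relevant page with one ad and one tracker (multiplier 0.9) would outrank it.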

Do you mean the advertising company that runs a search engine should punish pages in the results that... show their ads? Or just when it's a "lot" of their ads? Or should they only do that if the pages are showing ads from their competitors?
I’m honestly surprised that Google thinks a page with N ads deserves N times the CPM. The more ads, the less attention each ad can grab, no? I wonder whether just treating ads as zero-sum (regardless of ad provider), such that a page with 5 ads, 2 of which are Google ads, gets a payout of 2/5 the CPM of a page with one Google ad and no other ads, would basically drive all these SEO mills out of business, while not really impacting honest ad-sponsored sites (like Reddit) that tend to run only one ad per page.
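The arithmetic of that zero-sum idea is easy to make concrete. A sketch, assuming the CPM is what one Google ad earns per thousand views of a page with no competing ads; the function names and the $2.00 rate are hypothetical, and this is not how any ad network is known to pay.

```python
def per_ad_payout(cpm: float, google_ads: int) -> float:
    """Model the comment describes: every Google ad on the page earns the full CPM."""
    return cpm * google_ads


def zero_sum_payout(cpm: float, google_ads: int, total_ads: int) -> float:
    """Proposed model: a page view is worth one CPM of attention, split evenly
    across all ads on the page, regardless of who serves them."""
    if total_ads == 0:
        return 0.0
    return cpm * google_ads / total_ads


CPM = 2.00  # hypothetical rate per thousand page views

# Honest page: one Google ad, nothing else.
print(per_ad_payout(CPM, 1), zero_sum_payout(CPM, 1, 1))  # 2.0 2.0
# SEO mill: five ads, two of them Google's.
print(per_ad_payout(CPM, 2), zero_sum_payout(CPM, 2, 5))  # 4.0 0.8
```

Under the per-ad model the mill earns twice as much from Google per thousand views as the honest page; under the zero-sum model it earns 2/5 as much, which is the incentive flip the comment is after.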
I mean when it's a lot of ads. It could perhaps be an ad to content ratio? It seems SEO spam contains more ads than content most of the time.
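One way to read "ad to content ratio" is ad slots per 100 words of main text. The sketch below assumes some upstream extractor has already counted ad slots and main-content words (that extraction is the hard part and is not shown); the names and the 0.5-ads-per-100-words tolerance are hypothetical.

```python
def ads_per_100_words(ad_slots: int, content_words: int) -> float:
    """Ad density: ad slots per 100 words of main-body text."""
    return 100 * ad_slots / max(content_words, 1)  # guard against empty pages


def ratio_penalty(ad_slots: int, content_words: int, tolerance: float = 0.5) -> float:
    """Multiplier in (0, 1]: no penalty up to `tolerance` ads per 100 words,
    then the score shrinks in proportion to how far the page exceeds it."""
    density = ads_per_100_words(ad_slots, content_words)
    return 1.0 if density <= tolerance else tolerance / density


# A 2,000-word article with one ad vs. a 300-word page with six ads.
print(ratio_penalty(1, 2000))  # 1.0   (0.05 ads per 100 words)
print(ratio_penalty(6, 300))   # 0.25  (2 ads per 100 words)
```

Dividing by content length, rather than counting ads alone, is what lets a long, useful page with an ad or two (the Stack Overflow or Reddit case raised earlier in the thread) through while a thin page stuffed with ads gets pushed down.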
Or maybe the search engine shouldn't care about ads at all, and just figure out what is good content and what is bad content, and what actually answers queries well.
When I said "punish", I meant that the ranking algorithm should do that. It's not about vengeance, it's about filtering out SEO spam. The problem with filtering out SEO spam by detecting it as such is that it's by definition an arms race. That's why I propose that, instead of looking for the symptom (SEO spam), we pull it out at the root: the wrong incentive structure that's causing it (ads).
"that heavily punishes presence of advertisements in a result." while that is pleasing to read at face value, it has two fundamental problems:

1. it's orthogonal to relevance of content (semi-solvable algorithmically, I suspect)
2. it's antithetical to Google's core business model (a lot tougher nut to crack)

> 1. it's orthogonal to relevance of content

The entire point of my comment was that it's not orthogonal. The ads are what fuels the clickbait and SEO-driven articles. Nobody, for example, would ever pay a subscription to a website that is just waffle filler. While Stack Overflow has ads, it's much better than the SEO spam pages in that regard.

Aren't you contradicting yourself there? Stack Overflow is ad-supported, but is good. But you want search engines to penalize sites that have ads?

I hate ads, but I don't think we should be focusing on them here. Some sites that have ads have garbage content, and some sites that have ads have useful content. Just... find the useful content, and return it in search results. I know "just" is doing an awful lot of heavy lifting there, but I don't think "has/does not have ads" is as important a signal to a search engine's algorithm as you think it is.

> it's orthogonal to relevance of content

I disagree. The way content is presented matters. Splitting an article into 4-6 pages and filling those pages with ads makes me not want to read that content. I'd much rather go somewhere that has the same text in a single page and only a few ads.

What about playing 2-3 unskippable video ads before watching the actual content? Thus, Google should degrade YouTube search results as well!
The ideal search engine would show me the ad-free page first given otherwise identical quality. Of course Google will never do anything like that. That's why I'm hoping for an alternative search engine to do so.
Teclis implements a fun approximation of this. It runs uBlock Origin on results and penalizes according to the number of items blocked.
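The idea can be sketched as counting how many of a page's resource requests an ad blocker would block and docking the score per item. This is not Teclis's code or uBlock Origin's matching engine: real filter lists use a much richer syntax, whereas this hypothetical version only matches hostnames against a small made-up blocklist, and the 0.1-per-item penalty is an assumption.

```python
from urllib.parse import urlparse

# Made-up stand-in for a real filter list.
BLOCKED_DOMAINS = {"ads.example", "tracker.example", "adserver.example"}


def is_blocked(url: str, blocked=BLOCKED_DOMAINS) -> bool:
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in blocked)


def penalized_score(relevance: float, resource_urls, per_item: float = 0.1) -> float:
    """Reduce relevance by `per_item` for each blocked request, never below zero."""
    blocked_count = sum(is_blocked(u) for u in resource_urls)
    return relevance * max(0.0, 1.0 - per_item * blocked_count)


page_resources = [
    "https://cdn.site.example/app.js",
    "https://ads.example/banner.js",
    "https://pixel.tracker.example/p.gif",
]
print(penalized_score(1.0, page_resources))  # 0.8 (two blocked requests)
```

Matching on `"." + d` keeps a host like `badads.example` from being caught by the `ads.example` entry.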
Imagine a world where the biggest search engine made its money from advertising. In that kind of a world, wouldn’t the search engine primarily be incentivized to show you the results pages with the most advertising, regardless of the quality of the content?
No, because people would stop using the crappy search engine.

That's how the world was before Google.

And that differs from today because.... why?
Anyone who wants attention is motivated to do SEO. Should engines downrank every site that has good SEO? That is, downrank every site that ranks highly?

They already look at things like click-through, dwell time, and bounce-back. If enough people dislike Example.com enough to avoid clicking on it or come back to search after visiting it, the engine learns that it is a bad result.
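A toy version of that feedback loop: click-through rate, discounted by how often clickers come straight back to the results. This is not how any particular engine combines these signals; the `Impression` record, the 10-second "quick bounce" threshold, and the multiplication of the two rates are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Impression:
    clicked: bool
    dwell_seconds: float = 0.0         # time on the page before coming back (0 if not clicked)
    returned_to_results: bool = False  # did the user bounce back to the search page?


def engagement_score(impressions: Sequence[Impression], short_dwell: float = 10.0) -> float:
    """Click-through rate times the share of clicks that did NOT quickly bounce back."""
    if not impressions:
        return 0.0
    clicks = [i for i in impressions if i.clicked]
    if not clicks:
        return 0.0
    ctr = len(clicks) / len(impressions)
    quick_bounces = sum(
        1 for c in clicks if c.returned_to_results and c.dwell_seconds < short_dwell
    )
    return ctr * (1 - quick_bounces / len(clicks))


log = [
    Impression(clicked=True, dwell_seconds=120),
    Impression(clicked=True, dwell_seconds=3, returned_to_results=True),
    Impression(clicked=False),
    Impression(clicked=False),
]
print(engagement_score(log))  # 0.25
```

Note that a page which holds people for a long time before disappointing them is never counted as a quick bounce here, which is the clickbait objection raised further down the thread.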

Maybe the problem is that most people like what you don't like.

No, the key is to differentiate SEO'd pages with useful content from SEO'd pages with useless content.

This is a game as old as search engines. In 2005, it meant filtering out sites that were just lists of keywords, not coherent sentences and paragraphs. It meant giving extra points to articles with structure, such as header tags and paragraphs, as opposed to just blobs of text. It meant using PageRank to organically discover which pages real people thought were useful.
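PageRank itself is compact enough to sketch. This is the textbook power-iteration formulation with the commonly cited 0.85 damping factor, run on a hypothetical three-page link graph; it is not Google's production system.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power iteration: a page's rank is the chance a random surfer is on it,
    where the surfer follows a random outlink with probability `damping`
    and otherwise jumps to a random page."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no outlinks: spread its rank over every page.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank


graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Here `c`, with two inbound links, comes out on top, and `b`, which nothing links to, keeps only the random-jump share.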

It's a much subtler and more difficult problem in 2022, but there are also better tools to do it (big NLG models). It just seems that Google lost interest in quality control at some point.

And I would guess they lost interest in quality control because of Chrome's market penetration. Chrome is a browser monopoly at this point, and with Google being the default search engine on Chrome, they no longer have to give quality results to maintain their search user base. On top of that, they control such a large share of the ad market that any SEO spam website is more likely than not to be using AdSense. Which means they have a financial incentive to deliver page views to SEO spam sites, which tend to have higher ad/content ratios.

This seems like a good fit for a "!" solution like DuckDuckGo's bangs: !gov, !universities. These may already exist.
That stuff definitely helps. That's also why so many people now just search Reddit. However, wouldn't it be nice if the search engine could be smart enough to figure that out itself?
The problem is that people clicking+dwelling on something is not highly correlated with it serving their needs.

See: clickbait YouTube videos that show you something you really want to see in the thumbnail, then spend 10 minutes doing something else before showing it, and when you see it it’s a tiny aside with no more context than what you got in the thumbnail. If it’s even in the video at all.

Those videos have both high clickthrough (thus “click bait”) and also high dwell time (from people waiting for the thing they wanted to see to show up.) They do also have high bounceback, but only from people who recognize what’s going on. “A new sucker’s born every minute”, and those suckers will click the video and watch it, because they don’t yet know the principle that this specific kind of enticing thumbnail+title format implies that they won’t find what they want here.

These metrics all measure, effectively, “wanting” rather than “having.” It’s like measuring food by how addictive it is, rather than by how satisfied it makes you. You’ll end up optimizing toward cheetos — literally flavoured air — rather than toward anything that fills your stomach. People might enjoy cheetos while they’re eating them, but if they’re genuinely hungry, cheetos won’t solve their problem — they’ll still be hungry afterward.

This is so simple it's easy to overlook the fact that it's also ingenious.