by bambax 1001 days ago
> when you run into yet another stackoverflow copy

OMG. Why doesn't Google filter out the likes of geeksforgeeks, for instance? How is it possible that it always comes before the genuine SO answer?

Even without offering the possibility to filter out a domain (which they had, and later removed), how does the ranking algorithm not see those horrible, zero value clones??


Misaligned incentives (in corporate terms, $$$).

I can't tell you what they are, but there are probably internal Google incentives to filter and internal Google incentives to not filter, and the ones to not filter are probably stronger.

My theory is that Google went from ads in search results to ads on visited pages. By buying DoubleClick etc. they suddenly became incentivised to drive traffic to ad-supported websites.

Almost all the interesting factual websites are not ad-monetized. The SO spam sites etc. are all scrapes of the factual websites with ads injected. If Google simply deprioritized ad-supported websites the search results would be much cleaner, but the part of Google that sells the ads on sites instead of in search results would throw a fit.

We could test this. Take a few hundred search queries, strip the pages that display Google ads, and see if the remainder of the search result is better or worse.

We'd need to get some humans in to rank the results, but that's not a big problem. "How well does this web page answer this query, on a scale of 1-10?"

With a collection of rated pages, we can answer other questions as well. I'd be interested in running the same test but for Google Analytics, not Google ads, as I think there might be a misaligned incentive there too.

It's worth bearing in mind that the stackoverflow clones may actually answer the query just as well as the original site - that is, it might be our definition of "a good result" that's out of whack (because we have an unnecessary bias towards the original source). I doubt this, but again it's something that's testable.
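
A minimal sketch of how that comparison could be scored, assuming each result already carries a human rating on the 1-10 scale and a flag saying whether the page displays Google ads. The field names `rating` and `has_google_ads` and both function names are made up for illustration; how you would collect the ratings or detect the ads is left open.

```python
from statistics import mean


def top_k_mean(results, k):
    """Mean human rating of the first k results, or None if there are none."""
    top = results[:k]
    return mean(r["rating"] for r in top) if top else None


def compare_rankings(serps, k=10):
    """Compare the original ranking with one that drops ad-carrying pages.

    `serps` maps each query to its results in rank order. Each result is a
    dict with a human 'rating' (1-10) and a boolean 'has_google_ads'.
    Both field names are hypothetical.
    Returns (mean score of original top-k, mean score of stripped top-k),
    averaged over the queries that still have results after stripping.
    """
    original, stripped = [], []
    for results in serps.values():
        kept = [r for r in results if not r["has_google_ads"]]
        before, after = top_k_mean(results, k), top_k_mean(kept, k)
        if before is not None and after is not None:
            original.append(before)
            stripped.append(after)
    return mean(original), mean(stripped)
```

If the second number comes out consistently higher across a few hundred queries, that supports the theory; if not, it may be our idea of a good result that's off. The same function would work for the Google Analytics variant by swapping the flag.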

Google searches are ranked by humans, it’s a contractor job

I don't doubt it, but obviously something's going wrong between the human-generated training data and the SERP, else why are we getting utter crap back?

(Or, as I said, it's our idea of what constitutes a good result that's wrong).

Aha. This makes a lot of sense for Google.

But the same websites show up in e.g. DDG (through Bing). As far as I know, neither DDG nor Microsoft makes a dime from ad-supported websites the way Google would, so why are these results not nuked, similar to what Kagi is doing?

Aha. Couldn't help but scratch my own itch. I wonder if DDG has a deal with Google where they get a cut of the ad profit if they are mentioned as a `ref` in the doubleclick ad request.

:path: /pagead/viewthroughconversion/796001856/?random=1695374589838&cv=11&fst=1695374589838&bg=ffffff&guid=ON&async=1&gtm=45be39k0&u_w=2704&u_h=1756&url=https%3A%2F%2Fwww.geeksforgeeks.org%2Fc-plus-plus%2F&ref=https%3A%2F%2Fduckduckgo.com%2F&hn=www.googleadservices.com&frm=0&tiba=C%2B%2B%20Programming%20Language%20-%20GeeksforGeeks&auid=68284397.1695374483&data=event%3Dgtag.config&rfmt=3&fmt=4

What does the `&ref=https%3A%2F%2Fduckduckgo.com%2F` part do?

Hence providing the same incentives to keep shitty sites like geeksforgeeks in the results.

I guess also geeksforgeeks is incentivized to report these references, so that search engines and other linking services will continue to show their links.

To reproduce:
1. Go to duckduckgo.com and do a search that will turn up a geeksforgeeks page.
2. Click on the link.
3. Watch the network tab as requests are made to googleads.g.doubleclick.net and check the path.
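
Once you've copied a path out of the network tab, pulling the interesting parameters out of it is easy with the standard library. This is only a sketch: `referrer_params` is a name I made up, the example path is a shortened copy of the one quoted above, and decoding the request says nothing about what Google does with `ref` or whether any revenue sharing exists.

```python
from urllib.parse import urlsplit, parse_qs


def referrer_params(request_path):
    """Return the decoded 'url' and 'ref' query parameters of a request path.

    Parameters missing from the path are left out of the result.
    """
    query = urlsplit(request_path).query
    params = parse_qs(query)  # percent-decodes values, maps name -> list
    return {name: params[name][0] for name in ("url", "ref") if name in params}


# Shortened version of the path quoted in the thread.
example_path = (
    "/pagead/viewthroughconversion/796001856/"
    "?url=https%3A%2F%2Fwww.geeksforgeeks.org%2Fc-plus-plus%2F"
    "&ref=https%3A%2F%2Fduckduckgo.com%2F"
    "&hn=www.googleadservices.com"
)
print(referrer_params(example_path))
```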

Most other search engines train with Google as a target, or with some form of reward that is bootstrapped on Google rankings. That makes Bing results implicitly behave the same way as Google's. DDG and others just use the Bing API, so Google's incentives pass on through.

That doesn't make much sense to me. Google's interests are not Microsoft's or DDG's interests, and holding up Google as some sort of ground truth for the optimal search results for a given query is, as proven by Kagi, highly deluded and also quite subjective.

If true, however, it does go to show that Google is really a monopolist in the search space as well... and substantiating this claim would go a long way toward proving that.

These sites exist precisely _because_ of their expertise in the toxic race-to-the-bottom SEO/SEM game that Google created.

What I don’t get is how many people are looking for Stack Overflow answers while a) not aware of SO copycats and b) not running adblockers.

Adblockers are not a defense against this, as those results are genuine search results.

I run uBlock Origin (of course), am extremely aware that geeksforgeeks exists and is utter shit, and yet I get fooled now and again, which makes me very angry at that website, Google, myself, and the world in general...

But to make money those sites have to show ads

If I ran a seal-clubbing business I'd have to club seals to make money. The whole argument is that these are not sites that exist to provide a good service and sadly need to show ads to keep the lights on.

I’m just wondering to whom those ads get shown… not arguing that anyone should turn off their adblocker to keep those sites running

They are working hard to trick people into clicking on their links, but won’t most people who click those links be running an ad blocker? Are unsophisticated web users searching for questions answered on stack overflow?