Hacker News
by corysama 635 days ago
I’ve heard that a while back Google had a change to their algo that heavily prioritized widely used websites as “trusted”. The very most well known sites in the world, such as cnn.com, would be treated as the best results for anything they contained.

In response, many of the most used web sites flooded their own sites with transparently fake product reviews full of SEO phrases about “we spent N weeks testing K products to root out the very best” and very little else. The actual reviews would be pretty much copy-pasted from the description provided on the product producer’s site.

And, that’s how Google made itself useless for finding product reviews.

5 comments

The above poster speaks to the crux of the issue. CNN, Forbes and other sites are doing things that a normal webmaster could be nuked from orbit for, after a "manual review". Yet, these are the manually curated sites which Google claims have high trust signals.

There are a few disparate incentives. One is a political desire to buttress the "official truths" of the legacy media. Another is a market incentive for the dying legacy media sites to earn revenue.

There is a third, related market incentive for the dissatisfied media consumers. CNN isn't as compelling as it was two decades ago. Eyeballs and ears are naturally straying towards the perceived value of alternative media sources. Therefore, to continue the ancien regime, it becomes necessary for Google to prop up CNN and others.

There is a possible world where Google creates value by indexing and sorting through a decentralized and open Internet. This chain of events does not support that. The trend is for gatekeepers to panic. The search results have been sabotaged as a result.

Is Google more valuable as a gatekeeper for established institutions? Can that amount to more value than the potential ad revenue of a larger web? Time will tell.

I feel gaslighted when "fake news" is brought up by people who consider CNN etc. to be somewhat reputable. Like, the atmosphere on those sites is surreal nowadays. There is fake news, there are fake ads, etc.

Like an article with a picture of some fat person, a picture of a child star, and the headline "you won't believe what child star whatever looks like today!". But the fat person is not the former child star and is nowhere to be seen in the article, etc.

Those sites are as reputable as porn sites.

These are the same sites that platform the doom-mongers warning us of the dangers of "disinformation" and appealing for censorship. The viewers may merely be victims of the propaganda. Those disseminating the propaganda should be held to a higher standard.

There is this establishment push to force news down our throats as people tune out their outlets.

Like algorithmic feeds in general. And e.g. the Windows Start menu and whatnot.

Traditionally people chose which news publishers to follow. Like, it was one newspaper that you liked politically or journalistically.

News used to be opt-in and you chose your affiliation. The angle of the publisher was known.

Nowadays it is a joke. A washed out joke. You need to go for the more radical political papers to get what most papers used to be in the 90s.

> I’ve heard that a while back Google had a change to their algo that heavily prioritized widely used websites as “trusted”.

They super-obviously did that some time around ‘08 or ‘09. Basically just gave up on the cat-and-mouse game they’d played with spam for years, based on actual content and (what they hoped they managed to suss out as) organic linking, and switched to heavy reputation- and size-weighting instead. It was a giant shift in their search’s behavior and unlike anything they’d done before, not subtle at all.

This is an interesting claim. What evidence do you have for it? I don't mean this argumentatively, but rather that I'd like to read more and learn for myself.

Before Google, unscrupulous web sites would try to SEO themselves into the top page of search results with repeated META tag bombs or sometimes just good old-fashioned whitefonting. One of the innovations of PageRank was that more widely linked-to web sites would be ranked as more authoritative, doing an end run around the kind of keyword spam that plagued the early web. If the most widely linked-to web sites wish to play ball with SEO marketroids, that undermines the trustworthiness that PageRank assumes for those sites.

The upshot of this is that no system is impossible to game.
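
For anyone who wants to see the mechanism, here is a minimal sketch of the PageRank idea described above. It is a textbook-style power iteration, not Google's actual implementation. The page names, the damping factor of 0.85, and the iteration count are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three small blogs link to a big news site,
# and the big news site links to a thin "review" page.
EXAMPLE_LINKS = {
    "blog_a": ["bignews"],
    "blog_b": ["bignews"],
    "blog_c": ["bignews", "blog_a"],
    "bignews": ["spam_review"],
    "spam_review": [],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(EXAMPLE_LINKS).items(), key=lambda kv: -kv[1]):
        print(f"{page}: {score:.3f}")
```

In this toy graph the thin review page outranks the blogs purely because the heavily linked site points at it, which is the failure mode the comment describes.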

What signals do you think Google should be using instead?

I presume they made the change because their search results were filling up with blogspam, and there was no algorithm that could detect a high-quality review from a spam one.

So what do you think would have been the right approach?

People pointing out problems are not obligated to provide solutions. I don't know where this idea comes from, but it's just wrong.

If there was a solution to this problem from the search engine's point of view 5 years ago, which I do not stipulate but let's roll with it, there isn't one now. ChatGPT can overcome basically all detection techniques when combined with the current amount of efforts already largely successfully avoiding detection, and it will continue to get better. There are no signals for random unattested web content that will separate what we want from stuff constructed to look like what we want but with embedded motivations or content we don't.

A web of trust may be inevitable, but it's not like that can't be attacked either, especially past the first hop. It seems inevitable that slowly but very surely our trust is going to get pulled in much, much more tightly than it is now. I don't see much that can be done about that, even in theory. It was a historical accident that we ever could trust random websites to not be 100% focused on their own interests, simply because the tech to do that wasn't there yet. Now it is, and we will be entering a world where we can not trust any free resources, whether we like it or not.

> People pointing out problems are not obligated to provide solutions.

And nobody said they were obligated to. So I don't know what you think you're responding to.

I assume it's OK to ask people what they think a solution should be, though?

Seems like a pretty natural, conversational follow-up, if you ask me.

Presumably if you know a situation well enough to criticize, you have at least some ideas of what alternatives might or might not be better. Or can elucidate why you think there might not be any better ones.

Or do you think the entire act of asking questions is "just wrong", to use your phrase?

It is an extremely common tactic used to shut down conversations about problems. If that wasn't what you were doing, I apologize to you for being wrong this time, but I don't apologize for making the mistake in the first place, because it's fairly well-founded based on extensive experience.

> a world where we can not trust any free resources

Or paid ones, really. If you think a company is trustworthy, that means a) you believe it cares about losing you as a customer, or b) you believe the company has the obligation or the luxury of acting with integrity (or the people working there do).

Especially with news media, none of these things are likely to be true. For paid news I’d just expect fewer typos but not more integrity.

It's really easy to find real reviews. The magic trick is -affiliate -amazon. You can add other qualifiers as well.

Try it: https://www.google.com/search?q=macbook+m3+pro+review+-affil...
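
As a rough sketch of the trick above, the exclusion terms can be appended to a query programmatically. The function name `build_search_url` is hypothetical, and the query is rebuilt from the two terms named in the comment, not recovered from the truncated link.

```python
from urllib.parse import urlencode


def build_search_url(query, excluded_terms, base="https://www.google.com/search"):
    """Build a search URL that prefixes each excluded term with a minus sign.

    build_search_url is a hypothetical helper written for illustration.
    """
    parts = [query] + [f"-{term}" for term in excluded_terms]
    # urlencode escapes the query string and turns spaces into "+".
    return f"{base}?{urlencode({'q': ' '.join(parts)})}"


if __name__ == "__main__":
    print(build_search_url("macbook m3 pro review", ["affiliate", "amazon"]))
    # prints https://www.google.com/search?q=macbook+m3+pro+review+-affiliate+-amazon
```

More exclusion terms can be added to the list in the same way.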

Reviews financed via affiliate links are just camouflaged ads. So Google should offer a filter to remove all of them.

I add similar qualifiers to almost all of my searches. They make the web feel like it's 2010 again.

In general, for most searches I just want to filter out ad-supported sites completely (including affiliate ads but also all others). Those will always have misaligned incentives.

Of course Google will never provide this because their incentives are also misaligned, being the biggest online advertising company.

Awesome tip. I made a Kagi lens with these settings:

https://kagi.com/lenses/0MqOTt5t5MajrIkHAqHEgDeoKzF1a4TS

(Can’t share example results since Kagi doesn’t let you share results from lenses)

The results would only be notable if they were substantially better than the Google results.

Doesn't Kagi currently pay to access Google's index?

I think so, but they do rerank the results. The only major engines that run their own indexes are Google, Bing, Yandex, and Baidu.

EDIT: If Google decides to ever remove this useful feature as well, here's an archive link showing what the results used to look like at the time of posting: https://archive.is/5KwA6.

That seems to be a great tip. Thanks.

I don’t use Google, but I used to pay for Apple News.

Apple uses algorithmic ranking by story, and pays news sites by article views. It is basically all spam. If you block the spam sites, their stories still show up in your feed with a note that you blocked the site.

Instead, they should let people structure their feeds by news organization, like podcast apps do. They should steer you back to reading the sources you’ve opted into, and mix in a few stories from related news organizations, not stories with high content similarity or high “trending” scores.

(As far as I know, Apple News+ is the only product still operating in the paid news aggregator space, but if there’s another one, I’d love to hear about it.)

I've been using Feedly for a bit now, after the Google aggregator that Android offers as an option on the home screen changed something and it became impossible for me to filter out certain sources (maybe related to the engine changes discussed in this thread and in the article?).

It's solidly...okay. It's very good at aggregating everything I want, and for the most part it's able to avoid things that I'd absolutely not be willing to overlook, but it has some quirks: the filters weirdly don't work for me on fairly benign topics (no matter how much I try, I can't get it to stop showing me content from various sports like soccer, basketball, and golf, despite the only sport I care about being baseball). They seem to really hype their AI features in the app, which is a little weird because I don't care how they aggregate behind the scenes, and they shouldn't need AI to filter articles they literally already tag as "golf" when I have "golf" listed in my filters as "never show". Still, it's not annoying enough that I've bothered trying to find an alternative yet.

I have to say, showing you content from blocked channels is the most user-hostile thing I encounter on a daily basis.

The contempt for one’s users is such a defining feature of this era of late-stage tech.

> Instead, they should let people structure their feeds by news organization

Doesn't this immediately turn into the kind of problem TFA is bemoaning? Once a news organization gets traction (opt-ins in this case) on a platform, they'll inevitably start selling space in their feed to one or more crappy aggregators. To the C-suite this looks like free money, since somehow they always manage to convince themselves that the brand damage from it will be minimal or at least manageable.

It sucks.

> If you block the spam sites, their stories still show up in your feed with a note that you blocked the site.

Users' respect for Apple is matched in magnitude by Apple's disrespect for users.

IMO the internet is just a bad place to look for reviews nowadays, unless you really trust someone and know they aren’t being paid to review the product. Likewise, I consider Amazon reviews mostly fake. For products I want to buy I look at what brick-and-mortar stores sell; they have skin in the game and can weed out the truly bad.

DoubleClick slowly killed Google search because the best way to make money in display ads is to run clickbait.

On the one hand, Google paid good quality websites more money for trash content and engagement bait than for quality content. So they adapted to that new market reality.

Meanwhile, the real money maker - Search - gradually got filled up with lower quality content and now it’s imploding.

Google buying DoubleClick has a lot of parallels to what happened with Boeing.

> was no algorithm that could detect a high-quality review from a spam one

In that scenario, the search engine could show an empty page plus their screened ad network results.

Perhaps a link for querying Reddit or other social media.

For the most profitable/contested review queries, some combination of algo and paid humans for feedback/curation.

> there was no algorithm that could detect a high-quality review from a spam one.

I really hope for them it does exist because otherwise Google is screwed.

The right approach would’ve looked something like what the author of this article did. None of it was that technically complicated.

No ads or referral links should be the most important signal.

These days I ask ChatGPT what the people of Reddit think.

> we spent N weeks testing K products to root out the very best

Does the Wirecutter no longer actually do the leg work?

Whether they do or not, Wirecutter was such a successful format that everybody else copies the style when writing fake reviews. The giveaway is when every item in a category happens to be the best at something that could be read off the spec sheet and they never actually recommend one: This one has the best sound quality, this one is the budget pick, this one is best for people with cats, this one has more battery life, etc.