Hacker News
by mft_ 788 days ago
I've been through this thought process many times.

1. Google isn't working well any more.

2. Therefore bring humans back into the system of flagging good and bad pages.

3. But the internet is too big - so we have to distribute the workload.

4. Oh, a distributed trust-based system at scale... it's going to be game-able by people with a financial incentive.

5. Forget it.

---

Edit: it's probably worth adding that whoever can solve the underlying problem of trust on the internet -- as in, you're definitely a human, and supported by this system I will award you a level of trust -- could be the next Google. :)

8 comments

> 1. Google isn't working well any more.

This is so true. It’s a pain to search for anything non-deterministic nowadays. I usually find myself putting double quotes around every single word I’m interested in, and Google still brings back unrelated results.

I'm curious. Why do you use Google if you don't like their results?
PageRank isn't working well anymore. I use DDG, but its quality is also flagging.
Yeah, DDG quality seems abysmal for many of my searches now. I'll then switch to Brave, which sometimes finds what I'm looking for. I rarely ever check Google or Bing as a fallback, but when I do it feels like a screenful of ads and it isn't any more helpful (except Google Image search).

Part of the problem seems like a recency bias in search results. I notice sites frequently update pages with new timestamps, but nothing of substance appears to have changed (e.g. a review of something that was released 4 years ago, but the page was supposedly updated last week). So if I do pretty much the same search that succeeded two months ago, but repeat it today, I might not find the useful result I remember coming across.

I'm sure there are a bunch of other issues related to search and SEO that are affecting search quality. It seems insane that the major search providers don't combat this trend by arming users with more tools to tailor their search, but rather steadily degrade the user experience with no recourse.

Honestly, I think if Google was wise, they'd have a skunkworks team rethinking search from scratch (not tied to AI/LLMs) that starts their own index and tries to come up with an alternative to the current Google Search. Maybe they have that already, though I doubt it. I'm sure if they do have such a team, it's intricately tied to existing infrastructure and team hierarchies which effectively nullifies any chance it has at success.

I would be happy with an even simpler solution. Give me a blacklist of domains or sites. There are 50 or so sites that I never want to see results from. It’s not just one or two that I could exclude by hand in each search.
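As a rough sketch of what such a personal blacklist could look like if applied to a list of result URLs on the client side: the domain names and the result list below are made up for illustration, and this is not how any particular search engine implements filtering.

```python
from urllib.parse import urlsplit


def is_blocked(url: str, blocked_domains: set[str]) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked_domains)


def filter_results(urls: list[str], blocked_domains: set[str]) -> list[str]:
    """Keep only the results whose host is not on the personal blacklist."""
    return [u for u in urls if not is_blocked(u, blocked_domains)]


# Hypothetical blacklist and result list, for illustration only.
BLOCKED = {"spam.example", "contentfarm.example"}
results = [
    "https://docs.example.org/rust/ownership",
    "https://www.spam.example/top-10-rust-tips",
    "https://contentfarm.example/what-is-rust",
]

print(filter_results(results, BLOCKED))
```

Matching on the host suffix with a leading dot means subdomains of a blocked site are dropped too, while unrelated hosts that merely end in the same letters are kept.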

Second, give me a way to express the semantic meaning of something. If I’m searching for rust, let me choose "programming language", for example. As it is, I find myself adding various one-word tags to limit the search results.

Personally I have one use case left for Google: Product Search.

If I'm looking to buy something, I will frequently end up using Google. The engine that matches my product search to a relevant ad is excellent. Basically anytime you need to search for something that could lead to a purchase of a physical product, Google will be extremely useful. For services and software you can't use Google, you will get hustled by fake review or top 10 XYZ for 2024 sites.

I can't believe how terrible product search is on Instacart - I place orders on there pretty frequently for my mom, and Petco is the worst.

I will search for "wellness chicken cat food" - and Wellness has chicken cat food in a few different textures, so it seems like those should at least be on the page of search results, if not the top results. Not always so! At the very least I will have to scroll a ways down the page to get anything that is even Wellness.

And sometimes the top results aren't even cat food, they will be random other pet supplies.

Or she wants a few different flavors of the food: I find one, but then I have to search a few different ways to pull up the other flavors, and they don't show up in any "similar" displays.

It's painful. I hope Google doesn't go the same way - I think with Instacart it's because they want to promote whatever it is they put at the top, but even that doesn't explain how terrible some of the search results are.

I just think, wellness aside, you should get food other than cat food for your mom.
Meijer (grocery store) is the same. Until a short while ago, you even had to match the capitalization of the product you were searching for. And we are not even talking about different representations of the same Unicode character (ê etc.) as different byte sequences.
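For anyone wondering what the usual fix looks like, here is a minimal sketch using Python's standard `unicodedata` module. The `search_key` helper name and the product strings are made up for illustration, and this says nothing about how any particular store's search actually works.

```python
import unicodedata


def search_key(text: str) -> str:
    """Build a comparison key that ignores case and Unicode composition differences."""
    # NFKC maps composed and decomposed forms to the same code points;
    # casefold() removes case differences. Normalizing once more afterwards
    # covers characters whose casefolded form is not itself normalized.
    folded = unicodedata.normalize("NFKC", text).casefold()
    return unicodedata.normalize("NFKC", folded)


composed = "Cr\u00eape mix"      # "ê" as a single code point (U+00EA)
decomposed = "cre\u0302pe MIX"   # "e" followed by a combining circumflex (U+0302)

print(composed.encode("utf-8") == decomposed.encode("utf-8"))  # False: different bytes
print(search_key(composed) == search_key(decomposed))          # True: same key
```

Indexing and querying on the key rather than on the raw string makes both the capitalization problem and the "same character, different bytes" problem go away.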
I use it for deterministic things. If I’m looking for something specific or a well known thing, it simply gets me the results I’m looking for.

For anything that’s non-deterministic, it just brings me garbage. The same goes for YouTube. I’m searching for a specific thing about a particular library and it’s showing me stupid definition blogs or useless garbage. I apologize for my crudeness but no other words can describe it.

Have you tried verbatim mode? I use it by default.
Could you explain how to use verbatim mode please? For anyone else reading in the future as well.
Google used to work closer to verbatim mode but would get common synonyms to give you comprehensive results. Now it stretches synonyms and alternate spellings to the point of uselessness.
They also changed "did you mean Y" into "showing results for Y". So annoying.
Try using kagi. The results are so much better.
I think the solution to this is both unique and trivial: you cannot trust something that is freely expandable, or that did not require some amount of stake from the other party. That stake can be anything: time, work or money.

If you want to trust a review, it needs to have required a non-expandable resource from the reviewer. The amount of that resource should be about what an average user would be willing to expend without missing it (so that the barrier to reviewing is low), while being prohibitively expensive if an actor wants to cheat the system and generate millions of reviews.
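One way to picture the "work" variant of such a stake is a hashcash-style proof of work attached to each review. This is only a sketch under assumptions: the function names, the stamp format and the difficulty value are invented here, and it covers CPU time only, not time or money.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def stamp_digest(review_text: str, nonce: int) -> bytes:
    """Hash the nonce together with the review so the stamp is bound to that text."""
    return hashlib.sha256(f"{nonce}:{review_text}".encode("utf-8")).digest()


def mint_stamp(review_text: str, difficulty_bits: int) -> int:
    """Brute-force a nonce; expected cost is about 2**difficulty_bits hashes."""
    for nonce in count():
        if leading_zero_bits(stamp_digest(review_text, nonce)) >= difficulty_bits:
            return nonce


def verify_stamp(review_text: str, nonce: int, difficulty_bits: int) -> bool:
    """Checking a stamp costs a single hash, however expensive it was to mint."""
    return leading_zero_bits(stamp_digest(review_text, nonce)) >= difficulty_bits


DIFFICULTY_BITS = 16  # assumed value: roughly 65,000 hashes per review on average
review = "Solid kettle, boils fast."
nonce = mint_stamp(review, DIFFICULTY_BITS)
print(verify_stamp(review, nonce, DIFFICULTY_BITS))  # True
```

The asymmetry is the point: one review costs a fraction of a second, a million reviews cost a million times that, and the difficulty can be raised until bulk generation stops paying off. It does nothing about cheap human labor, though.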

I like your thinking, but there's a middle ground before full automation: when humans are incentivized, one way or another, to provide biased reviews. This might be via straightforward employment of people in lower-cost places (e.g. via Mechanical Turk) or other incentives. For example, note how a proportion of Amazon reviews are gamed and unreliable.

At the moment, the only tasks (that I can think of) that come close to the 'time-consuming-enough to not scale, but not quite annoying enough to put off committed individuals' are the various forms of CAPTCHA - which is unsurprising, given that we're discussing a form of Completely Automated Public Turing test to tell Computers and Humans Apart. (And of course, there are CAPTCHA-solving farms.)

But would people invest time in a review system that required them to complete a form of CAPTCHA regularly?

> That stake can be anything, time, work or money.

I think you'll find that money should be removed from your list. There are some untrustworthy people that have tons of money. Sadly, I think trust must be EARNED, and that requires giving effort (work) and time. You cannot buy trust with simple money.

> You cannot buy trust with simple money.

yet rich ppl are granted more upfront trust. maybe because we assume less incentive to rob.

This clearly isn't true, since if you make a rich person dress and smell like a homeless person, they simply won't be trusted in most parts of civilization.

No, what you're talking about is POWER and AGENCY. Rich people have the power to override trust through the fact that they can operate with near impunity; so you have very little agency to not trust them. If you choose to not show trust, you may invoke their wrath.

Then a web of trust, which means SSO (as a way to link the review to the trust).

To prevent the trust from being hacked, the SSO in turn must ensure:

- unique human, or

- resource spend

> unique human

Here's the thing. A sovereign nation can "generate" as many "unique" humans as it wants (via printing "fake" but official identities). No one would be on to them until there were more users than probable people in the country.

Doesn't stop nation state troll armies though
How about Wikipedia's approach?

Of course Wikipedia is way smaller than the internet, but still one way to go could be by having themed "human curated niches"

> 3. But the internet is too big - so we have to distribute the workload.

> 4. Oh, a distributed trust-based system at scale... it's going to be game-able by people with a financial incentive.

These are solved by being transparent and surfacing the agent (maybe even the sub-agents) for ranking, and allowing us to choose.

This way, if someone/something is gaming the system, I can just say "this recommender is garbage", and consequently it and all its sub-choices are heavily downranked for me.
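A toy sketch of that idea, assuming each result carries the recommender that surfaced it and that recommenders form a parent/sub-agent tree. All class names, recommender names, scores and the penalty factor are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Recommender:
    name: str
    parent: Optional["Recommender"] = None

    def lineage(self) -> Iterator["Recommender"]:
        """Yield this recommender followed by every ancestor above it."""
        node: Optional[Recommender] = self
        while node is not None:
            yield node
            node = node.parent


@dataclass
class PersonalRanking:
    distrusted: set = field(default_factory=set)
    penalty: float = 0.05  # assumed multiplier for distrusted sources

    def mark_garbage(self, recommender: Recommender) -> None:
        """Record 'this recommender is garbage' for this user only."""
        self.distrusted.add(recommender.name)

    def score(self, base_score: float, recommender: Recommender) -> float:
        """Downrank a result if its recommender or any ancestor is distrusted."""
        if any(r.name in self.distrusted for r in recommender.lineage()):
            return base_score * self.penalty
        return base_score


network = Recommender("seo-network")
sub_agent = Recommender("seo-network/gadget-reviews", parent=network)
indie = Recommender("indie-curator")

mine = PersonalRanking()
mine.mark_garbage(network)

candidates = [("page-a", 0.9, sub_agent), ("page-b", 0.6, indie)]
ranked = sorted(candidates, key=lambda c: mine.score(c[1], c[2]), reverse=True)
print([page for page, _, _ in ranked])  # ['page-b', 'page-a']
```

Because the check walks up the lineage, flagging one parent recommender takes all of its sub-choices down with it, and only for the user who flagged it.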

This'll make filter bubbles even worse, but that ship has sailed. And I'm sort of a progressive-libertarian-centrist (in the classical sense, not in the American sense). If I get put in a bubble with people who have similar balanced tastes: yes please!

Freenet FMS; more specifically: Web of Trust.

IMO it's how all moderation should go: you subscribe to some default moderators' lists initially and then mutate those subscriptions and their trust levels. Mod actions are simply adding visibility options to content and not actually removing anything.
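Roughly what I mean, as a sketch. The moderator names, labels, trust weights and threshold are all made up, and this is not how FMS itself is implemented.

```python
from collections import defaultdict

# Moderator actions only attach labels to content; nothing is ever deleted.
# Hypothetical data: (moderator, content_id, label).
MOD_ACTIONS = [
    ("mod_alice", "post-1", "spam"),
    ("mod_bob", "post-1", "spam"),
    ("mod_carol", "post-2", "spam"),
]


def visible_posts(posts, actions, subscriptions, hide_threshold=1.0):
    """Return the posts whose trust-weighted 'spam' score is below the reader's threshold."""
    spam_score = defaultdict(float)
    for moderator, content_id, label in actions:
        if label == "spam":
            # Moderators the reader has not subscribed to carry no weight.
            spam_score[content_id] += subscriptions.get(moderator, 0.0)
    return [p for p in posts if spam_score[p] < hide_threshold]


posts = ["post-1", "post-2", "post-3"]

# Every reader starts from the same default subscriptions...
defaults = {"mod_alice": 0.6, "mod_bob": 0.6, "mod_carol": 0.6}
print(visible_posts(posts, MOD_ACTIONS, defaults))  # ['post-2', 'post-3']

# ...and then mutates them: stop trusting mod_bob, fully trust mod_carol.
mutated = {**defaults, "mod_bob": 0.0, "mod_carol": 1.0}
print(visible_posts(posts, MOD_ACTIONS, mutated))  # ['post-1', 'post-3']
```

The same set of moderator actions gives two readers two different views, and changing a subscription changes what you see without anything having been removed.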

I think a distributed del.icio.us could work, a p2p version not based on crypto. Something like https://veilid.com would be perfect actually.
The really obvious thing to avoid is DMOZ which got captured by spammers immediately.
Reddit could've been it. This is my default search engine at the moment:

https://www.google.com/search?q=%s+site%3Areddit.com&tbs=qdr...

Obviously the solution is to create a centralized system to electronically ID every human on the planet, and track what they post, talk, think, which medications and food they consume, who they are friends with, who they fuck, how much of this decade's evil chemicals they exhale, where they spend their money, and their real-time location.

Or, you know, just make your own open-source search engine, with blackjack and hookers.