Hacker News
by bko 714 days ago
Whenever I hear about alternative search engines, I try searching for a few famous people, hoping to see Wikipedia entries towards the top. And almost always I see nonsense.

For instance, if you search for 'Trump', the top links are

```
1. http://www.trump.de — found via Mwmbl -- Trump
2. https://itep.org/md/ — found via Mwmbl -- Trump Tax Proposals Would Provide Richest One Percent in Maryland with 69.7 Percent of the State’s Tax Cuts Earlier this year, the Trump administration r…
3. https://is.gd/mUHYTg — found via Mwmbl --- Trump embraces QAnon conspiracy because ‘they like me’ After skirting the issue for weeks, President Donald Trump offered an embrace Wednesday of the fri…
4. http://dict.cn/trump — found via Mwmbl -- trump是什么意思_trump在线翻译_英语_读音_用法_例句_海词词典
```

Surely there are millions of results more relevant to the phrase 'Trump' than trump.de. The other links aren't better. A random article from 2017? Another one from 2020. A Chinese dictionary definition of 'Trump'?

I get that search is hard, but what's going on here? You can try any phrase, and you just get weird results.

2 comments

> but what's going on here?

I'm wondering the same thing. Google gives me _exactly_ what I want without me having to add keywords or cajole it. All of these other search engines give me such weird irrelevant results. If I search "python reverse string" on YaCy's demo peer, the third result is the ArchWiki page on ... MATLAB.

I really wish I knew what to do to help the situation here because distributed p2p search engines seem so cool. But then again, Google wouldn't be so dominant if it were so easy.

> I'm wondering the same thing.

Well, if you really want to know, you could take the HTTP responses for the page you expect to rank highly and the page that actually ranks highly, then apply various common ranking heuristics to both to figure out what the result-ranking algorithm is actually doing.
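To make that concrete, here is a rough sketch of the kind of side-by-side comparison I mean. Everything in it is an assumption for illustration: `page_features` is a made-up helper, the two HTML strings are toy stand-ins for the real responses, and the signals (title match, term count, term density, length) are just common guesses, not what any particular engine actually uses.

```python
import re

def page_features(html, query):
    """Extract a few query-dependent signals from a raw HTML string."""
    q = query.lower()
    # Pull out the <title> text, if there is one.
    title_match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    title = title_match.group(1).strip().lower() if title_match else ""
    # Crude tag stripping; fine for a toy comparison, not for real pages.
    text = re.sub(r"<[^>]+>", " ", html).lower()
    words = re.findall(r"\w+", text)
    hits = words.count(q)
    return {
        "title_exact": title == q,
        "title_contains": q in title,
        "term_count": hits,
        "term_density": hits / len(words) if words else 0.0,
        "length": len(words),
    }

# Hypothetical stand-ins for the two HTTP response bodies.
expected_page = (
    "<html><head><title>Donald Trump - Wikipedia</title></head>"
    "<body><p>Donald Trump is a politician. Trump was born in New York.</p></body></html>"
)
actual_page = "<html><head><title>Trump</title></head><body><p>Trump</p></body></html>"

expected = page_features(expected_page, "Trump")
actual = page_features(actual_page, "Trump")

for name in expected:
    print(f"{name:15} expected={expected[name]!s:8} actual={actual[name]!s}")
```

If the page that wins scores higher on exactly one of these signals (say, exact title match or term density), that is a decent hint about which heuristic the ranker is leaning on.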

For any search engine that hasn't had a bunch of competitive pressure forcing it to improve, the ranking algorithm is very likely something incredibly simple and standard, e.g. tf-idf across the whole HTTP-result corpus.
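As a minimal sketch of how plain tf-idf can go wrong, assume term frequency is normalized by document length and idf is the usual smoothed log ratio. The three-document corpus and the names `tokenize` and `tf_idf_scores` are invented for illustration; this is not a claim about what Mwmbl or YaCy actually run.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def tf_idf_scores(query, docs):
    """Score each doc as the sum over query terms of (length-normalized tf) * (smoothed idf)."""
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            # Fraction of the document made up of this term.
            tf = counts[term] / len(tokens) if tokens else 0.0
            # Number of documents containing the term.
            df = sum(1 for t in tokenized.values() if term in t)
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0
            score += tf * idf
        scores[name] = score
    return scores

# Toy corpus: a near-empty page, a longer article, and something unrelated.
docs = {
    "tiny_page": "trump",
    "encyclopedia_article": (
        "donald trump is an american politician businessman and media "
        "personality who served as president"
    ),
    "unrelated": "python reverse string slicing example",
}

scores = tf_idf_scores("trump", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With this scoring, the page whose entire text is the query term gets a term frequency of 1.0 and lands above the longer article, which is roughly the trump.de effect described above. Engines that care about quality usually add length normalization tweaks (BM25-style), link or popularity signals, and so on.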

So I'd guess the results you tend to see in your tests come from one of those "standard" algorithms doing something dumb for the query/page pairs you care about.

Yeah the ranking algorithm is just wrong right now. I'm working on it.