by rickdeveloper 1670 days ago

I built this website a couple of months ago because I was annoyed by how hard it was to find useful things on Google. As "Google no longer producing high quality search results in significant categories" [0] is currently #1 on the front page I figured I'd share this project again. I hope it's useful to some people.

'No Trash Search' is very focussed on STEM and not "for daily use". It's surprisingly good when you're looking for certain kinds of information. Under the hood it's little more than a programmable search engine [1] with a whitelist of ~120 sites.

[0] https://news.ycombinator.com/item?id=29772136

[1] http://programmablesearchengine.google.com
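To illustrate the whitelist idea described above, here is a minimal sketch in Python of keeping only results hosted on an approved set of sites. It is not how Google's Programmable Search Engine works internally, and the three domains and the function names are placeholders chosen for illustration; the actual list of ~120 sites is not given in the post.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist; the real list of ~120 sites is not shown in the post.
WHITELIST = {"stackoverflow.com", "en.wikipedia.org", "cppreference.com"}


def is_whitelisted(url, whitelist=WHITELIST):
    """Return True if the URL's host is a whitelisted domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in whitelist)


def filter_results(urls, whitelist=WHITELIST):
    """Keep only results hosted on whitelisted sites, preserving their order."""
    return [u for u in urls if is_whitelisted(u, whitelist)]
```

Matching on the host with a leading dot (rather than a plain substring test) means a look-alike such as notstackoverflow.com is not mistaken for stackoverflow.com.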

6 comments

throwawayboise 1670 days ago

> Under the hood it's little more than a programmable search engine [1] with a whitelist of ~120 sites

So back to what web search was in the 1990s, roughly: an index from a curated selection of sites.

rdiddly 1670 days ago

120 sites is pretty hilarious and sad. "Here you go, the worthwhile part of the internet!"

BlueTemplar 1670 days ago

While I can understand the appeal, restricting your search engine to only ~120 websites out of hundreds of millions (?) is basically giving up on the Web.

(BTW, any good search engines these days that aren't indirectly using Google or Bing ?)

version_five 1670 days ago

> restricting your search engine to only ~120 websites out of hundreds of millions (?) is basically giving up on the Web.

Sure - the web is now a cesspool optimized for advertising and attention. The traditional search engines made a lot more sense at the dawn of the internet when it was more about discovery. Now, for the most part, it's closer to an information retrieval tool, where a finite list of established sites have the bulk of what one is looking for. It only makes sense to have a tool that lets one navigate the established, legit internet, and not have to deal with all the crap.

That doesn't mean there is no use case for google as it is, but some more focused competition is a no brainer.

narrator 1670 days ago

There's http://yandex.com . It's great if you want to search controversial subject matter and controversial results that Google wouldn't give you. The reverse image search is also amazing.

imglorp 1669 days ago

The reverse image search in particular is very, very good.

Far better than Bing or Google. It's not obvious why theirs is so terrible, unless that product is not a moneymaker for them, in which case it explains everything.

BlueTemplar 1669 days ago

I should have mentioned : ideally from the EU.

Big Russian or Chinese software is even more out of the question than the GAFAMs (if they're big, they definitely have authorities messing with the results).

Hmm, what about Baltic or Ukrainian or Israeli search engines ?

quocanh 1670 days ago

Which results are different than Google's?

jhugo 1670 days ago

Most. Yandex is great, especially for programming searches. It generally ranks GitHub, Stack Overflow and other content-heavy sites highly. Google has been taken over by weird clones of GitHub and SO lately; Yandex has no such trash.

It completely boggles my mind that the useless GitHub and SO clones rank first page on Google. Do engineers at Google not use their own product?

skinkestek 1669 days ago

Regarding Stack Overflow, there is a fair chance they can congratulate themselves:

If I am right, they played stupid games and won stupid prizes. More specifically, they have allowed rampant deletionism for years. So while I am fairly certain the questions and answers originated on Stack Overflow, it wouldn't surprise me if a good number of those aren't visible on Stack Overflow anymore, which would explain why the clones rank higher in Google.

Done right this would actually be a service.

Sadly, some of them seem to mix together various questions and answers on the same page to generate text matches for unusual queries.

ramphastidae 1669 days ago

Engineers at Google build what the ads and sales teams tell them to.

msrenee 1669 days ago

Don't have time to mess with it right now, but does it normally return about half results in Russian or is that something my phone/browser is doing?

jhugo 1669 days ago

I usually get about 10-25% in Russian.

hiptobecubic 1669 days ago

Frankly, no. It's kind of a running joke that you can't Google any of your problems at Google because everything is internal.

imglorp 1669 days ago

> Google has been taken over by weird clones of GitHub and SO lately

Do you have an example search leading to a GitHub clone?

vgalin 1669 days ago

French is my mother tongue, but I've quickly learned during my studies that using English keywords in my STEM-related searches would simply lead me to better (and more abundant) results.

A few weeks/months ago however, while I was trying to solve an issue with a colleague who would search using French keywords, I noticed that some websites featured on the first page of the Google results were off.

In short, they were machine-translated versions of Stack Overflow threads. And they would appear in most of the searches using French keywords.

Those websites also appeared rarely in my searches while I was using English keywords, but most of the time I never bothered opening them. But now I notice them every time.

Some examples: When searching for "wget set http proxy" on Google, the fourth result leads me to qastack.fr, and the ninth to it-swarm-fr.com; both are websites featuring scraped and machine-translated threads from Stack Overflow.

When searching deliberately in French for "Eclipse CDT stdout ne s'affiche pas" ("Eclipse CDT stdout not displayed [in console]"), the first result leads me to askcodez.com and the fourth one to qastack.fr (askcodez.com is the same kind of site as the other two).

I have never stumbled upon GitHub clones yet, however.

jhugo 1669 days ago

I don't have an example search, although I'll try to remember to update this comment the next time it happens. On average I come across these things at least once a day, but it depends what I'm working on. It tends to be when searching for more obscure bugs, for which there is a GitHub issue but it's not ranked highly on Google for whatever reason, but these spam sites are ranked highly.

GitMemory is probably the most well-known example; it's just a thin layer over the GitHub API with a completely garbage UI, yet it often ranks higher than GitHub itself.

cpach 1669 days ago

Try searching for movie name + torrent for example

beckman466 1669 days ago

yep it's always a bunch of movie subscription sites instead of the torrent. it's almost like Google's search engine is predominantly focused on collecting advertising dollars...?

1vuio0pswjnm7 1670 days ago

"(BTW, any good search engines these days that aren't indirectly using Google or Bing ?)"

The code for Gigablast is open-source, including the crawler.

I could be wrong but I do not think either search.marginalia.nu or wiby.me uses Google or Bing.

The comment about "hundreds of millions" is interesting. Assume hypothetically a search engine claimed to be searching millions of sites for a given query but in truth it was actually only searching 120 sites that it had determined answered this query (i.e., was the most popular answer source) for the majority of users. How would a user verify the search engine's claim about searching millions of sites was true? What if the search engine only allowed the user to retrieve a maximum of about 230 results, no matter how many sites it claimed to search?

jerf 1669 days ago

"How would a user verify the search engine's claim about searching millions of sites was true?"

Search for things specifically on those pages, by very specific phrases and such.

Of course you have to find them yourself first for that verification.

I can say having set up some very teeny tiny websites here and there that the googlebot is hooked up to a lot of stuff. I'm not even sure how it found a couple of them as quickly as it did. Things like "if someone adds an RSS feed to Feed.ly" seem to do the trick. None of them were sites trying to "hide" or anything and I expected them to be found eventually, but they got found much faster than I expected. Or maybe they just scan new domain registrations, though it seemed to me it wasn't that that triggered it.

1vuio0pswjnm7 1669 days ago

Imagine searching for something that is quite common that will produce a large number of results but the user can only retrieve, say, 230 results total. How does the user verify that all of the "millions of sites" that contain results were actually searched when the user submitted her query?

A search engine can tell users some large number of sites were searched at the time of the user's query and some large number of results exist, but what if it does not allow the user to actually view all the results?

To put it another way, the question is not what Google has discovered about the www,^1 but what Google is willing to let the user search and retrieve. If retrieving the 963rd result for a common string is not allowed, then it is impossible for the user to verify that the site containing that result was searched when the user submitted her query. Even if the search produced a 963rd result, what difference does it make if the user cannot retrieve it? What is the point of the search engine locating the 963rd result if it never has to show this result to the user querying a common string?

1. What Google has discovered about the www^2 and what Google users are able to discover about the www through Google may be two different things.^3 Google has its own interests to pursue in the name of online advertising and these may conflict with users' interests. "Censorship" is one concept that often draws negative connotations but there are many more subtle forms of filtering and manipulation that are possible here, including unintentional ones.

2. The most important focus would be what is "popular".

3. Some users might care less about what is "popular". Such users would, by and large, be less interesting to an advertising company. Individual interests might become subverted in favour of "popular" interests, to the extent they conflict. An advertising company (that runs a search engine) will favour the larger audience.

imachine1980_ 1670 days ago

Gigablast results tend to be full of trash, in my short experience with it.

1vuio0pswjnm7 1670 days ago

All the search engines have trash. I retrieve results from a variety of search engines and mix them into a simplified SERP with zero cruft that can be read very quickly. Some call searching multiple search engines "meta-search". The main differences with mine are that 1. it is all done client side (there is no remote "meta-search" engine) and 2. searches can be "continued" where they left off at any time. This allows one to avoid rate limits. There are always trash results, every search engine has them in their SERPs, but I find that the more results and the more varied the results, the better the chance of finding useful, non-trash ones. Gigablast allows returning at least 100 results at a time. Few search engines allow 100 results at a time anymore. Google still allows it but will not allow a user to retrieve more than 200-something results total.
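A rough sketch, in Python, of what mixing results from several engines on the client side could look like. This is not the commenter's actual tool, whose code is not shown; the function names, the round-robin interleaving, and the crude duplicate key are all assumptions made for illustration. Each input list is assumed to be the ranked URLs already fetched from one engine.

```python
from itertools import zip_longest
from urllib.parse import urlsplit


def normalize(url):
    """Crude duplicate key: host plus path, ignoring scheme, 'www.' and a trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(result_lists, seen=None):
    """Interleave ranked lists round-robin, dropping URLs already seen.

    `seen` is returned so it can be stored and passed back in later,
    which lets a later batch skip everything merged earlier.
    """
    seen = set() if seen is None else seen
    merged = []
    for tier in zip_longest(*result_lists):
        for url in tier:
            if url is None:  # this engine returned fewer results than the others
                continue
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged, seen
```

Passing the returned `seen` set into a later call is only a loose stand-in for "continuing" a search: a real client would also need to remember each engine's page offset so it can request the next batch without repeating earlier requests.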

ColinHayhurst 1670 days ago

Try Mojeek https://blog.mojeek.com/2021/03/to-track-or-not-to-track.htm... Disclosure: team member. Feedback good or bad appreciated

karlzt 1669 days ago

Feedback:

https://news.ycombinator.com/item?id=29786112

fuckcensorship 1670 days ago

Check out marginalia[1], made by another user on HN.

[1]: https://search.marginalia.nu/

marginalia_nu 1670 days ago

Yeah I do my own crawling, and offer results from around 200k sites (although it's indexed 700k domains, most of which are crap).

blaerk 1670 days ago

I think https://www.qwant.com/ uses its own index. I just started using it, so I can't really say much about it other than that it seems alright compared to ddg and google(?)

BlueTemplar 1669 days ago

Last time I checked, it just used an old index from Bing ?

dataflow 1670 days ago

You might want to add cppreference.com to your list of programming sites.

lionkor 1668 days ago

Seems to be in there now :)

SilasX 1669 days ago

FYI, I think this is just the case where you should prefix the submission title with “Show HN:”. Can mods update it so it shows with the others? @dang?

https://news.ycombinator.com/show

https://news.ycombinator.com/showhn.html

pronoiac 1669 days ago

I emailed this suggestion to the mods.

SilasX 1669 days ago

Woot! It's updated now. Thanks!

DantesKite 1670 days ago

Hey I was looking for something like this. Thanks.