Hacker News
by neuralk 3007 days ago
I was wishing for old-school keyword search just this morning! Nice synchronicity.

I must completely disagree with the other posters claiming that keyword searches are not useful. For niche research, they are extremely helpful or even necessary. Google and Bing have reached the point where it is impossible to do real, niche academic research on them. For instance, I had a very specific thing I was trying to look up involving medicine, religion, and Marco Polo.

Try searching for "marco polo doctors" on Google and witness it giving very counterintuitive, one-sided results that may align with the current zeitgeist of interest from people searching Google, but diverge completely from the aim of the literal, precise keyword search that academics need. I did try to hone the search down: looking up Kublai Khan, doctors, atheism brings up blogspam articles on doctors and atheism, but scant results on the 13th century Mongol emperor's religious medical interest. Trying to narrow the search further by including variations on Cambulac, Cambaliech, trying to find any info beyond surface-level on John of Montecorvino and his retinue... all is impossible with search engines in 2018.


Don't forget(!) Google's horrible forgetfulness, and its way of banning you if you try hard enough to extract anything useful from it:

https://news.ycombinator.com/item?id=16153840

For me, the niche is repair information and in particular, identifying IC part numbers and finding datasheets. Searching "service manual" now invariably brings up useless user's manuals, and searching too many times for IC part numbers gets you CAPTCHA-banned.

(Somewhat understandably, part numbers tend to look like semirandom bot queries, but it's still a horrible experience to be called a bot just because you're actually after more information than the average user.)

Keyword-based would be a great step forward(!), but something like "grep for the Web" would be ideal. I remember many decades ago learning how to use boolean operators and such, since nearly all search engines of the time provided such functionality. Now the mainstream ones which have a big enough index to be effective also have removed much of that functionality and try very hard to limit you from using it. For another example, try using "site:" searches multiple times with Google --- another way to get rapidly banned.
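To make the "grep for the Web" idea concrete, here is a minimal sketch in Python, assuming the page text has already been saved locally. The `grep_pages` helper, the `pages` dict, and the example URLs are all made up for illustration; this is not how any real search engine works, just literal regex matching with no guessing about intent.

```python
import re

def grep_pages(pattern, pages, flags=re.IGNORECASE):
    # Return (url, line number, line) for every line matching the pattern.
    regex = re.compile(pattern, flags)
    matches = []
    for url, text in pages.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                matches.append((url, lineno, line.strip()))
    return matches

# Hypothetical page texts standing in for pages saved from the web.
pages = {
    "https://example.org/ic": "Timer ICs\nThe NE555P is a common timer.\nSee also LM317.",
    "https://example.org/misc": "Nothing relevant here.",
}
matches = grep_pages(r"\bne555[a-z]*\b", pages)
```

A part number either appears on a line or it doesn't, which is exactly the behavior that is hard to get out of the mainstream engines.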

When you find domains that contain useful information, crawl and index them manually.
Indeed, the best solution.
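
For what it's worth, a rough sketch of the indexing half in Python, assuming the pages have already been crawled and reduced to plain text (the `pages` dict, URLs, and function names are hypothetical). It builds a small inverted index and answers strict AND queries, so a page matches only if it contains every keyword.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    # Map each token to the set of URLs whose text contains it.
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index

def search_all(index, query):
    # Strict AND: a URL matches only if it contains every query token.
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results

# Hypothetical, hand-written page texts standing in for crawled content.
pages = {
    "https://example.org/a": "Service manual for the LM317 voltage regulator",
    "https://example.org/b": "User manual and quick start guide",
    "https://example.org/c": "LM317 datasheet and service notes",
}
index = build_index(pages)
hits = search_all(index, "LM317 service")
```

Here `hits` contains only the pages that mention both "lm317" and "service"; the user's manual is never returned. A real setup would also need the crawler, persistence, and ranking, none of which are shown.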

Interestingly enough, I can find web crawling as a service and search engine as a service separately, but not both together?

You just described the Bing/Yahoo BOSS API
All right, I forgot about those.

However, they are quite pricey. Maybe a solution that one can host oneself would be a nicer alternative.

> something like "grep for the Web" would be ideal

A couple of these (e.g., Blekko) popped up 5-10 years ago. I don't think any made it far.

Some of them, like Blekko, got bought.
I find that more and more often a search like "keyword1 keyword2 keyword3" will give results that only match 2/3 keywords in the first N results. I feel that I'm frequently having to think "How can I phrase this search to get Google to do what I want?", which seems like a problem they solved (mostly) fairly early on.

It's especially annoying when you search "keyword1 keyword2" then "keyword1 keyword2 keyword3" and get the same results, just with a "Missing terms: keyword3" note below each (and more often than not, an alternative search will find what I'm looking for, so it's not just a case of there being nothing to match all three).

Edit: missed "note".

I also noticed recently that if you search Google for problems with Google Cloud stuff (App Engine in my case), the entire first page of results is the documentation for the product. What I wanted was Stack Overflow posts, or angry forum posts where other users had the same questions. Or somebody’s personal blog or GitHub gist where they talk about what to do. If I want all results from the documentation I can go to the documentation and search from there! If I used Google to search for information on C# programming they wouldn’t return a page of 100% MSDN results, so I don’t see why they do for App Engine.

Not strictly related to your comment, but similarly frustrated.

Interesting; I usually prefer to see documentation search results instead of "me too" Q&A posts where nobody's solved the problem in 8 years. Maybe a good mix of first-party and third-party sources would be ideal for the first results page; definitely not an entire page of just original documentation.
This one particular thing is easy enough to work around, if the results you don't want are all from the same domain/path:

    -site:cloud.google.com/storage/docs/
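
If you end up typing these a lot, a small helper can assemble the query string. This is only a sketch: `build_query` and the example search terms are made up, and it assumes that quoting a term asks the engine to match it literally, which the engine may or may not honor.

```python
from urllib.parse import urlencode

def build_query(terms, exclude_sites=()):
    # Quote each term so the engine is asked to match it literally.
    parts = ['"{}"'.format(t) for t in terms]
    # Prefix each unwanted domain/path with -site: to exclude it.
    parts += ["-site:{}".format(s) for s in exclude_sites]
    return " ".join(parts)

# Hypothetical search terms; the excluded path is the one from the comment above.
query = build_query(
    ["app engine", "deadline exceeded"],
    exclude_sites=["cloud.google.com/storage/docs/"],
)
url = "https://www.google.com/search?" + urlencode({"q": query})
```
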
Try using G search "Tools/All results/Verbatim" option for your 'marco polo doctors' query. Maybe throw in a "-who" as well. These tricks help a little. Of course, one may also ask whether the page that you envision exists on the searchable web. It should, but maybe not.
Kind of off topic, but did you ever find out Kublai Khan's religious medicinal interests? I'd be curious to know.
Not yet! I'm still on the hunt, but now I am searching for books. I suspect it will require checking out a few tomes from the library to find what I'm looking for.
It might be easier to email a professor who's an expert on Kublai Khan to point you in the right direction. If you're lucky, they might know exactly the information you want.