| HN Mirror

by userbinator 3007 days ago

Don't forget(!) Google's horrible forgetfulness, and its way of banning you if you try hard enough to extract anything useful from it:

https://news.ycombinator.com/item?id=16153840

For me, the niche is repair information and in particular, identifying IC part numbers and finding datasheets. Searching "service manual" now invariably brings up useless user's manuals, and searching too many times for IC part numbers gets you CAPTCHA-banned.

(Somewhat understandably, part numbers tend to look like semirandom bot-queries, but it's still a horrible experience to be called a bot just because you're actually after more information than the average user.)

Keyword-based would be a great step forward(!), but something like "grep for the Web" would be ideal. I remember many decades ago learning how to use boolean operators and such, since nearly all search engines of the time provided such functionality. Now the mainstream ones which have a big enough index to be effective also have removed much of that functionality and try very hard to limit you from using it. For another example, try using "site:" searches multiple times with Google --- another way to get rapidly banned.
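
To make the "grep for the Web" idea concrete, here is a minimal sketch that runs a regular expression over page text you have already saved locally. The `grep_pages` helper, the `PAGES` dictionary, and the example.com URLs are all made up for illustration; no real search engine API is involved.

```python
import re

def grep_pages(pages, pattern, ignore_case=True):
    """Return (url, line_number, line) for every line matching the regex."""
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(pattern, flags)
    hits = []
    for url, text in pages.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((url, lineno, line.strip()))
    return hits

# Hypothetical, locally saved page text keyed by URL (assumption).
PAGES = {
    "https://example.com/a": "Service manual for model X\nUses an LM317 regulator",
    "https://example.com/b": "User's manual\nNo schematics here",
}

# Part-number style query: "LM3" followed by exactly two digits.
hits = grep_pages(PAGES, r"\bLM3\d\d\b")
for url, lineno, line in hits:
    print(f"{url}:{lineno}: {line}")
```

The point of the sketch is that an exact pattern is matched literally, with no query rewriting, which is what part-number searches need.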

2 comments

sitkack 3007 days ago

When you find domains that contain useful information, crawl and index them manually.
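
As a rough sketch of what "index them manually" could look like once the pages are fetched: a tiny in-memory inverted index with boolean AND / NOT queries. The crawling step is left out, and `DOCS`, `build_index`, and `search` are hypothetical names chosen for this example, not part of any existing tool.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index

def search(index, all_ids, must=(), must_not=()):
    """Boolean query: every term in `must` (AND), no term in `must_not` (NOT)."""
    result = set(all_ids)
    for term in must:
        result &= index.get(term.lower(), set())
    for term in must_not:
        result -= index.get(term.lower(), set())
    return sorted(result)

# Hypothetical crawled pages, keyed by a short id (assumption).
DOCS = {
    "a": "Service manual with schematics for the TDA7294 amplifier",
    "b": "User's manual: how to plug in the amplifier",
    "c": "TDA7294 datasheet and pinout",
}

index = build_index(DOCS)
print(search(index, DOCS, must=["tda7294"], must_not=["datasheet"]))  # ['a']
print(search(index, DOCS, must=["manual", "amplifier"]))              # ['a', 'b']
```

For anything beyond a handful of domains, an off-the-shelf full-text indexer would likely be a better fit than a hand-rolled dictionary like this one.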

ccozan 3007 days ago

Indeed, the best solution.

Interestingly enough, I can find web crawling as a service and search engine as a service separately, but not both together?

AznHisoka 3007 days ago

You just described the Bing/Yahoo BOSS API

ccozan 3007 days ago

All right, I forgot those.

However, they are quite pricey. Maybe a solution that one can self-host would be a nicer alternative.

untangle 3006 days ago

> something like "grep for the Web" would be ideal

A couple of these (e.g., Blekko) popped up 5-10 years ago. I don't think any made it far.

ddorian43 3005 days ago

Some of them got bought, like Blekko.
