by sheraz 3903 days ago
I don't know that I would go head-to-head with Google in crawling the entire web. However, I do see a lot of opportunities for "vertical search": search engines focused on specific niche verticals (travel, healthcare, etc.).

I'm working on a couple of projects in vertical search, and it is quite exciting. Sure, I'm building tech that Google had in 2005, but we are surprised by the results. We achieve search relevance simply by curating the sites we crawl (still in the thousands in some cases).
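
A rough sketch of what "curating the sites we crawl" could look like in code, assuming the curated list is just a set of domains and the crawler checks each URL against it before fetching. The domains and the `is_curated` name are placeholders for illustration, not anything from an actual project.

```python
from urllib.parse import urlsplit

# Hypothetical curated allowlist; placeholder domains, not a real list.
ALLOWED_DOMAINS = {"example.com", "docs.example.org"}


def is_curated(url, allowed=ALLOWED_DOMAINS):
    """Return True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    # Match the domain itself or any subdomain; the leading dot prevents
    # "notexample.com" from matching "example.com".
    return any(host == d or host.endswith("." + d) for d in allowed)
```

A crawler would call `is_curated(link)` on each discovered link and drop anything that returns False, so relevance comes from the list itself rather than from ranking.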


Do you have any links to share? I'm working on a side project for vertical search for programmers. I'm curating sites to crawl that have source code, docs, mailing lists, Q&A, IRC and tutorials.

Trying to get away from the "W3Schools effect" [0], where outdated, terribly presented information or downright spammy pages are locked in the top results of Google by virtue of being around for so long, or by gaming search keywords [1].

[0] https://github.com/nathancahill/fuck-w3schools

[1] http://www.bigresource.com/

I don't have anything public, but I have been exploring strategies for gluing together different tech in order to accomplish our goals. Latest stack has been:

- wget / wpull / Heritrix to produce .warc files across a single domain
- a filewatcher on a folder to process each .warc into text and then push it into Elasticsearch with relevant metadata
- a Flask search frontend for querying / results
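
For anyone who wants to try the middle step, here is a minimal stdlib-only sketch of it, under a few assumptions: the folder is polled rather than watched with OS events, the WARC records have already been parsed into (url, html) pairs by something else (a library such as warcio would be the usual choice), and the index name `pages` plus the `url` / `text` fields are made up for illustration. It builds the newline-delimited JSON body that Elasticsearch's `_bulk` endpoint accepts; actually sending it is left out.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Reduce an HTML page to a single space-joined string of its text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def unseen_warcs(folder, seen):
    """Return WARC files in `folder` not yet in `seen`, and mark them as seen."""
    fresh = sorted(p for p in Path(folder).glob("*.warc*") if p not in seen)
    seen.update(fresh)
    return fresh


def to_bulk_ndjson(docs, index="pages"):
    """Build a bulk-index body: one action line, then one document line, per doc."""
    lines = []
    for doc in docs:
        # Using the URL as the document id (an assumption) makes re-crawls overwrite.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["url"]}}))
        lines.append(json.dumps(doc))
    return "".join(line + "\n" for line in lines)
```

The resulting string would be POSTed to the cluster's `_bulk` endpoint with `Content-Type: application/x-ndjson`. Calling `unseen_warcs(folder, seen)` in a loop with a sleep is the crude version of the filewatcher.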

Happy to share my learnings elsewhere. (I pinged you on email)