by netsectoday 2225 days ago
You can boot up your own custom search engine in a few minutes with YaCy (pronounced "ya see"), an open-source, P2P, Dockerized crawler and search engine built on top of Solr.

https://yacy.net/

If you're feeling generous, you can make your index available to other P2P instances.

I wanted to run an API search the other week and was blown away by how quickly I could stand up my own custom search portal. I didn't want to pay for API access to other search engines, and YaCy comes with JSON and Solr endpoints.
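
For anyone wondering what hitting the JSON endpoint might look like, here's a minimal sketch. The `/yacysearch.json` path, the `query`, `maximumRecords` and `resource` parameters, the default port 8090, and the `channels` -> `items` response shape are all assumptions about a default install, so check them against your own instance before relying on this. The helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(base_url, query, max_records=10, local_only=True):
    """Build a search URL for a YaCy instance (path and parameter names are assumed)."""
    params = {
        "query": query,
        "maximumRecords": max_records,
        # "local" keeps the search on your own index; "global" asks peers too (assumed values)
        "resource": "local" if local_only else "global",
    }
    return f"{base_url.rstrip('/')}/yacysearch.json?{urlencode(params)}"


def extract_results(payload):
    """Pull (title, link) pairs out of the assumed channels -> items response shape."""
    results = []
    for channel in payload.get("channels", []):
        for item in channel.get("items", []):
            results.append((item.get("title", ""), item.get("link", "")))
    return results


def search(base_url, query, **kwargs):
    """Run a query against a live instance and return (title, link) pairs."""
    with urlopen(build_search_url(base_url, query, **kwargs), timeout=10) as resp:
        return extract_results(json.load(resp))
```

With an instance running locally, something like `search("http://localhost:8090", "solr tuning", max_records=5)` should return a list of title/link pairs, assuming the response really has the shape described above.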

I ran it locally to test my crawl filters, then pushed a private instance out to DigitalOcean to turn up the heat on the crawling. The only issue I had was that the crawler would hit the container's max memory threshold on long crawls and the container would restart, but scaling up the box fixed that.
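
If you want to sanity-check filter patterns before starting a crawl, a rough sketch like this can help. It assumes the crawler takes a "must match" and a "must not match" regex and applies each to the whole URL; the patterns and the `docs.example.com` host are hypothetical, and YaCy runs on Java, whose regex dialect is close to Python's but not identical, so treat the result as a first pass only.

```python
import re

# Hypothetical filters: stay on one documentation host, skip archives and session URLs.
MUST_MATCH = re.compile(r"https?://docs\.example\.com/.*")
MUST_NOT_MATCH = re.compile(r".*\.(zip|tar\.gz|iso)$|.*[?&]sessionid=.*")


def would_crawl(url):
    """Return True if the URL passes the include filter and is not excluded."""
    # fullmatch mimics a whole-string match, which is assumed to be how the crawler applies filters
    return bool(MUST_MATCH.fullmatch(url)) and not MUST_NOT_MATCH.fullmatch(url)


if __name__ == "__main__":
    for candidate in [
        "https://docs.example.com/guide/intro.html",
        "https://docs.example.com/downloads/tool.zip",
        "https://blog.example.com/post",
    ]:
        print(candidate, would_crawl(candidate))
```

Running a list of known-good and known-bad URLs through `would_crawl` is a cheap way to catch an over-broad pattern before it costs you crawl time and memory.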

1 comment

I have my own YaCy search engines running internally (non-peered) for similar reasons. One crawls some key code documentation sites that I need for work, and another crawls a whole bunch of music blogs.

While I typically still use RSS for reading music blogs, I find that having the search engine is a great way to go back and find something or discover something new! Every time I find a new blog, I just add it to YaCy to crawl and index.

I think it'd be great to see people spinning up larger instances that are highly specialized. For example, maybe a search engine that is dedicated solely to sci-fi and only crawls high-quality boards, personal sites, and blogs, and skips all the spammy, SEO-optimized sites.