by Xamayon
1395 days ago
Getting and storing the content is one of the main challenges, and it's getting harder by the day as more and more sites use anti-bot protection from companies like Cloudflare.
With the SauceNAO.com image search engine, I tailored things to my own needs, taking a slow and steady, semi-curated approach. To keep things sane and costs low, I went after specific sites (and other resources) with a high signal-to-noise ratio and highly desirable content. I add a couple at a time, finding and fixing bottlenecks as they come up. Nothing is perfect from the start, so I mainly focus on keeping the environment simple and getting a minimum viable setup working as quickly as possible. This has certainly caused some problems, and has left the site looking and feeling less than awesome in many ways, but at least it (mostly) works...

Over time I have had to rewrite everything (the crawling software, search algorithms, back-end database, and front-end) as it became apparent that things could be done more efficiently to handle the ever-increasing usage and scale. Keeping the content stored so that indexes can be regenerated quickly has been very important in the long term! It has taken many years (I started in 2008), but within its art/entertainment niche, usage has really started to take off.

My advice would be to start semi-small, throw things at the wall, and see if anything works. Try to keep the initial setup as simple and affordable as possible unless you have serious funding available. Building even a small search engine can take a lot of resources and time, but it can also be an amazingly fun hobby.