| Thanks! I really appreciate the detailed feedback and suggestions. The idea of supplementing the index with older Common Crawl/Blekko-era data is definitely interesting, especially for preserving pages that are gone now. The metasearch + independent reranking concept is appealing too, but one of the main goals with Slick is staying completely independent long term. I know that means much slower growth and a lot more work, but I think it's better than building on top of another search engine for 5 years and then having that engine suddenly change direction. I only recently learned that Google is planning to heavily rework Search around AI as well, which honestly reinforced my decision to keep Slick independent instead of relying heavily on another engine (https://san.com/cc/googles-shift-to-ai-powered-search-result...). Right now I'm mainly focused on improving Slick's own crawl/index quality rather than leaning on external sources. I've taken a look at 4get.ca, which is apparently Canadian (I am too), and it's really good. That said, I don't plan to lean heavily into metasearch unless maintaining a fully independent index becomes unrealistic. I've written over 15 thousand lines of code for this engine so far, over more than a year of work. I'd never noticed the "search engines disappearing" phenomenon, probably because, well, they disappear. I should read up on that. Most likely it's because the people behind them can no longer afford to keep them running, whether mentally or financially. I've experienced this too. I've been actively promoting the engine to find new supporters, so far to no avail. I don't think I'll lose the motivation to work on the project any time soon, but if I ever do, I'll be sure to tell you. You are my first supporter, after all. I'm not looking to hire anyone right now, but I appreciate the offer. I've been able to do this much on my own, and it's just uphill from here.
Next up: fixing the ranking bugs I mentioned on my blog, getting more supporters so I have an incentive to invest in infrastructure, improving my crawler, etc. I really appreciate the support. |
For promotion, I’d recommend picking the most technically interesting part of your implementation, something that’s really clever, and then writing a one-page piece about it for Paged Out magazine (https://pagedout.institute/). They regularly publish interesting stuff, and they have a pretty decent number of readers. You could also write something longer and submit it to 2600 magazine; they’d probably be interested even in a general overview of the project.
Though maybe the engine should get bigger first, so people are more enthused when they try it. I think 1 billion pages is around where a search engine starts to feel normal: that’s about how many Marginalia has. How much disk space does your index take up right now? Would you say the bottleneck is more the disk space or the crawling speed?