by mdaniel
1401 days ago
That sounds very cool, and I hope you (and your customers!) are successful. Out of curiosity, did you find an existing market need for that, or is it a "build it and they will come" model? Also, have you thought about partnering with commoncrawl.org? I could see that relationship benefiting both sides: they get fresher indices, you get access to the historical web snapshots.
|
The problem is that building a new search engine from the ground up takes millions in infrastructure and a lot of time just to test one idea. There are multiple attack vectors on Google's business model (privacy, subscription model, modality, etc.), but you might only get the chance to test one of them. If that fails, starting again is super expensive, so you might not be able to get the funds to do it.
My approach then became to build something that others can build on top of.
I'm currently using Common Crawl, but my main problem is that I need to build a small toy to test it, and even processing Common Crawl is crazy expensive. Just a single snapshot is 150 TB, so it needs to be processed on bare metal, or you're gonna pay a hefty AWS bill.