by benoau
658 days ago
As a starting point I'd go all-in on serverless: even though it doesn't necessarily give you the fastest response times or the lowest cost, it does let you completely ignore several sets of scaling challenges until the price matters. So I'd go serverless for spidering, indexing and user-facing APIs. For UIs I'd go static HTML on a CDN. For the database, I'd first lean on plain-file, S3-like storage as much as possible for spidering and indexing data, and try to keep the user-facing database under a couple hundred terabytes. I'd probably favor some bare metal at that point, but that size will fit on some "db as a service" providers.
|
Spidering and indexing are processes that would most likely run continuously for any search engine like Google or Kagi. There is always data to update and new web pages being created.
So they would benefit from dedicated servers from the get-go, for cost reasons. Ahrefs posted an interesting article on this topic: https://tech.ahrefs.com/how-ahrefs-saved-us-400m-in-3-years-...
Then again, you could run a serverless platform on your own dedicated servers to get the advantages of both.
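To make the cost argument concrete, here's a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function names are made up, the pricing model (serverless billed per busy hour, dedicated billed as a flat monthly fee) is simplified, and the numbers in the usage lines are placeholders, not real quotes from any provider.

```python
HOURS_IN_MONTH = 730  # approximate average number of hours in a month


def serverless_monthly_cost(utilization, price_per_busy_hour):
    """Cost of a pay-per-use worker that is busy `utilization` (0..1) of the month."""
    return utilization * HOURS_IN_MONTH * price_per_busy_hour


def break_even_utilization(dedicated_monthly, price_per_busy_hour):
    """Fraction of the month a worker must be busy before a flat-fee server wins."""
    return dedicated_monthly / (price_per_busy_hour * HOURS_IN_MONTH)


def cheaper_option(utilization, dedicated_monthly, price_per_busy_hour):
    """Return 'dedicated' or 'serverless' for one worker at the given utilization."""
    serverless = serverless_monthly_cost(utilization, price_per_busy_hour)
    return "dedicated" if dedicated_monthly < serverless else "serverless"


if __name__ == "__main__":
    # Placeholder prices only: 100 per month flat vs 0.50 per busy hour.
    dedicated, per_hour = 100.0, 0.50
    print("break-even utilization:", break_even_utilization(dedicated, per_hour))
    print("continuous crawler (100% busy):", cheaper_option(1.0, dedicated, per_hour))
    print("bursty user-facing API (10% busy):", cheaper_option(0.1, dedicated, per_hour))
```

The point is just that a job running around the clock sits at utilization 1.0, so it's past any break-even point below 100%, while bursty workloads can stay well under it.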