

by fulmicoton 1871 days ago

A normal search (displaying a page of 20 hits) requires num_segments * (1 + 2 * num_terms) + 20 GET requests.

We have 180 segments for our commoncrawl index, so 1000 requests per search is a generous upper bound.
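To make the request count concrete, here is a minimal Python sketch of the formula. The function and parameter names are mine, not from our codebase, and I am assuming the trailing 20 is one GET per displayed hit.

```python
def get_requests_per_search(num_segments: int, num_terms: int, num_hits: int = 20) -> int:
    """Number of GET requests for one search, per the formula above.

    Each segment costs 1 request plus 2 requests per query term;
    the final num_hits requests are assumed to be one per displayed hit.
    """
    return num_segments * (1 + 2 * num_terms) + num_hits


# 180 segments, as in the commoncrawl index.
one_term = get_requests_per_search(num_segments=180, num_terms=1)
two_terms = get_requests_per_search(num_segments=180, num_terms=2)
print(one_term, two_terms)
```

With 180 segments this gives 560 requests for a one-term query and 920 for a two-term query.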

GET requests add $0.0004 to the cost of each commoncrawl search. Storage costs us $5 per day, so the cost of GET requests starts to exceed the storage cost at more than 10k searches per day.
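The break-even point can be checked with a few lines of Python. This is only a sketch: the price of $0.0004 per 1000 GET requests is an assumption (it matches the per-search figure above with 1000 requests per search), and the names are made up for illustration.

```python
# Assumed price: USD per 1000 GET requests (consistent with the
# $0.0004 per search quoted above at 1000 requests per search).
GET_PRICE_PER_1000 = 0.0004
REQUESTS_PER_SEARCH = 1000      # generous upper bound from above
STORAGE_COST_PER_DAY = 5.0      # USD per day, from above


def get_cost_per_search(requests_per_search: int = REQUESTS_PER_SEARCH,
                        price_per_1000: float = GET_PRICE_PER_1000) -> float:
    """GET request cost (USD) of a single search."""
    return requests_per_search / 1000 * price_per_1000


def break_even_searches_per_day(storage_cost_per_day: float = STORAGE_COST_PER_DAY) -> float:
    """Searches per day at which GET costs equal the daily storage cost."""
    return storage_cost_per_day / get_cost_per_search()


print(f"cost per search: ${get_cost_per_search():.4f}")
print(f"break-even: {break_even_searches_per_day():.0f} searches/day")
```

Under these assumptions the break-even lands at about 12,500 searches per day, which is where the ">10k" figure comes from.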

Our search engine is meant for searching large datasets with a low number of queries: logs, SIEM, e-discovery, exotic big data datasets, etc. These use cases typically have a low daily query rate.

For a high request rate (1 query per second), as in e-commerce, entirely decoupling storage and compute is actually a bad idea. For a low request rate (< 1000 per day), using S3 without caring about the GET request cost is perfectly fine. In the middle, you probably want to use another object storage with a more favorable pricing model.