Hacker News new | ask | show | jobs
by thayne 540 days ago
That's a lot of requests.
1 comments

It is, but it's not _that_ many. AWS pricing is complicated, but for fairly standard services and assuming bulk discounts at ~100TB level, your break-even points for requests/network vs storage happens at:

1. (modifications) 4200 requests per GB stored per month

2. (bandwidth) Updating each byte more than once every 70 days

You'll hit the break-even sooner, typically, since you incur both bandwidth and request charges.

That might sound like a lot, but updating some byte in each 250KB chunk of your data once a month isn't that hard to imagine. Say each user has 1KB of data, 1% are active each month, and you record login data. You'll have 2.5x the break-even request count and pay 2.5x more for requests than storage, and that's only considering the mutations, not the accesses.

You can reduce request costs (not bandwidth though) if you can batch them, but that's not even slightly tenable till a certain scale because of latency, and even when it is you might find that user satisfaction and retention are more expensive than the extra requests you're trying to avoid. Batching is a tool to reduce costs for offline workloads.

Ok, there are definitely cases where it would be more expensive, like using it for user login data.

But for metrics, like you would use for prometheus:

- Data is usually write-only. There isn't usually any reason to modify the metrics after you have recorded them.

- The bulk of your data isn't going to be used very often. It will probably be processed by monitors/alerts and maybe the most recent data will be shown in dashboard (and that data could be cached on disk or in memory). But most if it is just going to sit there until you need to look at it for an ad-hoc query, and you should probably have an index to reduce how much data you need to read for those.

- This metrics data is very amenable to batching. You do probably want to make recent data available from memory or disk for alerts, dashboards, queries, etc. But for longer term storage it is very reasonable to use chunks of at least several megabytes. If your metrics volume is low enough that you have to use tiny objects, then you probably aren't storing enough to be worried about the cost anyway.