by chatmasta 457 days ago
This is a nice analysis of incoming and outgoing bandwidth, but it totally neglects to analyze the compute power required by each query. You can’t answer that without knowing the possible queries that users can send. Do they have a dashboard with a known set of queries? Do they have a GraphQL API that can produce pathological queries? Can they send SQL directly? Are they running aggregations? Do they use filters? Are they querying recent data or all historical data?

And that’s just the read side. You also need to ask about ingestion and transformation. Is the database storing raw events and nothing else? (Is the user happy with that?) Is it append-only? Is it being rolled up and summarized into daily partitions? How many transactions per second? How many of those are INSERT vs. UPDATE or DELETE? Which rows are being updated? Only recent ones, or any of them? All of them?

etc…

There is no generic answer to this question, and “requests per second” is a reductive and insufficient interpretation of the problem that won’t identify any of the hidden complexity.


The OP doesn’t give any hints as to what users are doing, so, as I said, I just assumed “last 90 days of data” was the only query they’d make. But it’s a fair point that this assumption led me to leave out CPU, since CPU cost is basically negligible when you’re just shoveling data off a disk.

Thanks for expanding on my comments about how important actually having detailed requirements is for doing performance calculations!

Yes indeed, OP only asked “how much traffic is required,” which could be zero if his database is locked while trying to respond to some absurd query from one of his users :)