Hacker News
by mluggy 2582 days ago
Is there a reason why you can't have a deployment (a set of pods) per client? The article keeps mentioning that every solution failed once the whole dataset hit a certain limit.
2 comments

Obviously you can parallelize this problem perfectly per customer, which would remove the congestion, unless you are data mining across customers.

A TSDB is, in the end, a DB with a timestamp in each row and some convenience functions out of the box.
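
To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `points` table, its columns, and the `bucketed_avg` helper are hypothetical names picked for illustration; fixed-width time bucketing stands in for the kind of convenience function a real TSDB ships with.

```python
import sqlite3


def bucketed_avg(rows, bucket_seconds):
    """Average `value` per fixed-width time bucket.

    `rows` is an iterable of (ts, value) pairs, with ts as integer epoch seconds.
    Table and column names are illustrative assumptions, not any product's schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (ts INTEGER NOT NULL, value REAL NOT NULL)")
    # The timestamp index is most of what makes this "a time series table".
    conn.execute("CREATE INDEX idx_points_ts ON points (ts)")
    conn.executemany("INSERT INTO points (ts, value) VALUES (?, ?)", rows)
    # Integer division floors each timestamp to the start of its bucket.
    query = """
        SELECT (ts / ?) * ? AS bucket_start, AVG(value)
        FROM points
        GROUP BY bucket_start
        ORDER BY bucket_start
    """
    result = conn.execute(query, (bucket_seconds, bucket_seconds)).fetchall()
    conn.close()
    return result
```

A dedicated TSDB adds things like compression, retention policies, and downsampling on top, but the core shape is the same: rows keyed by time plus helpers for slicing them.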

Naturally it depends on the business use case or product situation, but a lot of XYZ-per-client architectures fail because some things don't "scale down" enough while others don't "scale up" enough.

Warning: Broad generalizations ahead.

Most successful shard strategies work because each division is, hopefully, roughly uniform. It's kinda like with binary trees: they work best when balanced. Clients more often follow a long-tail, skewed distribution. You often have tons and tons of small clients where the per-client overhead could be painful, while at the same time your biggest clients might outgrow what you can support in a shard.
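
A rough sketch of the skew problem, with made-up numbers: clients are assigned to shards by `client_id % num_shards`, and the "skewed" sizes follow an assumed 1/rank curve. Neither the sizes nor the assignment rule come from the article; they only illustrate how one big client can dominate a shard.

```python
def shard_loads(client_sizes, num_shards):
    """Total data per shard when client i is placed on shard i % num_shards."""
    loads = [0] * num_shards
    for client_id, size in enumerate(client_sizes):
        loads[client_id % num_shards] += size
    return loads


def imbalance(loads):
    """Ratio of the busiest shard to the average shard (1.0 means perfectly even)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical workloads: 1000 equal clients vs. 1000 clients on a 1/rank curve.
uniform_clients = [1000] * 1000
skewed_clients = [1_000_000 // rank for rank in range(1, 1001)]

uniform_ratio = imbalance(shard_loads(uniform_clients, 10))
skewed_ratio = imbalance(shard_loads(skewed_clients, 10))
```

With the skewed sizes, the single largest client is already bigger than the average shard, so no placement of whole clients can balance the shards without splitting that client.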

To strawman your pods-per-client idea: dealing with 1k vs. 1M individual deployments is very different from dealing with a clientId column whose distinct values go from 1k to 1M. Good indexing might be cheaper. But if you have different regulatory domains (HIPAA, GDPR, China, etc.), it can be easier to just run entirely separate data centers.
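
For the clientId-column side of that comparison, a minimal sketch with sqlite3. The `events` table, the `idx_events_client_ts` index, and the helper names are all hypothetical; the point is only that one shared table with a composite index on (client_id, ts) serves per-client queries without a per-client deployment.

```python
import sqlite3


def build_shared_store(events):
    """One shared table for all clients; `events` holds (client_id, ts, payload) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (client_id INTEGER NOT NULL, ts INTEGER NOT NULL, payload TEXT)"
    )
    # Composite index: per-client time-range scans touch only that client's rows.
    conn.execute("CREATE INDEX idx_events_client_ts ON events (client_id, ts)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", events)
    return conn


PER_CLIENT_QUERY = (
    "SELECT ts, payload FROM events WHERE client_id = ? AND ts >= ? ORDER BY ts"
)


def events_for_client(conn, client_id, since_ts):
    """Fetch one client's events from `since_ts` onward, oldest first."""
    return conn.execute(PER_CLIENT_QUERY, (client_id, since_ts)).fetchall()


def query_plan(conn, client_id, since_ts):
    """Return SQLite's plan description, to check that the index is used."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + PER_CLIENT_QUERY, (client_id, since_ts)
    ).fetchall()
    return " ".join(row[3] for row in rows)
```

Going from 1k to 1M clients here only grows the index; it does not add anything to deploy, monitor, or upgrade per client. It also does nothing for data residency, which is where separate deployments earn their keep.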

These balancing acts are what make data infra problems fun to work on.