HN Mirror


by meteorfox 3743 days ago

It's really impressive that it can handle this many container placements.

But, honest question: what's the value of determining how fast we can schedule a million containers? This question is not just for Nomad but also for the other cluster managers that have recently published similar benchmarks.

I see the value of scheduling thousands, perhaps even hundreds of thousands, of containers across many nodes, but millions seems excessive.

I think it is more valuable to measure what happens after you have 1 million containers running on your cluster. Such as:
- What is the overhead of keeping track of that many containers?
- How do they impact the responsiveness of other API calls (list, delete)?
- What happens when nodes go down and you suddenly lose a considerable number of containers? Can it recover quickly?
- How does it impact the performance of the containers running in the cluster?

Also, there are other important factors to test for:
- Image size: how does it impact scheduling time when the image is not cached?
- Container density per node
- Number of nodes
- Scheduling the other workloads that Nomad supports, like VMs and runtimes

1 comment

illumin8 3743 days ago

With any system of sufficient scale, you're bound to hit artificial (software design inflicted) limits to maximum scale or performance.

The reason a good software company tests extreme limits (1 million containers) that most customers will never see is to assure customers that they will not run into a scale limitation.

From my experience running large private cloud infrastructure (>14,000 virtual servers at once), you will always hit some crazy limit that the vendor never anticipated. "14,000 VMs? We've only tested with 10,000" (not a real example, but an idea of the type of problem you'll run into).

Proving 1 million containers in 5 minutes is just designed to assure regular customers that they're fine. I doubt anyone really needs that many containers for any current workload...
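
For a sense of scale, the headline figure works out to a few thousand placements per second. A back-of-the-envelope sketch (the function name is made up for illustration, and it assumes the rate is uniform over the whole 5 minutes, which the actual benchmark may not be):

```python
def placements_per_second(containers: int, minutes: float) -> float:
    """Average scheduling rate implied by a headline benchmark figure.

    Assumes placements are spread evenly over the run; a real scheduler
    may well be burstier than this.
    """
    return containers / (minutes * 60)


# 1 million containers in 5 minutes, the figure quoted in this thread.
rate = placements_per_second(1_000_000, 5)
print(f"{rate:,.0f} placements/second on average")
# prints "3,333 placements/second on average"
```

A cluster that only ever needs a few hundred placements per second is sitting well below that average, which is the kind of headroom the benchmark is meant to demonstrate.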