by meteorfox
3743 days ago
It's really impressive that it can handle this many container placements. But, honest question: what's the value of determining how fast we can schedule a million containers? This question is not just for Nomad but for other cluster managers as well that have recently published similar benchmarks. I see the value of scheduling thousands to perhaps hundreds of thousands of containers across many nodes, but millions seems excessive. I think it is more valuable to measure what happens after you have 1 million containers running on your cluster. Such as:
- What is the overhead keeping track of that many containers?
- How do they impact the responsiveness of other API calls (list, delete)?
- What happens when nodes go down and suddenly you lose a considerable amount of containers, can it recover quickly?
- How does it impact the performance of running containers in the cluster?
Also, there are other important factors to test for:
- what about image size? How does it impact scheduling time when the image is not cached?
- container density per node
- number of nodes
- what about scheduling other workloads that Nomad supports, like VMs and runtimes?
|
The reason a good software company tests extreme limits (1 million containers) that most customers will never see is to assure customers that they will not hit a scale limitation.
In my experience running large private cloud infrastructure (>14,000 virtual servers at once), you will always hit some crazy limit that the vendor never anticipated. "14,000 VMs? We've only tested with 10,000" (not a real example, but it gives an idea of the type of problem you'll run into).
Proving 1 million containers in 5 minutes is just designed to assure regular customers that they're fine. I doubt anyone really needs that many containers for any current workload...