Hacker News
Surprising Scalability of Multitenancy (brooker.co.za)
75 points by federicoponzi 1182 days ago
5 comments

I'd seen a more useful paper on this subject, on how to organize your game servers for a big MMO. The most economical strategy was to own your servers for the base load, and go out to AWS for peaks. Running 24/7 compute-bound work on AWS is at least 2x as expensive as owning your own co-located servers.
You can buy reserved instances that are about half the price of on-demand, so it really depends on how long the peaks are.
Dedicated servers are 1/4 or less of the price of on-demand (don't forget the bandwidth!!!).
And all you need to do is pay a sysadmin or two in pizza to operate them :-)
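
To make the base-load-versus-peak trade-off concrete, here is a minimal sketch. It is not taken from the paper mentioned above. The demand profile, the price units, and the function names `total_cost` and `best_owned_count` are all made up for illustration; only the rough 4x on-demand-to-dedicated price ratio comes from this thread.

```python
def total_cost(demand, owned, owned_hourly, on_demand_hourly):
    """Cost of serving an hourly demand profile with `owned` servers plus cloud overflow."""
    # Owned servers are paid for every hour, whether they are busy or not.
    base = owned * owned_hourly * len(demand)
    # Demand above the owned capacity is rented by the hour.
    burst = sum(max(0, d - owned) for d in demand) * on_demand_hourly
    return base + burst


def best_owned_count(demand, owned_hourly, on_demand_hourly):
    """Return the owned-server count that minimizes total cost."""
    return min(
        range(max(demand) + 1),
        key=lambda n: total_cost(demand, n, owned_hourly, on_demand_hourly),
    )


# Hypothetical day: 10 servers needed off-peak, 40 during a 4-hour peak.
demand = [10] * 20 + [40] * 4
# Hypothetical prices: owned = 1 unit/hour, on-demand = 4 units/hour.
owned = best_owned_count(demand, owned_hourly=1.0, on_demand_hourly=4.0)
cost = total_cost(demand, owned, 1.0, 4.0)
print(owned, cost)  # 10 720.0
```

With these made-up numbers, owning exactly the base load (10 servers) wins: all-cloud costs 1440 and owning for the peak costs 960. At a 4x price ratio an extra owned server only pays off if it is needed more than 6 hours a day, so a 4-hour peak is cheaper to rent.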

Dedicated servers are undoubtedly cheaper in some circumstances than even a well-managed AWS account. But you do need to account for redundancy (including staffing), scaling up, possibly geographic replication, etc. Setting up a dedicated server is just the beginning, so be sure to take into account all costs, as is the case on a cloud provider too, of course.

You absolutely need experts to run your cloud as well, and there are colocation services that offer staff on hand for hardware issues. You don't actually have to run an entire datacenter even if you run bare metal.
I guess the latency between AWS and your data centre would have a negative impact on game performance.
I believe the idea is to spin up new servers on AWS and connect players directly to them instead of hopping via their own infra.

That way your profit margins on the AWS servers are lower than on the self-hosted ones, but at least you’re making money.

The latency is still a factor if the AWS players interact with the non-AWS ones.
I don't know how MMOs do this at all, but I would assume that for an MMO to be scalable, you do some sort of population-based geometric slicing of the world, and then assign each slice to a server such that players communicate with the server for their slice and the server for adjacent slices that are in some sort of visible/soon-to-be-visible range. That would mean no interaction between the servers - just between clients and servers. It also means that servers can be smoothly scaled out by cutting one server's area into two servers.

Edit - And if a group of players raid a dungeon, the population of that dungeon is strictly limited, so you can park that raid on one server and not worry at all about inter-player latency.
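
A rough sketch of the population-based slicing guessed at above, with the same caveat that this is not how any particular MMO does it. The `Region` type, the median-split rule, and the `capacity` parameter are all assumptions for illustration; real games would also need to handle players near slice borders.

```python
from dataclasses import dataclass


@dataclass
class Region:
    x0: float
    y0: float
    x1: float
    y1: float
    players: list  # (x, y) positions inside this region


def split(region):
    """Cut a region in two along its longer axis at the median player position."""
    axis = 0 if (region.x1 - region.x0) >= (region.y1 - region.y0) else 1
    pts = sorted(region.players, key=lambda p: p[axis])
    cut = pts[len(pts) // 2][axis]
    low = [p for p in pts if p[axis] < cut]
    high = [p for p in pts if p[axis] >= cut]
    if axis == 0:
        return (Region(region.x0, region.y0, cut, region.y1, low),
                Region(cut, region.y0, region.x1, region.y1, high))
    return (Region(region.x0, region.y0, region.x1, cut, low),
            Region(region.x0, cut, region.x1, region.y1, high))


def slice_world(region, capacity):
    """Recursively split until each slice holds at most `capacity` players."""
    if len(region.players) <= capacity:
        return [region]
    a, b = split(region)
    if not a.players or not b.players:
        # Players are stacked on one coordinate; give up rather than loop forever.
        return [region]
    return slice_world(a, capacity) + slice_world(b, capacity)


# Hypothetical world: 40 players on a grid inside a 10x10 map, 8 players per server.
players = [(i % 10 + 0.5, i // 10 + 0.5) for i in range(40)]
slices = slice_world(Region(0, 0, 10, 10, players), capacity=8)
print([len(s.players) for s in slices])
```

Each resulting slice would map to one server, and scaling out is just one more call to `split` on whichever slice gets crowded.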

The latency critical stuff traditionally happens in dungeons or other instances, precisely to get those players on a shared physical server. You just have a fleet of servers that each can handle X instances, and have a queue in between. And conveniently player state can just be synced in the loading time before and after the dungeon.

The bigger world is handled by slicing it up, but you still have a lot of communication going on with central databases for stuff like inventory management, chat, quests, etc. so you would probably try to keep all that within your own server racks.

Depends on the details, but pick an AWS location near your DC. And/or pick a DC location near AWS.
Mind linking the said paper?
The author sounds a bit scared. Maybe the recent wave of "we can save $$$ by leaving AWS" articles has them rattled?

Yes, multi-tenancy and improved hw utilization can save money ... for Amazon. That's of no use if they lack sufficient competition and just capture the savings as profits. Then you're just wasting time on debugging weird contention issues and cloud cost optimization consultants so Bezos can get richer.

The profit margins on AWS are so huge that even though they can bin-pack better, it often doesn't matter: you're still going to save money by going to either a cheaper cloud or using your own HW (or renting your own dedicated HW). The savings from multi-tenancy are drowned by the added costs.

One intriguing model that might be worth exploring is micro-clouds. In that model there's a kind of clearing market, and users with strong diurnal cycles and not many batch jobs can re-sell their CPU capacity at night to other users. They just implement some Lambda-ish API and configure the kernels/hypervisors to always prioritize their own jobs over guests. The guests don't care because they're getting the resources cheap, for the company the additional income offsets the cost of their own machines and the market takes a cut. The difference vs today's cloud models is it's more decentralized and the "cloud provider" is really just a match maker, so it's easy to set up competitors and margins would be low.

that'd be cool but quite improbable until exploits like RowHammer, Meltdown and Spectre can be reliably ruled out.
Even if those were sorted, you probably want to hold out for homomorphic encryption. The threat model of Amazon having all your data is much different from the threat model of anyone willing to bid cheaply enough on a lambda execution having it. OTOH in the latter case, we can probably expect three letter agencies all over the world to be generously subsidizing our compute (for example, by reselling GovCloud at a loss).
Those problems affect cloud providers too.

BTW modern CPUs support the creation of RAM-encrypted VMs with remote attestation, so you can lower the trust needed in the targets by a lot. That said there are lots of companies that are known quantities, have verifiable brands and may even be considered more trustworthy than the big clouds in some cases because they're local firms.

It’s ironic that AWS touts the benefit Lambda gets from overcommit, but if you build a lambda that simply turns around and makes an api call, you are paying full price for the cpu usage, even though it’s idle.
It doesn't matter if it's more efficient for Amazon (which serverless very much is) if they don't pass on the savings to you. Lambda is priced as a "value add" not as an efficiency improvement.
They should discount based on average cpu used.
You’re still consuming the RAM for the duration. In our on-prem VMware environment, we didn’t charge, but we thought of RAM as the limiting resource, far ahead of CPU, which was in turn ahead of disk.
The whole point of the bin packing in the blog post was to increase cpu utilization, so Amazon is clearly saving money if you are blocked on io.
Who is this surprising to? Timesharing, timeslicing, multiprocess, multitenancy (whatever you call the same underlying concept) was one of the pivotal advances in computer systems. Surely no serious person is surprised it is effective.
At my previous employer, there was at least one person with a "staff software engineer" job title who believed that running more than one Ruby server process per AWS virtual machine would lead to unacceptable contention at the hardware level. I was never able to convince them that Linux handles tens of thousands of processes just fine, or that even if you do one per VM there's nothing stopping AWS from scheduling those VMs onto the same machine.

I guess whether you consider someone like that to be a "serious person" or a "charlatan" depends on your own point of reference.

In that case, their arguments were more persuasive to management than mine were. I found the experience baffling.

The lesson for you was in communication not engineering.

No hate. Was (is) a frustrating experience for me too.

Having had some back-channel status updates after I got pushed aside, the lesson is that some people accept or discard ideas based on employment background.

On the team I was on, newly-hired managers from Amazon accepted the ideas of ICs who had previously worked at Amazon, and rejected ideas from people who had not. I didn't realize this was a pattern until I already had one foot out the door, but even if I had realized it earlier it wouldn't have helped. I was ex-Google, so the ex-Amazon folks tagged every document with the bozo bit before they'd even opened it.

My takeaway was to be cautious of companies where the culture is imported through mass hiring from single companies. I joined in the middle of the "Google wave", which was relatively peaceful (per Google's culture at the time). When the "Amazon wave" arrived it was quite a shock; their culture was much more adversarial and authoritarian than anywhere I'd worked before. By the time I left, there were signs the Amazon folks were starting to get sidelined by an emerging "Oracle wave".

Doesn’t it depend to some extent on how many CPU cores there are and how much the Ruby process is idle on I/O?

Amazon doesn’t over commit cpu for normal VM instance types.

I'm not referring to overcommit or actual "cpus pegged at 100%" contention, but to simple loose bin-packing.

Imagine you have three Ruby services, where each is allocated 10 cores of CPU time (via pinning with cpuset). If you give them each a 16-core VM, then there'll be 18 cores of "wasted" CPU. If you instead bin-pack them onto a 32-core VM, then they'll have the same number of cores at a lower price point.

If each service runs at 50% capacity with 2000ms latency during steady state, how much extra latency would you expect the service to have on the bin-packed configuration vs the single-VM?

My position is "very little extra latency", the other person's position was "a lot of extra latency due to hardware contention in (for example) the memory controller".

(If you're reading this and thinking "NUMA node locality", then you're operating two or three levels above where this org was in terms of optimization.)
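
For anyone checking the core counts in the three-service example, the arithmetic can be written out as a tiny sketch. The helper name `wasted_cores` is made up for illustration; the 10/16/32 figures are the ones from the comment.

```python
def wasted_cores(vm_sizes, service_cores):
    """Cores provisioned across all VMs minus cores actually pinned to services."""
    return sum(vm_sizes) - sum(service_cores)


services = [10, 10, 10]  # cores pinned per Ruby service

separate = wasted_cores([16, 16, 16], services)  # one 16-core VM per service
packed = wasted_cores([32], services)            # all three on one 32-core VM
print(separate, packed)  # 18 2
```

Going from 48 provisioned cores to 32 is where the lower price point comes from; whether latency suffers is the separate question being argued.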

> If you're reading this and thinking "NUMA node locality", then you're operating two or three levels above where this org was in terms of optimization.

Talking about Ruby services and not hand-optimized C kind of gives it away. And even with hand-optimized C you would do a cost/benefit analysis of less optimal packing.

Yeah but running them on the same machine is a pain monitoring wise, unless you're really trying to save dollars.
Time slicing significantly predates computers, and was quite well developed even when Erlang was analyzing it a century ago. What's surprising here isn't that time slicing works, it's that the same mechanism drives both the economics of large systems, and their ability to economically support bursty workloads.

I can understand that may not be a surprise to you. What's surprising to me is that you took the time to come say you aren't surprised, instead of going on with your day.

Clearly, I shouldn't have claimed this casual blog post was original research that had never been seen in any form before. Silly me!

> Time slicing significantly predates computers, and was quite well developed even when Erlang was analyzing it a century ago.

I mean its implementation in computer systems.

> What's surprising here isn't that time slicing works, it's that the same mechanism drives both the economics of large systems, and their ability to economically support bursty workloads.

That's not surprising.

> Clearly, I shouldn't have claimed this casual blog post was original research that had never been seen in any form before. Silly me!

That's not the issue though, is it? That's your snarky strawman to deflect from it. The issue is that it's a lazy cliche title and it purports to be much more grandiose than it is.

thank you, good samaritan, for doing the tough but necessary work of disparaging this blog post, and the person who posted it, because you find the conclusions obvious

i'm sure that marc brooker, the author, and one of the most accomplished computer scientists currently living, will think twice before posting such pablum again

Somebody was surprised by the scalability of multitenancy :)
I agree. It's not even about efficiently utilising resources. If we process things in timeslices, that immediately kills any bad effects variability has on the workload. Small jobs go fast even if a big job happened to arrive in front of them.

(Of course, it's still a tradeoff between context switching overhead and the magnitude of the variability, but fundamentally even a tiny level of timeslicing can be a huge improvement.)
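
A toy illustration of that point, as a minimal sketch: it assumes all jobs arrive at time 0, a quantum of 1 time unit, and zero context-switch cost (which is exactly the overhead the tradeoff above is about). The function names are made up for this example.

```python
from collections import deque


def fifo_completion(jobs):
    """Completion time of each job when run to completion in arrival order."""
    t, done = 0, []
    for size in jobs:
        t += size
        done.append(t)
    return done


def round_robin_completion(jobs, quantum=1):
    """Completion time of each job under round-robin timeslicing (free switches)."""
    remaining = list(jobs)
    queue = deque(range(len(jobs)))
    done = [0] * len(jobs)
    t = 0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        t += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)  # not finished: go to the back of the line
        else:
            done[i] = t
    return done


jobs = [100, 1, 1, 1]  # one big job arrives just ahead of three small ones
print(fifo_completion(jobs))         # [100, 101, 102, 103]
print(round_robin_completion(jobs))  # [103, 2, 3, 4]
```

The big job finishes slightly later (103 vs 100), but the small jobs go from waiting about 100 units to finishing within 4.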

One thing this scalability bets on is that side channel attacks won’t get better.

Spectre and related attacks already reduced CPU performance.

Shared hardware opens up the door for side channel attacks and hardening against those attacks is going to decrease performance.

You'd generally use co-tenancy for workloads that are mutually trusted. Privileged services (authn/authz, machine management, deployable artifact builds) get put onto separate hardware, since their footprint is small enough that the extra 200% cost isn't material.
This isn't how things always run in the cloud. I think the conventional wisdom is that the isolation of VMs is good enough unless you are very paranoid. Auth services are regularly run on less than full baremetal machines.

AWS serverless, by the way, uses VM isolation.

Both AWS and GCP offer the ability to schedule VMs onto isolated machines:

https://aws.amazon.com/ec2/dedicated-hosts/

https://cloud.google.com/compute/docs/nodes/sole-tenant-node...

The AWS offering is pretty much turn-key. I've not used the GCP version, but it seems to be similar if you're willing to create a separate "project" for each security domain.

Once your company has any PII and/or has regulatory obligations (PCI, HIPAA, etc) then it's worth spending a bit extra to make sure sensitive components are running on their own hardware.

Usually you have to buy the whole host when you do that, and there are many ways to buy the whole machine. I personally think baremetal is a better trade - Amazon insiders have a harder time spying on you if you do that, while they can still pause your dedicated VM to take a peek at what's going on. Regardless, I have seen authentication systems and other sensitive things run on multi-tenant machines.