by jmillikin 1181 days ago
At my previous employer, there was at least one person with a "staff software engineer" job title who believed that running more than one Ruby server process per AWS virtual machine would lead to unacceptable contention at the hardware level. I was never able to convince them that Linux handles tens of thousands of processes just fine, or that even if you do one per VM there's nothing stopping AWS from scheduling those VMs onto the same machine.

I guess whether you consider someone like that to be a "serious person" or a "charlatan" depends on your own point of reference.

In that case, their arguments were more persuasive to management than mine were. I found the experience baffling.


The lesson for you was in communication, not engineering.

No hate. Was (is) a frustrating experience for me too.

Having had some back-channel status updates after I got pushed aside, the lesson is that some people accept or discard ideas based on employment background.

On the team I was on, newly-hired managers from Amazon accepted the ideas of ICs who had previously worked at Amazon, and rejected ideas from people who had not. I didn't realize this was a pattern until I already had one foot out the door, but even if I had realized it earlier it wouldn't have helped. I was ex-Google, so the ex-Amazon folks tagged every document with the bozo bit before they'd even opened it.

My takeaway was to be cautious of companies where the culture is imported through mass hiring from single companies. I joined in the middle of the "Google wave", which was relatively peaceful (per Google's culture at the time). When the "Amazon wave" arrived it was quite a shock; their culture was much more adversarial and authoritarian than anywhere I'd worked before. By the time I left, there were signs the Amazon folks were starting to get sidelined by an emerging "Oracle wave".

Doesn’t it depend, to some extent, on how many CPU cores there are and how much of its time the Ruby process spends idle waiting on I/O?
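The core-count and I/O-idle question can be turned into a back-of-the-envelope estimate. This is a rough sketch, not anything from the thread, and it makes three assumptions. Each worker process uses at most about one core at a time. Each worker is always busy with some request. Each worker spends a fixed fraction of its wall-clock time on the CPU and the rest blocked on I/O. `max_workers_per_vm` and its `target_utilization` headroom parameter are hypothetical names chosen for illustration.

```python
import math


def max_workers_per_vm(vcpus: int, busy_fraction: float, target_utilization: float = 0.7) -> int:
    """Hypothetical back-of-the-envelope cap on single-threaded workers per VM.

    Assumes every worker is always handling a request and spends busy_fraction
    of its wall-clock time on the CPU (the rest blocked on I/O), so on average
    it occupies busy_fraction of one core. target_utilization leaves headroom
    and is an assumed tuning knob, not a measured value.
    """
    if not 0 < busy_fraction <= 1:
        raise ValueError("busy_fraction must be in (0, 1]")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.floor(vcpus * target_utilization / busy_fraction)


if __name__ == "__main__":
    # Hypothetical: 8 vCPUs, workers on the CPU 25% of the time -> 22 workers.
    print(max_workers_per_vm(8, 0.25))
```

The more time a process spends idle on I/O, the more processes a given number of cores can carry. That is why one process per VM is rarely the right ratio.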

Amazon doesn’t overcommit CPU for normal VM instance types.

I'm not referring to overcommit or actual "cpus pegged at 100%" contention, but to simple loose bin-packing.

Imagine you have three Ruby services, where each is allocated 10 cores of CPU time (via pinning with cpuset). If you give them each a 16-core VM, then there'll be 18 cores of "wasted" CPU across the three VMs. If you instead bin-pack them onto a single 32-core VM, then they'll have the same number of cores at a lower price point.
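Here is a minimal sketch of the arithmetic above, using the comment's numbers: three services, 10 pinned cores each, on 16-core VMs versus one 32-core VM. The function and service names are hypothetical. The core ranges are formatted in the list syntax that cgroup `cpuset.cpus` files accept (e.g. `0-9`). Actually writing them into a cgroup is left out.

```python
# Hypothetical sizing from the comment: three services, each pinned to 10 cores.
SERVICES = {"service_a": 10, "service_b": 10, "service_c": 10}


def one_per_vm_idle_cores(cores_needed: dict[str, int], vm_cores: int) -> int:
    """Cores left unallocated when every service gets its own vm_cores-sized VM."""
    return sum(vm_cores - n for n in cores_needed.values())


def pack_onto_one_vm(cores_needed: dict[str, int], vm_cores: int) -> dict[str, range]:
    """Assign each service a disjoint, contiguous block of core IDs on one VM."""
    assignment: dict[str, range] = {}
    next_core = 0
    for name, n in cores_needed.items():
        if next_core + n > vm_cores:
            raise ValueError(f"{name} does not fit on a {vm_cores}-core VM")
        assignment[name] = range(next_core, next_core + n)
        next_core += n
    return assignment


def to_cpuset_list(cores: range) -> str:
    """Format a non-empty contiguous core range in cpuset list syntax, e.g. "0-9"."""
    return f"{cores.start}-{cores.stop - 1}"


if __name__ == "__main__":
    print("idle cores, one service per 16-core VM:", one_per_vm_idle_cores(SERVICES, 16))
    packed = pack_onto_one_vm(SERVICES, 32)
    for name, cores in packed.items():
        print(name, to_cpuset_list(cores))
    print("idle cores, bin-packed on 32 cores:", 32 - sum(len(c) for c in packed.values()))
```

Because the packed services get disjoint core sets, they never compete for the same core. What they still share on the bigger VM is things like the memory controller and caches, which is exactly what the disagreement below is about.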

If each service runs at 50% capacity with 2000ms latency during steady state, how much extra latency would you expect it to have in the bin-packed configuration versus one service per VM?

My position is "very little extra latency", the other person's position was "a lot of extra latency due to hardware contention in (for example) the memory controller".

(If you're reading this and thinking "NUMA node locality", then you're operating two or three levels above where this org was in terms of optimization.)

> If you're reading this and thinking "NUMA node locality", then you're operating two or three levels above where this org was in terms of optimization.

Talking about Ruby services and not hand-optimized C kind of gives it away. And even with hand-optimized C, you would do a cost/benefit analysis of less optimal packing.

Yeah, but running them on the same machine is a pain monitoring-wise, unless you're really trying to save dollars.