by mmt 2896 days ago
> then a team of engineers builds a crazily complex pipeline to make the python perform in some reasonable time frame. (Hi, Spark!).

Having seen at least a couple of similar setups, I still suspect that this is, at its core, just a problem of ignorance of how "big" a single server can get before one even starts paying a premium.

Moreover, even for the "largest" commodity servers, the premium at the highest end (over linear price:performance) was only something like 4x, last I looked.

There was some relevant discussion of single server versus distributed in subthreads of https://news.ycombinator.com/item?id=17492234 a few days ago.

> In my workplace, one example allocates bits of a job to roughly 100 machines, moving data to each, in a cloud environment where the data movement overhead is constantly fighting the benefits of distribution.

I'm confident that cloud environments contribute to hardware ignorance, since cloud providers offer a very limited choice of options, and I have yet to see anything high end.

This is especially frustrating for me with networking options: high bandwidth (beyond 10Gb/s on AWS until recently, and still only 40Gb/s max, AFAIK) is either nonexistent or expensive, and low-latency options like InfiniBand don't seem to exist either, even at the now low/obsolete bandwidths of 16 or 32Gb/s.