> then a team of engineers builds a crazily complex pipeline to make the python perform in some reasonable time frame.

(Hi, Spark!) Having seen at least a couple of similar setups, I strongly suspect that this is, at its core, just a problem of ignorance of how "big" one can make/get a single server before even paying a premium. Even for the "largest" commodity servers, last I looked, the premium at the highest end (over linear price:performance) was only something like 4x. There was some relevant discussion of single server versus distributed in subthreads of https://news.ycombinator.com/item?id=17492234 a few days ago.

> In my workplace, one example allocates bits of a job to roughly 100 machines, moving data to each, in a cloud environment where the data movement overhead is constantly fighting the benefits of distribution.

I'm confident that cloud environments contribute to hardware ignorance, since cloud providers offer a very limited choice of options, and I have yet to see anything high end. This is especially frustrating for me with networking options: high bandwidth (beyond 10Gb/s on AWS until recently, and still only 40Gb/s max, AFAIK) is either nonexistent or expensive, and low-latency options like InfiniBand don't seem to exist either, even at the now low/obsolete bandwidths of 16 or 32Gb/s.
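
To put rough numbers on the "data movement fights distribution" point, here is a back-of-envelope sketch. Everything in it is an assumption for illustration: the function name, the 1000 GB data size, the one hour of single-server compute, and the simplification that all data leaves its source over one shared link before any work starts. It ignores scheduling, serialization, and result collection, all of which make the distributed case look worse.

```python
def distributed_seconds(data_gb, compute_s, nodes, link_gbit_s):
    """Hypothetical model: ship all data over one shared link, then compute in parallel.

    data_gb      -- total data to move, in gigabytes (assumed value)
    compute_s    -- compute time on a single server, in seconds (assumed value)
    nodes        -- number of worker machines
    link_gbit_s  -- bandwidth of the shared source link, in gigabits per second
    """
    transfer_s = data_gb * 8 / link_gbit_s  # GB -> Gbit, divided by link rate
    return transfer_s + compute_s / nodes   # assumes perfectly parallel compute


single_s = 3600  # assumed: the job takes one hour on one big server
for gbit in (10, 40):
    t = distributed_seconds(data_gb=1000, compute_s=single_s, nodes=100, link_gbit_s=gbit)
    print(f"{gbit} Gb/s link: {t:.0f} s on 100 nodes, speedup {single_s / t:.1f}x")
```

Under these made-up numbers, 100 machines behind a 10Gb/s link finish in 836 s, a speedup of only about 4.3x over the single server, because 800 of those seconds are spent moving data rather than computing.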