| In 1982 you still had "supercomputers" like https://en.wikipedia.org/wiki/Cray_X-MP because you could still build bipolar electronics that beat mass-produced consumer electronics. By the mid-1990s even IBM had abandoned bipolar mainframes and had to introduce parallelism so that a cluster of (still slower) CMOS mainframes could replace a bipolar mainframe. This great book was written by someone who worked on that project: https://campi.cab.cnea.gov.ar/tocs/17291.pdf. For large-scale scientific computing it was clear that "clusters of rather ordinary nodes", like the https://www.cscamm.umd.edu/facilities/computing/sp2/index.ht... we had at Cornell (ours was way bigger), were going to win because they were scalable. The way Cray himself saw it, a conventional supercomputer had to fit in a small enough space that the cycle time was not unduly limited by the speed of light, so that kind of supercomputer had to get physically smaller, not larger, to get faster.
Now, for very specialized tasks like codebreaking, ASICs are a good answer, and you'd probably stuff a large number of them onto expansion cards in rather ordinary computers. Clusters today may also have some ASICs for glue and communications, such as https://blogs.nvidia.com/blog/whats-a-dpu-data-processing-un...
----
The problem I see with people who attempt parallelism for the first time is that they don't realize the task size has to be large compared to the overhead of transferring tasks between cores or nodes. That is, if you are processing a typical CSV file, you can't round-robin individual rows to threads, but 10,000-row chunks are probably fine. You usually get good results over a wide range of chunk sizes, but chunking is essential if you want most parallel jobs to actually get a speedup.
I find it frustrating as hell to see so many blog posts pushing the idea that some programming scheme like Actors is going to solve your problems, and to meet people who treat chunking as a mere optimization you'll apply after the fact. My inclination is that you can get the project done faster (in human time) if you build in chunking right away, but I've learned you just have to let people learn that lesson for themselves. |
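A minimal Python sketch of the chunking advice in the quoted comment, assuming a CSV whose second column is numeric: rows are grouped into 10,000-row chunks and each chunk, not each row, becomes one task, so the per-task scheduling overhead is paid once per chunk. `chunked`, `process_chunk`, and `parallel_sum` are hypothetical helpers made up for illustration. It uses `ThreadPoolExecutor` so it runs anywhere; for CPU-bound pure-Python work you would likely swap in `ProcessPoolExecutor`, where per-task overhead (pickling plus inter-process communication) is higher and chunking matters even more.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List


def chunked(rows: Iterable, size: int) -> Iterator[list]:
    """Yield successive lists of up to `size` items from `rows` (hypothetical helper)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def process_chunk(chunk: List[List[str]]) -> float:
    """Hypothetical per-chunk work: sum the numeric second column of each row."""
    return sum(float(row[1]) for row in chunk)


def parallel_sum(reader: Iterable[List[str]], chunk_size: int = 10_000, workers: int = 4) -> float:
    # One task per chunk rather than per row, so the overhead of handing
    # work to a thread is amortized over chunk_size rows.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(process_chunk, chunked(reader, chunk_size)))


if __name__ == "__main__":
    # Assumed example input: 25,000 rows of "id,value" generated in memory.
    data = "id,value\n" + "".join(f"{i},{i}\n" for i in range(25_000))
    reader = csv.reader(io.StringIO(data))
    next(reader)  # skip the header row
    print(parallel_sum(reader))  # prints 312487500.0
```

For process pools, the standard library's `Executor.map` also accepts a `chunksize` argument that batches items in a similar way; it has no effect for `ThreadPoolExecutor`.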
Originally, the whole point of the Hadoop architecture was that the data were pre-chunked and already sitting on the local storage of your compute nodes, so the overhead of transferring input for at least the first map task was effectively zero, and your big data-transfer cost was collecting the results of that step (hopefully much smaller than your input data) into one place in the reduce step.
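To make the data-locality argument concrete, here is a toy, single-process simulation (not Hadoop itself) of a word count: each hypothetical node maps over the chunk it already holds locally, and only the small per-node partial counts are moved and merged in the reduce step. The node names and input lines are invented for illustration.

```python
from collections import Counter
from functools import reduce
from typing import Dict, List

# Hypothetical cluster: each node already holds its own pre-chunked input locally.
local_chunks: Dict[str, List[str]] = {
    "node-a": ["the quick brown fox", "the lazy dog"],
    "node-b": ["the fox jumps", "over the dog"],
}


def map_local(lines: List[str]) -> Counter:
    """Map step: runs where the data lives, so no raw input crosses the network."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials: List[Counter]) -> Counter:
    """Reduce step: only the (much smaller) partial counts are shipped and merged."""
    return reduce(lambda a, b: a + b, partials, Counter())


# In a real cluster each map_local call would run on its own node in parallel.
partials = [map_local(lines) for lines in local_chunks.values()]
totals = reduce_counts(partials)
```

The design point is that the input lines never leave their node; only the `Counter` objects, whose size is bounded by the vocabulary rather than the input, are collected.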
Now we're in the cloud and the original data is all sitting in object storage, so shoving all your raw data through a small, slow network interface is an essential first step of any job, and it's not nearly as easy to get speedups as impressive as what people were getting 15 years ago.
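A back-of-envelope sketch of why that first step hurts: the time just to pull input across a network link, before any computation starts, is data size divided by effective bandwidth. The 10 TB input, the 10 Gbit/s link, and the `efficiency` factor below are hypothetical numbers chosen for illustration, not measurements of any particular cloud setup.

```python
def transfer_hours(data_tb: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Hours to move `data_tb` terabytes (10**12 bytes) over a `link_gbit_s` Gbit/s link.

    `efficiency` is an assumed fraction of line rate actually achieved (0 < efficiency <= 1).
    """
    if link_gbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = data_tb * 1e12 * 8                        # terabytes -> bits
    seconds = bits / (link_gbit_s * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 3600


# Hypothetical: 10 TB through one 10 Gbit/s interface at full line rate.
print(f"{transfer_hours(10, 10):.2f} h")  # prints 2.22 h
```

Real throughput is usually below line rate, which is what the `efficiency` parameter is meant to capture; halving it doubles the estimate.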
That said, I wouldn't want to go back. HDFS clusters were such a PITA to work with, and I'm not the one paying the monthly AWS bill.