by bunderbunder
668 days ago
Sure, but, as I was (rather unpopularly) pointing out in another comment, that point was pretty hard to reach in 1982. Specifically the point where you've met both criteria: a bigger computer is too cost-prohibitive to get, and lots of smaller computers are easier. At the time of this lecture, parallel computers had a nasty tendency to achieve poorer real-world performance on practical applications than their sequential contemporaries, despite greater theoretical performance. It's still kind of hard even now. To date in my career I've had more successes improving existing systems' throughput by removing parallelism than by adding it. Amdahl's Law plus the memory hierarchy is one heck of a one-two punch.
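For anyone who wants the Amdahl's Law half of that punch in numbers, here is a minimal sketch. The function name and the 90% figure are just illustrative assumptions, and it deliberately ignores the memory hierarchy, which only makes things worse in practice.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup under Amdahl's Law: 1 / ((1 - p) + p / n).

    parallel_fraction (p) is the share of the runtime that can be
    parallelized; workers (n) is the number of processors.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Illustrative only: a job that is 90% parallelizable.
for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With 10% of the work stuck being serial, the speedup can never reach 10x no matter how many processors you throw at it.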
https://en.wikipedia.org/wiki/Cray_X-MP
because you could still make bipolar electronics that beat out mass-produced consumer electronics. By the mid-1990s even IBM abandoned bipolar mainframes and had to introduce parallelism so a cluster of (still slower) CMOS mainframes could replace a bipolar mainframe. This great book was written by someone who worked on this project:
https://campi.cab.cnea.gov.ar/tocs/17291.pdf
and of course for large scale scientific computing it was clear that "clusters of rather ordinary nodes" like the
https://www.cscamm.umd.edu/facilities/computing/sp2/index.ht...
we had at Cornell were going to win (ours was way bigger) because they were scalable. (The way Cray himself saw it, a conventional supercomputer had to live within a small enough space that the cycle time was not unduly limited by the speed of light, so that kind of supercomputer had to become physically smaller, not larger, to get faster.)
Now for very specialized tasks like codebreaking, ASICs are a good answer, and you'd probably stuff a large number of them into expansion cards in rather ordinary computers. Clusters today possibly also have some ASICs for glue and communications, such as
https://blogs.nvidia.com/blog/whats-a-dpu-data-processing-un...
----
The problem I see with people who attempt parallelism for the first time is that they miss that the work in each task has to be larger than the overhead of transferring tasks between cores or nodes. That is, if you are processing most CSV files you can't round-robin assign rows to threads, but 10,000-row chunks are probably fine. You usually get good results over a large range of chunk sizes, but chunking is essential if you want most parallel jobs to really get a speedup. I find it frustrating as hell to see so many blog posts pushing the idea that some programming scheme like Actors is going to solve your problems, and to meet people who treat chunking as a mere optimization you'll apply after the fact. My inclination is that you can get the project done faster (human time) if you build in chunking right away, but I've learned you just have to let people learn that lesson for themselves.
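To make the chunking point concrete, here is a rough sketch of "one task per chunk, not per row". Everything in it is an assumption for illustration: the helper names (`chunked`, `sum_column`, `parallel_sum`, `parallel_sum_csv`) are made up, the per-chunk work is just summing a numeric column, and `ThreadPoolExecutor` is used only to keep it short. For CPU-bound work in CPython you would likely swap in `ProcessPoolExecutor`, where the per-task transfer cost is higher still.

```python
import csv
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunked(rows, chunk_size):
    """Yield lists of up to chunk_size rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def sum_column(chunk, column=0):
    """Hypothetical per-chunk work: sum one numeric column."""
    return sum(float(row[column]) for row in chunk)


def parallel_sum(rows, chunk_size=10_000, workers=4):
    """Submit one task per chunk (not per row), then combine the partials."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum_column, chunked(rows, chunk_size))
        return sum(partials)


def parallel_sum_csv(lines, chunk_size=10_000, workers=4):
    """Same thing, reading rows from an open CSV file (or any iterable of lines)."""
    return parallel_sum(csv.reader(lines), chunk_size, workers)
```

The design choice is that the scheduler only ever sees chunks, so the per-task overhead is paid once per 10,000 rows rather than once per row. The chunk size is a knob you can tune later; the chunking itself is in the structure from day one.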