|
|
|
|
|
by notacoward
1116 days ago
|
|
Sounds about right. The processors were six-core 500MHz single issue, which was pretty damn slow even by the standards of the time (2006-2008), beefed up with some extra floating point and some relatively very fast interconnect stuff. The key is that there were a lot of processors - 972 in the big machine. So you really really had to rely on parallelism to get any kind of performance, and a lot of code doesn't parallelize well at all. Even in the HPC space, a surprising number of codes just aren't designed to work for more than about 64. Also the machine positively sucked for linear integer code - like, say, compilers or OS kernels. One of the first things customers would do, naturally, was compile. Bad first impression. Also, it was nearly impossible to get a Lustre MDS for a thousand-node cluster (which is what the biggest machine was) to run for any length of time without falling over, because Lustre was designed around the assumption that the MDS would be bigger and beefier than anything else and have "poor man's flow control" in the form of a relatively slow network. In our case it was exactly the same and completely unprotected because the interconnect was the fastest part of the system by quite a margin. That was my nightmare for those two years. I've heard that Lustre has since added some flow control ("network request scheduler") but I was never able to benefit from that. PVFS2 worked better, and Gluster (which I worked on for nearly a decade afterward) would probably have been better still because it's more fully distributed and less CPU-hungry. The reason I mention all this is that there's an important lesson: building a system with a very unusual set of performance characteristics is a terrible idea business-wise, because people won't be able to realize its potential. Not even in a fairly specialized market. They'll just think it's slow. Unless it's truly bespoke, literally a one-off or close to it, nobody will want it. P.S. I actually had to visit CU-Boulder to debug something on that machine, with the aforementioned Doug. It became one of my favorite "war stories" from a 30-year career, but this has gone on long enough so I'll skip it. |
|