by hn_acc1 482 days ago
The mention of "could barely ssh in" reminds me of a situation in grad school where our group had a Sun UltraSparc 170 (IIRC) with 1GB HD and 128 or 256 MB of RAM, shared by maybe 8 people in a small research group relating to parallel and distributed processing. Keep in mind, Sun machines were rarely rebooted, ever.

So I guess a new user / student was trying to speed things up by working in parallel: they split their large text file into N (32 or 64) sections by line number (ranges within the one file, not separate files), and then ran N copies of perl in parallel, each processing its own range of lines from that one file.

Not only was a large amount (for back then) of RAM used by N copies of the perl interpreter (separate processes, not threads, mind you!), each processing its own data, but any attempt to swap was also interleaved with frantic seeking to a different section of the same file to read a few more lines for one of the N processes stalled on I/O. Also, the Jth process probably had to read J/N of the entire file to get to and through its section. So the first section of the file was read N times, the next N-1 times, then N-2, etc.
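To put rough numbers on that read pattern, here is a minimal Python sketch. It assumes equal-sized sections and that worker J scans from the start of the file through the end of section J, as guessed above. The function names are made up for illustration, and disk caching, swapping, and seek costs are ignored.

```python
def sections_read(n: int) -> int:
    """Total section-reads if worker j (1-based) scans sections 1..j.

    Section 1 is read n times, section 2 is read n-1 times, and so on,
    which adds up to n + (n-1) + ... + 1 = n * (n + 1) / 2.
    """
    return sum(range(1, n + 1))


def read_amplification(n: int) -> float:
    """How many times the whole file is effectively read across all workers."""
    return sections_read(n) / n


if __name__ == "__main__":
    for n in (32, 64):
        print(n, sections_read(n), read_amplification(n))
```

Under those assumptions, N=32 works out to 528 section-reads (the file read 16.5 times over) and N=64 to 2080 (32.5 times over), before counting any of the seeking or swapping.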

We (me and the fellow student trusted as sysadmin, who had the root password) couldn't even get a login prompt on the console. Luckily, I had a logged-in session (ssh from an actual X terminal, a large-screen "dumb" terminal), and "su" finally prompted for a password 20-30 minutes after I ran it. After another 5-10 minutes, we had a root session and were able to run top and figure out what was going on. Killing the offending processes (after trying to contact the user) brought the system back to normal.

Edit: forgot to say: the student had the right idea, but totally didn't understand the system's limitations. It was SEVERELY I/O limited with that hard drive and relatively low RAM, so just processing the data linearly would easily have been the best approach, unless the amount of data that needed to be kept got too large.
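For contrast, a rough sketch of the linear approach in Python (the original job was perl, and what it did with each line isn't known, so `handle_line` is a hypothetical stand-in). One process reads the file front to back exactly once, and yielding results instead of collecting them keeps memory use small.

```python
from typing import Callable, Iterator


def process_linearly(path: str, handle_line: Callable[[str], str]) -> Iterator[str]:
    """Read the file once, sequentially, yielding one result per line.

    handle_line is a placeholder for whatever per-line work the real job did.
    """
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # sequential read, no seeking back and forth
            yield handle_line(line.rstrip("\n"))
```

Something like `for result in process_linearly("data.txt", str.upper): print(result)` would stream through the file (`data.txt` being a made-up name) with a single interpreter and a single sequential pass over the disk.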