|
|
|
|
|
by pcwalton
3741 days ago
|
|
I believe you that you do. That doesn't change the fact that few people actually do it or need to do it. There are many more people who think they need to recover from OOM but actually don't and would be better off if they didn't try. (There have been many security vulnerabilities that have resulted from attempting to handle OOM gracefully that wouldn't have been an issue if malloc just aborted the process.) |
|
I am one of those few people. We are running some scientific code (currently, unfortunately, written in C++) on a heterogeneous bunch of compute nodes. Some computations can be extremely memory-intensive, and sometimes in ways that we didn't predict. So it's useful to be able to fail gracefully and record that computation X on node Y with input parameters [Z] failed specifically due to running out of memory at step W - so that e.g. the queue manager can try relaunching the computation on a beefier node or adjusting how many instances of which computation are allowed on Y.