by timq 2510 days ago
Handling malloc() failure is almost never done for short lived programs. For instance, git used to fail as soon as an error popped up (whether it be malloc() or open(), etc.). It is just much simpler and more convenient to do so.

While C has no special error handling mechanism in place, error handling can still be done reasonably. IMO, the big reason malloc() errors are rarely handled is that it is quite hard to come up with a viable fallback strategy.


>Handling malloc() failure is almost never done for short lived programs.

True, and that makes sense for something like git. But in my experience many long-lived programs don't bother to handle ENOMEM gracefully either.

But I guess I'm veering off-topic here, I'm mostly fine with applications crashing of their own volition when they don't have enough memory. I agree with you that in many cases there's no clear recovery path for an application that's out of RAM. It's the OOM-killer I have a problem with.

>While C has no special error handling mechanism in place, error handling can still be done reasonably.

I very much disagree with that. There are a few factors that make error handling in C a pain:

- No RAII, so you have to explicitly handle cleanup at every point you may have to early-return an error (goto fail etc...).

- No convenient way to return multiple values from a function. That means that in general functions signal errors by returning some special value like 0 or -1 (even that is very much nonstandard, often even within the same library).

Oh you want to be able to signal several error conditions? Uh, maybe use several negative codes then? Oh you need those to return actual results? Well maybe set errno then? Don't forget to read `man errno` though, because it's easy to get it wrong. Oh you had a printf in DEBUG builds in there that overwrote errno before you could test it? Oops. Don't do that!
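
The errno pitfall above can be sketched as follows. This is only an illustration: `parse_long` is a hypothetical helper, not something from the thread, and it assumes the usual rule that any later library call (such as a debug `fprintf`) may overwrite `errno`, so the value is copied out immediately.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal long. Returns 0 on success or an
 * errno-style code on failure. */
int parse_long(const char *text, long *out)
{
    char *end = NULL;

    errno = 0;                      /* strtol leaves errno untouched on success */
    long value = strtol(text, &end, 10);
    int saved = errno;              /* capture before any other library call */

#ifdef DEBUG
    /* fprintf may modify errno; `saved` is unaffected. */
    fprintf(stderr, "parse_long(\"%s\") -> %ld\n", text, value);
#endif

    if (saved != 0)
        return saved;               /* e.g. ERANGE on overflow */
    if (end == text || *end != '\0')
        return EINVAL;              /* no digits, or trailing junk */

    *out = value;
    return 0;
}
```

Testing `errno` after the debug print instead of `saved` is exactly the bug the comment describes.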

What's that, your function returns a pointer and not an integer? Ah, mmh, well maybe return NULL in case of error? You want to return several error codes? Well maybe you can just cast the integer code into a pointer and return that, then use macros to figure out which is which. It's terribly ugly? Well the kernel does it so... It can't be that bad right? Oh and what about errno? Remember that?

What's that, NULL is a valid return value for your function? Uh, that's annoying. Maybe use an output parameter then? Oh, or maybe some token value like 0xffffffff, that probably won't ever happen in practice right? After all that's what mmap does.

So no I wouldn't consider C error handling reasonable in any way shape or form. "Non-existent" is more accurate. You can always work around it but it always gets in the way.

I try to always implement comprehensive error checking in my programs. I do a significant amount of kernel/bare metal work, so it's really important. It's not rare that I end up with functions that contain more error-handling-related code than actual functional code.
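
For readers who have not seen the "goto fail" cleanup style mentioned above, here is a minimal sketch. `read_prefix` and its parameters are hypothetical, chosen only for illustration. Note how much of the function is cleanup and error plumbing rather than the actual read.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical example: read up to `cap` bytes of `path` into a fresh
 * buffer. Returns 0 on success or an errno-style code on failure. */
int read_prefix(const char *path, size_t cap, char **out, size_t *out_len)
{
    int err = 0;
    char *buf = NULL;
    size_t n = 0;

    errno = 0;
    FILE *fp = fopen(path, "rb");
    if (!fp)
        return errno ? errno : EIO; /* nothing to clean up yet */

    buf = malloc(cap ? cap : 1);
    if (!buf) {
        err = ENOMEM;
        goto out_close;             /* only the file needs releasing */
    }

    n = fread(buf, 1, cap, fp);
    if (ferror(fp)) {
        err = EIO;
        goto out_free;              /* release buffer, then file */
    }

    *out = buf;
    *out_len = n;
    buf = NULL;                     /* ownership moved to the caller */

out_free:
    free(buf);                      /* free(NULL) is a no-op */
out_close:
    fclose(fp);
    return err;
}
```

On success the caller owns `*out` and must `free` it. Every new resource added to the function needs its own label and its own place in the unwind order.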

You are making things sound way more complicated than they need to be. The situation is actually very simple: if you need to return multiple error codes, use the return value for the error code and give back results via an output parameter; otherwise just use a sentinel value for error (0, -1 or NULL depending on context). They aren't totally random, you know: 0 and nonzero are used for false/true, -1 is used when you expect some index, and NULL when you expect some object. When in doubt, just use an error return code everywhere (e.g. what many Microsoft APIs - even some C++ ones - do with HRESULT).
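
The two conventions described above can be sketched side by side. The names `index_of`, `lookup`, and `enum lookup_err` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Hypothetical error codes for this sketch; not from any real library. */
enum lookup_err {
    LOOKUP_OK = 0,
    LOOKUP_ERR_NULL_ARG,
    LOOKUP_ERR_NOT_FOUND
};

/* Sentinel style: returns the index of `key`, or -1 if it is absent. */
ptrdiff_t index_of(const int *items, size_t count, int key)
{
    for (size_t i = 0; i < count; i++) {
        if (items[i] == key)
            return (ptrdiff_t)i;
    }
    return -1;
}

/* Error-code style: the return value says what happened and the result
 * travels through the output parameter, which is written only on success. */
enum lookup_err lookup(const int *items, size_t count, int key,
                       size_t *out_index)
{
    if (items == NULL || out_index == NULL)
        return LOOKUP_ERR_NULL_ARG;

    for (size_t i = 0; i < count; i++) {
        if (items[i] == key) {
            *out_index = i;
            return LOOKUP_OK;
        }
    }
    return LOOKUP_ERR_NOT_FOUND;
}
```

The sentinel version is shorter to call but can report only one kind of failure; the error-code version distinguishes failures at the cost of an extra parameter.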

If it's not that complicated, please explain why OpenSSL, the Linux kernel, Curl, and a multitude of very popular C libraries don't do what you describe. Clearly it's complicated enough that even talented C coders try to cut some corners when given the chance.

C error handling ergonomics are non-existent which means that everybody bakes ad-hoc library-specific conventions that are extremely error-prone.

You could argue that they're doing it wrong and you might have a point but if almost everybody gets it wrong maybe it's fair to blame the language itself a little bit.

I already gave an example of APIs that do this - pretty much all COM APIs use HRESULT. I do not know why not everyone does this as i'm not everyone and as such i cannot tell what sort of considerations (if any) were going on. At best i can make some guesses.

BTW curl does seem to do what i wrote above, for example `curl_easy_init` returns a `CURL` object on success or NULL if there was an error [1] and `curl_easy_perform` returns a `CURLcode` value [2] that looks like it is used across the API to indicate errors.

[1] https://curl.haxx.se/libcurl/c/curl_easy_init.html

[2] https://curl.haxx.se/libcurl/c/curl_easy_perform.html

The kernel very much returns sentinel values; if something more complicated has to be transmitted, error codes are commonly used. I see nothing wrong with it.

I'm not arguing that the kernel devs are doing it wrong. I'm only pointing out that, in my opinion, the way C deals with error handling (that is, by not doing anything at all) is far from reasonable and the cause of many bugs. It's terrible ergonomics.

If you have a kernel function returning a pointer and you think that you're supposed to check for NULL when it actually returns an ERR_PTR in case of errors, you will not only fail to do the check but on top of that end up with a garbage pointer somewhere in your program. If you have an MMU and you try to de-reference the pointer you'll have a violent crash, which at least shouldn't be too hard to debug. If you feed the pointer to some hardware module or if you're working on an MMU-less system then Good Luck; Have Fun.

C doesn't have your back here. It doesn't let you signal how a function reports errors, it doesn't even let you tag nullable pointers.
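
The NULL-versus-ERR_PTR mix-up can be reproduced in userspace. The sketch below is a simplified analogue of the kernel's `ERR_PTR`, `PTR_ERR`, and `IS_ERR` helpers, not the kernel code itself: the `sketch_*` names and `make_counter` are hypothetical, and the pointer/integer casts are implementation-defined (they behave as expected on mainstream platforms).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095   /* same bound the kernel uses */

/* Encode a negative errno value in the top page of the address space. */
void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Recover the negative errno value from an encoded pointer. */
long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if `ptr` is an encoded error rather than a real object. */
int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical constructor that reports failure as an encoded errno,
 * never as NULL. */
int *make_counter(int start)
{
    if (start < 0)
        return sketch_err_ptr(-EINVAL);

    int *counter = malloc(sizeof *counter);
    if (!counter)
        return sketch_err_ptr(-ENOMEM);

    *counter = start;
    return counter;
}
```

A caller that writes `if (!p)` after `make_counter(-1)` sees a non-NULL pointer and carries on with garbage; only `sketch_is_err(p)` catches it, and nothing in the function's type says which check is the right one.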

Often you need to return error objects. Consider a function for parsing something. You want to return not only the error code, but also the line and column number of the parse error, and a description of it. So you need two output parameters; one for the result and one for the error. Your declaration becomes something like this:

    bool parse(inp_type *a, out_type **b, out_error **c);
where the return value false indicates an error. In C++, you'd just have written something like:

    out_type parse(const inp_type& a);
and thrown an exception on error.

In C you can return a struct; however, a better approach is to use a context object which also contains error information, like:

    ctx_t* ctx = ctx_new();
    if (!ctx) ... fail ...
    if (!ctx_parse(ctx, code)) {
        show_error_message(ctx_erline(ctx), ctx_ercol(ctx));
        ... more fail ...
        ctx_free(ctx); /* often done in a goto'd section to avoid missing frees*/
    }
This also allows you to extend the API's functionality, error information, etc. in the future while remaining backwards compatible.
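
A minimal sketch of what could sit behind such a context API. The function names `ctx_new`, `ctx_parse`, `ctx_erline`, `ctx_ercol`, and `ctx_free` come from the snippet above; the struct layout, the `sum` and `er_msg` fields, and the toy grammar (digits and newlines only) are assumptions made purely for illustration.

```c
#include <stdlib.h>

/* Assumed layout; a real library would keep this opaque to callers. */
typedef struct ctx {
    int sum;            /* toy "parse result": sum of all digits seen */
    int er_line;        /* 1-based line of the last error, 0 if none */
    int er_col;         /* 1-based column of the last error, 0 if none */
    const char *er_msg; /* static description, NULL if none */
} ctx_t;

ctx_t *ctx_new(void)
{
    ctx_t *ctx = malloc(sizeof *ctx);
    if (ctx) {
        ctx->sum = 0;
        ctx->er_line = 0;
        ctx->er_col = 0;
        ctx->er_msg = NULL;
    }
    return ctx;         /* NULL on allocation failure */
}

void ctx_free(ctx_t *ctx) { free(ctx); }
int ctx_erline(const ctx_t *ctx) { return ctx->er_line; }
int ctx_ercol(const ctx_t *ctx) { return ctx->er_col; }

/* Toy parser: returns 1 on success, 0 on error with the position
 * recorded in the context. */
int ctx_parse(ctx_t *ctx, const char *code)
{
    int line = 1, col = 1;

    ctx->sum = 0;
    ctx->er_line = ctx->er_col = 0;
    ctx->er_msg = NULL;

    for (const char *p = code; *p != '\0'; p++) {
        if (*p == '\n') {
            line++;
            col = 1;
            continue;
        }
        if (*p < '0' || *p > '9') {
            ctx->er_line = line;
            ctx->er_col = col;
            ctx->er_msg = "expected a digit";
            return 0;
        }
        ctx->sum += *p - '0';
        col++;
    }
    return 1;
}
```

Because callers reach error details only through accessor functions, new fields can be added to the struct later without changing any existing signature.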

Which is great, except that ctx_new() requires a malloc, which then can fail, and now you can't even explain why the thing failed, as you have no context info.

You also have to worry about all of the ctx objects you've created along the way, to free them up as you recover from the low memory error.

That is very similar to the way I handled errors back in my C days.

Yep, you're absolutely right. But don't tell me that is simple! :)

> No RAII, so you have to explicitly handle cleanup at every point you may have to early-return an error (goto fail etc...).

I think RAII can be useful, but I've never found any use for it in systems level code that I write. Most of the time I'm dealing with resources that were allocated inside a systems library or an external component which just gives me a handle to the resource. I think this is a common enough scenario in systems code that I don't think it's just me.

e.g.

    X = CreateResource()
    Y = TransformResource(X)
    ProcessNewResource(Y)
    Z = TransformResource(Y)
    etc. etc.
And so as you transform that resource, you will have multiple ways to unwind the resource depending on where the failure occurs. Even if you wrap X in some RAII container, you don't know what your destructor is going to look like.
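
One manual way to express stage-dependent unwinding in C is to record how far the pipeline got and undo in reverse from there. This is a sketch under stated assumptions, not the commenter's code: `run_pipeline`, `enum stage`, and the one-letter undo log are hypothetical stand-ins for real library handles and their release calls.

```c
#include <stddef.h>

/* Hypothetical stages of the resource pipeline. */
enum stage { STAGE_NONE, STAGE_CREATED, STAGE_TRANSFORMED, STAGE_PROCESSED };

/* Simulates the pipeline, failing when it reaches `fail_at` (pass
 * STAGE_NONE to never fail). Each undo step appends one letter to
 * `undo_log` so the unwind order is observable. Returns 0 or -1. */
int run_pipeline(enum stage fail_at, char *undo_log, size_t cap)
{
    enum stage reached = STAGE_NONE;
    size_t n = 0;
    int rc = -1;

    if (fail_at == STAGE_CREATED)
        goto unwind;
    reached = STAGE_CREATED;        /* X now exists */

    if (fail_at == STAGE_TRANSFORMED)
        goto unwind;
    reached = STAGE_TRANSFORMED;    /* X became Y */

    if (fail_at == STAGE_PROCESSED)
        goto unwind;
    reached = STAGE_PROCESSED;      /* Y was processed */
    rc = 0;

unwind:
    /* Undo exactly the stages that completed, newest first. */
    switch (reached) {
    case STAGE_PROCESSED:
        if (n + 1 < cap) undo_log[n++] = 'P';
        /* fall through */
    case STAGE_TRANSFORMED:
        if (n + 1 < cap) undo_log[n++] = 'T';
        /* fall through */
    case STAGE_CREATED:
        if (n + 1 < cap) undo_log[n++] = 'C';
        /* fall through */
    case STAGE_NONE:
        break;
    }
    if (cap > 0)
        undo_log[n] = '\0';
    return rc;
}
```

The cleanup path depends on runtime state rather than on scope, which is the property that makes a single fixed destructor awkward here.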

Another con to RAII, especially when paired with shared-ownership smart pointers, is you lose predictability over your resource deallocs. You never know when the last pointer is going to go out of scope, and if it's a 'heavy' resource with a complicated unwind, you're going to get a CPU spike at an indeterminate time. I deal primarily with industrial automation code and I much prefer to have a smooth/even CPU graph. I think this issue is more relevant to systems code which is the context of this thread.