by beagle3 4669 days ago
Everyone here seems to agree that "exceptions are for exceptional conditions". The problem is that when you get down to details, there is disagreement about what exactly is an "exceptional condition".

e.g. - you are trying to open a file for reading. The file does not exist. Is this exceptional? That depends on context, but the function that opens the file, being in an independent library, is usually designed without this context.

If it does throw an exception, some people complain that "of course it's expected that the file won't be there sometimes! That's not exceptional".

If it doesn't throw an exception, some people complain that "we tried to open a file, but didn't succeed, of course that's an exception". But if you want to avoid an exception in this case, you'll need to check for existence before opening (LBYL "look-before-you-leap"), and get into a race condition (TOCTOU "time-of-check,time-of-use"), which is really bad.
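To make the LBYL/TOCTOU point concrete, here is a minimal Python sketch. The function names `read_lbyl` and `read_eafp` are made up for illustration; the only real APIs used are `os.path.exists`, `open`, and `FileNotFoundError`.

```python
import os
import tempfile

def read_lbyl(path):
    """Look before you leap: check for existence, then open (racy)."""
    if not os.path.exists(path):       # time of check
        return None
    # Another process may delete the file right here, so the open
    # below can still raise FileNotFoundError despite the check.
    with open(path) as f:              # time of use
        return f.read()

def read_eafp(path):
    """Just try the open and treat 'missing' as an ordinary outcome."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None

# A path inside a fresh temporary directory is guaranteed not to exist yet.
demo_dir = tempfile.mkdtemp()
missing = os.path.join(demo_dir, "no-such-file.txt")
print(read_lbyl(missing), read_eafp(missing))  # prints "None None"
```

Both versions agree when nothing else touches the file, but only the second closes the window between the check and the use, and it does so by using an exception for a perfectly ordinary outcome.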

So it very often happens that you are forced by your ecosystem to use exceptions for normal control flow. Claims that you can use them only for "exceptional / unexpected" conditions tend to be incompatible with a project in which you do not develop or control every library you use to your own strict standard of exceptionalness.

6 comments

Vague value judgments ("only for exceptional conditions") are the inevitable result of a failure to reason.

The semantic function of exceptions is just a way for summing additional values onto the return type of a function because it has results that are not contained within the primary type. In this way, they are a more general and better typed version of NULL (which has its place - contrary to the modern dogma, these sorts of features are needed due to inherent complexity). The standard ways of attempting to avoid this are to either use sentinel values that exist in your standard return type like fd == -1 for an error (thereby making your program less typed), or to create a top-level sum type for every aggregated function return type (cluttering your program with nominal types). Multiple values make the most sense, but those are ad-hoc product types, so you're eschewing the type system in favor of informal invariants.
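A rough Python sketch of the three alternatives named above (sentinel, multiple values, sum type). Everything here is hypothetical and only simulates opening a file: `Opened`, `OpenError`, `OpenResult`, and the `open_*` functions are illustrative names, not any library's API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

@dataclass
class Opened:
    fd: int

@dataclass
class OpenError:
    reason: str

# A sum type: the result is exactly one of the two cases.
OpenResult = Union[Opened, OpenError]

def open_sentinel(name: str) -> int:
    """Sentinel: -1 lives inside the ordinary return type (int)."""
    return 3 if name == "exists.txt" else -1

def open_pair(name: str) -> Tuple[Optional[int], Optional[str]]:
    """Multiple values: a product type; 'exactly one is set' is only a convention."""
    return (3, None) if name == "exists.txt" else (None, "no such file")

def open_sum(name: str) -> OpenResult:
    """Sum type: the failure case is a separate, typed branch."""
    if name == "exists.txt":
        return Opened(fd=3)
    return OpenError(reason="no such file")

def describe(result: OpenResult) -> str:
    # The caller has to discriminate before it can reach the fd.
    if isinstance(result, Opened):
        return f"fd {result.fd}"
    return f"failed: {result.reason}"

print(describe(open_sum("exists.txt")))  # prints "fd 3"
print(describe(open_sum("nope.txt")))    # prints "failed: no such file"
```

With the sentinel nothing stops a caller from passing -1 along as if it were a descriptor, and with the pair nothing stops both slots being set or both being None. Only the sum version forces a check before the fd is reachable.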

The syntactic function of exceptions is to avoid constantly repeating (check for error, return error), which often leads to the poor practices of ignoring errors or calling a global exit(). One goal of programming languages is to automate, so it makes sense to capture this oft-repeated pattern. But problems arise when people end up forgetting that every function can have a possible return immediately following it.

It seems some syntactic middle ground is needed to signal the complete return type of a function definition, and the possibility that a given function call may quick-return. Honestly (and I hate to say it), Java probably started down the right track with checked exceptions, but being a B&D language it ended up being waaaay too verbose. And lacking a way to aggregate types along anything but the baked-in hierarchy, people fell into using the generic and uninformative 'throws Exception'. And open types make it so there's little point trying to enumerate exhaustive causes. But that doesn't mean that one can't start with the idea of non-silent but syntactically lightweight exceptions and come up with something that avoids a lot of the pitfalls.

lol... I've always felt that way about Java
Overall, very good comment. My one point of disagreement is where you dismiss sum types because they clutter your program. Since sum types are the correct answer to this problem in theory, it seems to me that we shouldn't move on from that answer due to problems with implementing it in practice. Further, I think that the problem has already been solved pretty well in languages like Haskell that support type classes (like Monad) which make it so that you no longer have to worry about the sum types except when explicitly working with them (e.g. pattern matching on Left and Right). There's still room for improvement on working with sum types in Haskell (e.g. nested sum types can be annoying), but it's the best solution I've seen so far. Joel even makes note of Haskell's solution being good in the article, but only briefly.
> My one point of disagreement is where you dismiss sum types because they clutter your program ...

> There's still room for improvement on working with sum types in Haskell (e.g. nested sum types can be annoying)

What I dismissed is requiring sum types to be nominal. For instance, imagine that Haskell used the Scheme/Java representation of objects (ie everything is basically a member of one sum type, discriminated on a machine word in the header). We could then do things like:

    type Foo  = Bar | Baz
    type Foo2 = Bar | Baz
Where Bar and Baz could be any type. Now both Foo and Foo2 are just different names for the exact same thing, and in fact the names Foo/Foo2 are irrelevant when pattern matching the result of a function that's been declared to return (Bar | Baz). This philosophy is a bit different from Haskell in that it assumes that "everything is an object" (rather than the zero-overhead structs of Haskell), and it implies that every sum type defined this way can only contain one branch for each included type (without names, there's no way to differentiate them), but a merging of the semantics could definitely be hammered out.

I think Rust (being not quite formed yet) could benefit from taking a stab at this, having every sum discriminator be globally unique, and every non-sum type having an associated global tag that only gets prepended when it is promoted to being an anonymous branch of a sum. The immediate use I envision is being able to create ad-hoc type hierarchies that are descriptive rather than prescriptive.

I'm really fond of the idea of Nullable types (where you have to unwrap them by checking if they're null or errors to get the value). I don't think any non-functional language other than Rust has attempted to work this into the way they work yet, but it strikes me as a powerful mechanism that could also allow for deferring error handling to the site most capable of dealing with it.
Scala has the Try [0] and Either [1] types that can be used to achieve this. However, I believe Scala also allows any type to be NULL. In Rust there is no NULL so you don't have to worry if a value is ever NULL unless it is explicitly wrapped in an option instance.

[0] http://www.scala-lang.org/api/current/index.html#scala.util.... [1] http://www.scala-lang.org/api/current/index.html#scala.util....

If you use Guava, you even get Optional<T> for Java. It works pretty well, but is obviously something you have to work into your API and can't easily use after the fact.
Completely agree with the gist of what you are saying, but exceptions are MORE than just a sum/product/extended value type: They are a non-local return. You're obviously aware of that, from the Java comment, but seem to consider that fact an implementation issue of Java, whereas most people consider this a defining attribute of exceptions.
They're restricted to only escaping currently active stack frames, not jumping anywhere like full continuations. They're basically equivalent to having a default cascade of

    if (ret == -1) return -1;
after every function invocation, which is why I was calling that aspect a syntactic feature.

It's this implicit return after every function that trips people up (when prematurely exiting from stateful computation). So what I was saying is that making the call of an exception-throwing function (slightly) more verbose would alleviate that and pay for the complexity where it was used.
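A small Python sketch of that equivalence, with made-up function names: the return-code version spells out the check-and-return at each level, while the exception version performs the same early exit invisibly.

```python
def parse_step_code(text):
    """Return-code style: -1 is the error sentinel."""
    return int(text) if text.isdecimal() else -1

def middle_code(text):
    ret = parse_step_code(text)
    if ret == -1:          # the explicit cascade, repeated at every level
        return -1
    return ret * 2

def parse_step_exc(text):
    """Exception style: failure is raised instead of returned."""
    if not text.isdecimal():
        raise ValueError(f"not a number: {text!r}")
    return int(text)

def middle_exc(text):
    # No visible check: if parse_step_exc raises, this function
    # returns right here, before the multiplication happens.
    return parse_step_exc(text) * 2

def top(text):
    # Only currently active stack frames are unwound, up to this handler.
    try:
        return middle_exc(text)
    except ValueError:
        return -1

print(middle_code("21"), middle_code("x"))  # prints "42 -1"
print(top("21"), top("x"))                  # prints "42 -1"
```

Nothing in `middle_exc` marks the call as a possible exit point, which is the part that a slightly more verbose call syntax would make visible.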

> Multiple values make the most sense, but those are ad-hoc product types, so you're eschewing the type system in favor of informal invariants.

Maybe the right approach would be to enrich the type system so that it can express those invariants.

You forgot to mention one of the main criticisms of using exceptions for errors: the processor penalty.

Related discussions:

- http://stackoverflow.com/questions/8805238/run-time-penalty-...

- http://stackoverflow.com/questions/299068/how-slow-are-java-...

> You forgot to mention one of the main criticisms of using exceptions for errors: the processor penalty.

As with much of the exceptions debate, it’s important not to over-generalise here.

For example, if you’re writing in a compiled language like C++ and your compiler uses a table-driven implementation for the exception mechanism (as most modern ones did, the last time I checked) then there isn’t necessarily any direct runtime overhead at all when no exception is thrown. In fact, the non-exceptional code path can even run a little faster than equivalent code with manual error handling via return codes, if conditional logic for propagating error codes can be omitted at all the intermediate levels between the one(s) where exceptions are thrown and the one(s) where they are caught.

On the other hand, the possibility of an exception being thrown might interfere with some optimisations. Also, the jump tables can be huge: I once saw the compiled output for a moderately large code base drop in size by 1/3 just from compiling it with exceptions disabled.

In short, there are a lot of factors at play, but anyone who parrots the line that using exceptions always slows things down has never spent much time looking at what actually happens with real compilers. And of course, this is only in one type of compiled language, which doesn’t necessarily imply anything about the performance characteristics of other languages (which vary widely).

Agreed that it's a poor overgeneralization, and often trotted out when it's absolutely incorrect, but it's worth noting that on Windows the existence of Structured Exception Handling in the OS prevented this kind of optimization for a long time. It may yet, in fact, but I've been out of that world for a long time.

This probably extended to things like the xbox as well.

Fortunately that issue has waned with the decline in the x86.
> you are trying to open a file for reading. The file does not exist. Is this exceptional?

It's actually handy to have an API that includes both situations. For example, Parse and TryParse in C#. Parse will throw an exception if it fails but TryParse will not. When used in your code, this actually documents what the programmer expects. If open and tryOpen operations existed, you would know whether or not the programmer expects the file to exist.
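A Python analogue of the same pairing, as a sketch. `parse_int` and `try_parse_int` are hypothetical names; C#'s real TryParse reports success through a bool and an out parameter, which this does not try to mirror.

```python
def parse_int(text):
    """Raising variant: the caller expects the input to be valid."""
    return int(text)          # int() raises ValueError on bad input

def try_parse_int(text, default=None):
    """Non-raising variant: the caller expects bad input to happen."""
    try:
        return int(text)
    except ValueError:
        return default

print(parse_int("42"))            # prints "42"
print(try_parse_int("abc"))       # prints "None"
print(try_parse_int("abc", 0))    # prints "0"
```

Which of the two a call site picks tells the reader whether failure was anticipated there.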

The rule is that all functions should return a valid result or not return at all. Aka the Samurai Principle: http://c2.com/cgi/wiki?SamuraiPrinciple

However, an important point is that a "valid result" may be a status indicating what went wrong. For example, a function that reads an http url may return (OK, 200), (NOT_FOUND, 404), (REDIRECT, 301) and so on. But in addition to that, the function may also throw an exception if, for example, a dns lookup error occurred.

Opening files on the other hand, can fail unexpectedly for a billion different reasons. Most of which an opening function can't detect or do anything about and therefore can't return a valid result if they occur. Therefore it must throw an exception.

> the function may also throw an exception if a dns lookup error occurred for example.

What's wrong with (DNS Lookup Failure, -17), as long as it is documented? Why doesn't 404 NOT_FOUND merit an exception, yet a DNS lookup failure does?

(I can argue both ways, I'm just trying to point out that there's no clear criterion)

> therefore can't return a valid result if they occur. Therefore it must throw an exception.

But open() in C does return a valid result on any of those billion reasons: (fd -1, errno reason). Yes, errno is returned in a global variable, but it is still a return from the open() call. Therefore, exceptions are never needed according to your logic?

>> the function may also throw an exception if a dns lookup error occurred for example.

> What's wrong with (DNS Lookup Failure, -17), as long as it is documented? why doesn't 404 NOT_FOUND merit an exception, yet a DNS lookup failure does?

Because then every function that could potentially cause a dns lookup failure needs to have that error code documented. Then ask yourself this, what's wrong with having (Out of Memory Failure, -1234) in addition to DNS Lookup Failure and HTTP Status codes?

Think of yourself as a http function. You know about the http protocol and therefore from your perspective a 404 NOT_FOUND is a valid result. However, you do not know about DNS lookups or memory allocation therefore if a problem occurs in those areas it is exceptional -> exception.

On the other hand, if you were a memory allocator function then returning out of memory instead of an address to the allocated memory would be fine. Because memory handling is your job.

> But open() in C does return a valid result on any of those billion reasons: (fd -1, errno reason).

Syntactically valid, but not semantically as -1 isn't a valid file descriptor.

> Then ask yourself this, what's wrong with having (Out of Memory Failure, -1234) in addition to DNS Lookup Failure and HTTP Status codes?

Indeed, I see nothing wrong with this.

> However, you do not know about DNS lookups or memory allocation therefore if a problem occurs in those areas it is exceptional -> exception.

This is where the disagreement lies. When I imagine myself as an http function FOR MOST USES, I imagine a "full service" function. You give me a URL (+post data), and get back a return code and a page. This functionality is sufficient for 99% of web uses.

If I can't get a 20x (possibly after following a 30x or 401 unauthorized response), the user in 99% of the cases does not care what the reason for failure is - they just need to know it failed and some reason to log/display.

Turn it the other way around: Why should a user of an http library care about internal implementation details like DNS resolving, to the point of needing to include an explicit check for them? And if you're advocating for "catch all exceptions", how is that different from a multiple-return-value?

> Syntactically valid, but not semantically as -1 isn't a valid file descriptor.

So what? It is a valid return from the "open" function. No, you can't use it for reading or writing or ioctl - but then, whether or not you can do that to any other returned descriptor depends on what that descriptor is (/dev/random? /dev/null? a socket?)

I'm not disagreeing that exceptions can be useful. I'm just disagreeing that what is exceptional vs. what is regular is a well defined thing.

It's possible for a library to return an error to indicate that a file is not found when you try and open a non-existent file. Then it's possible for a function that calls the open file function to decide whether that error should be translated into an exception i.e. should that file being missing break out of the current execution? There's no race condition there.
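A minimal Python sketch of that split, assuming a hypothetical library function `try_open` that reports failure as an errno code. `ConfigMissing`, `load_required`, and `load_optional` are also made-up names. Each caller decides for itself whether a missing file is fatal.

```python
import errno
import os
import tempfile

def try_open(path):
    """Hypothetical library call: returns (file, None) or (None, errno code)."""
    try:
        return open(path), None
    except OSError as exc:
        # Fall back to EIO in the unlikely case that no errno was recorded.
        return None, exc.errno or errno.EIO

class ConfigMissing(Exception):
    """Raised by a caller for whom a missing file is fatal."""

def load_required(path):
    f, err = try_open(path)
    if err is not None:
        # This caller translates the error into an exception.
        raise ConfigMissing(f"{path}: {os.strerror(err)}")
    with f:
        return f.read()

def load_optional(path, default=""):
    f, err = try_open(path)
    if err == errno.ENOENT:
        return default        # this caller treats 'missing' as normal
    if err is not None:
        raise OSError(err, os.strerror(err), path)
    with f:
        return f.read()

demo_dir = tempfile.mkdtemp()
missing = os.path.join(demo_dir, "settings.ini")
print(repr(load_optional(missing)))  # prints "''"
```

The open is attempted exactly once and the error comes from that attempt, so there is no separate existence check to race against.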

I don't think anyone reasonably expects a library to know the context of the file being opened and its importance in the logic of the rest of the program.

It seems to be a pattern in Python APIs to allow the caller to decide if a given function call should throw an exception when an error occurs. For example,

    a = some_dict['nonexistent_key']
would raise a KeyError, while

    a = some_dict.get('nonexistent_key', 'DEFAULT')
would return 'DEFAULT' instead.
Going one step further on the file example, I had a college professor who wrote a function that read from a file and had no way out of its loop except an exception thrown by the read when the file was at its end[1]. He said the end-of-file was an exceptional circumstance for a function that expected to read and process a line.

I doubt anyone would say end-of-file is unexpected, but I am not sure I would say it was exceptional.

1) something like this (it has been 20yrs)

    try
        init data structure S
        open file A
        loop
            read line from A
            process line and add to S
        next
    catch EOF
        close A
        return S
    catch file-not-found
        return empty S
Any mistakes are my memory's, not my old professor's.
In Java, everyone would tell you this is wrong. In Python, this is the normal way to end an iteration (specifically, throwing StopIteration). It's not a huge leap to see a stream as an iteration over bytes. So, what's considered exceptional seems somewhat culturally dependent.
Note that in Python you usually use the syntactic sugar of for loops rather than interacting with StopIteration manually.
True! I should have made that clearer.

The only time you need to think about StopIteration is if you're consuming from an iterator in some unusual way (eg writing a "get me the single item which is in this iterator, or blow up if there are zero or more than one" function) or if you're writing an iterator which is not built out of building blocks which already know how to stop an iteration.

So, not often.
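For the curious, a sketch of the "single item" function described above. `single` is a made-up name, while `iter`, `next`, and `StopIteration` are the standard built-ins.

```python
def single(iterable):
    """Return the only item of iterable; raise ValueError for zero or many."""
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        # 'from None' hides the StopIteration from the traceback.
        raise ValueError("expected exactly one item, got none") from None
    try:
        next(it)
    except StopIteration:
        return first          # exhausted after one item: the good case
    raise ValueError("expected exactly one item, got more than one")

print(single([7]))  # prints "7"
```

Here StopIteration is the success signal on the second `next` call and the failure signal on the first, which fits the point about "exceptional" depending on context.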