by Too 1257 days ago
I hope you are not referring to things like errno? Because it, just like a returned error, must be checked after every call that might set it, before you know if you can safely proceed with the next.

Otherwise, what happens when you keep calling functions and there already is an error present? Are all functions implemented with if (errno) return NULL; guards at the top? That's putting a lot of trust into global state and library writers. How do you know which functions become no-ops and which continue working while in an error state?

Additionally, that would be a debugging nightmare, because if you keep calling functions before examining the error, how do you know which call introduced the error first?


> how do you know which call introduced the error first

You don't. You can't. You don't want to.

Take read()/write() for example. If you look into the kernel, it's physically impossible to name a function call that "introduced" the error. A write() will simply copy some memory into a buffer, and then the syscall returns. When pages are flushed out to storage later, an error might be reported from storage asynchronously. The error bubbles back at some point, at some syscall related to the file, but it does not necessarily have anything to do with that syscall. The error you get back could even be "caused" by a write to the same file from a different process.

So it's perfectly reasonable that the FILE API, which wraps read()/write(), simply stores returned errors in the FILE handle. Distributed systems are a perfect application for objects that do error isolation.
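As a rough illustration of that pattern (a sketch, not code from anyone in this thread): stdio keeps a per-stream error indicator that stays set until clearerr(), so several unchecked writes can be followed by one ferror() check. The helper name write_report is made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes a small report to an already-open stream.
 * None of the individual calls are checked; stdio records any failure
 * in the stream's error indicator, which stays set until clearerr(). */
int write_report(FILE *out, int count)
{
    assert(out != NULL);

    fputs("header\n", out);
    for (int i = 0; i < count; i++)
        fprintf(out, "row %d\n", i);
    fputs("footer\n", out);

    /* Flush so that failures on buffered data are recorded as well. */
    fflush(out);

    /* One check covers every call above. */
    return ferror(out) ? -1 : 0;
}
```

Note that this only tells you that something failed on the stream, not which call failed, which is exactly the trade-off being argued about here.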

Delayed error ack is a completely orthogonal issue. It is only necessary as a performance workaround, whether for unflushed disk buffers, sockets, or distributed systems.

Parent presented sticky errors as an effective substitute for exceptions or error codes. Delayed errors are not a way to make error handling easier to organize, which I believe is what this topic was about. Delaying the error ack has quite the contrary effect: failing fast whenever possible will always be more accurate. How that surfaces to the caller is the more relevant question.

What happens when the disk is full or disconnected when copying 4GB src file to dst file after 100MB progress? (Yes, this error might occur slightly delayed due to buffers.) You surely don’t want to continue reading the remaining 3.9GB source file and call write() in noop-mode another thousand times in your loop before realizing this error on flush. Adding manual flushing just to check the error both negates the performance benefit of the buffer and introduces extra complexity for a simple error check. Hence, every individual write must be checked regardless.
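A minimal sketch of what checking every write looks like in a copy loop, using stdio for brevity. copy_stream is a hypothetical helper and the 4096-byte chunk size is arbitrary.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies src to dst in chunks and stops at the
 * first failed write instead of reading the rest of the source.
 * Returns 0 on success, -1 on a read or write error. */
int copy_stream(FILE *src, FILE *dst)
{
    assert(src != NULL && dst != NULL);

    char buf[4096];
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, n, dst) != n)
            return -1;      /* fail fast: no point reading further */
    }
    if (ferror(src))
        return -1;          /* fread stopped on an error, not on EOF */

    /* Data still sitting in the stdio buffer can fail on its way out. */
    return fflush(dst) == 0 ? 0 : -1;
}
```

Even here the final fflush() has to be checked, because a successful fwrite() may only have reached the user-space buffer.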

Such buffers are not infinite either. What do you do when write() fails because the buffer is full (EAGAIN)? Again back to square one of checking each individual write call instead of only checking the final error of flush or close.

If you look around there are lots and lots of objects that are "distributed", or aren't but should be. Synchronicity is often what's killing performance and introducing complexity.

> You surely don’t want to continue reading the remaining 3.9GB source file and call write() in noop-mode another thousand times in your loop before realizing this error on flush

It can be completely reasonable to back out only at strategic points. Copying a few KB or MB of memory more will rarely matter for an error case that shouldn't be optimized for. If there is an error, you'll typically want to reset a larger context object anyway. It depends on the situation, but by not having to handle the error at first notice, you can sometimes simplify the logic.

> What do you do when write() fails because the buffer is full (EAGAIN)?

EAGAIN is a different beast, it's not a "real" I/O error. With better APIs you retrieve buffers first (often in a different phase), removing this class of errors completely. But you can mostly just ignore EAGAIN anyway. It's a transient error (or not an error at all, really) that simply tells you the reason why zero bytes were written.
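To make "just ignore EAGAIN" concrete, here is a sketch assuming a POSIX system and a non-blocking fd. write_all_nonblocking is a made-up name; it treats EAGAIN as "nothing was written yet" and waits with poll() before retrying, while any other errno is handed back as a real error.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: writes len bytes to a (possibly non-blocking) fd.
 * EAGAIN/EWOULDBLOCK means "not ready yet", not failure: wait until the
 * fd is writable and try again.
 * Returns 0 on success, -1 on a real error (errno is left set). */
int write_all_nonblocking(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);

    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n >= 0) {
            buf += n;               /* short write: advance and go on */
            len -= (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;               /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;
            continue;               /* writable again (or interrupted) */
        }
        return -1;                  /* a real error, e.g. EIO or ENOSPC */
    }
    return 0;
}
```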

With fwrite(), I'm not sure it is well specified how it should interact with non-blocking FDs and EAGAIN. It probably doesn't even allow you to distinguish between EAGAIN and I/O errors. It could also be an option to return a short write in this case (but I believe fwrite() needs to set either the error or the EOF flag if it returns a short write). I also think fwrite() is largely not used with non-blocking FDs.

Yes, like errno, except not global of course: you add one per struct/context/module/thread/whatever. And you design the functions to return early if the error state is set.

>how do you know which call introduced the error first?

You rarely care about that, but if you do, you either make the error state stick to the first error or, as I said, make it a list you can append multiple errors to.
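A small sketch of that design, assuming a toy fixed-size buffer object. struct buf, buf_put and the error codes are all invented for the example; the point is only the per-object error slot, the early return, and the first error sticking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum buf_err { BUF_OK = 0, BUF_OVERFLOW, BUF_BAD_ARG };

/* Hypothetical context object: the error lives here, not in a global. */
struct buf {
    char data[16];
    size_t len;
    enum buf_err err;       /* sticks to the first error */
};

/* Record an error only if none has been recorded yet. */
static void buf_fail(struct buf *b, enum buf_err e)
{
    if (b->err == BUF_OK)
        b->err = e;
}

/* Append s to the buffer; becomes a no-op once the buffer has failed. */
void buf_put(struct buf *b, const char *s)
{
    assert(b != NULL);

    if (b->err != BUF_OK)
        return;             /* early return: already in error state */
    if (s == NULL) {
        buf_fail(b, BUF_BAD_ARG);
        return;
    }
    size_t n = strlen(s);
    if (n > sizeof b->data - b->len) {
        buf_fail(b, BUF_OVERFLOW);
        return;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
}
```

The caller can chain several buf_put() calls and inspect err once at the end. Swapping the single err field for an array or list would give the "append multiple errors" variant.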