by dataflow 1561 days ago
> is it my program’s responsibility to check what happens to the bytes AFTER the pipe?

No, but it's not "after". Rather, it's your responsibility to handle backpressure by ensuring the bytes were written to the pipe successfully in the first place.

This isn't just about the filesystem being full btw. If you imagine a command like ./foo.py | head -n 10, it only makes sense for the 'head' command to close the pipe when it's done, and foo.py should be able to detect this and stop printing any more output. (This is especially important if you consider that foo.py might produce an unbounded number of lines of output, like the 'yes' program.)

I would argue this is not necessarily even an error from a user standpoint, so the return code from foo.py should still be zero in many cases: a pipe-is-closed error just means the consumer didn't want the rest of the output, which is fine [1], whereas an out-of-disk-space error is probably really an error. Handling these robustly is actually difficult though, because (a) you'd need to figure out why printf() failed (so that you can treat different failures differently, which is painful), and (b) you need to make sure any side effects in the program flow up to the printf() are semantically correct "prefixes" of the overall side effect, meaning that you'd need to pay careful attention to where you printf(). (Practically speaking, this makes it difficult to even have side effects that respect this, but that's an inherent problem/limitation of the pipeline model...)
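
To make point (a) concrete, here is a minimal Python sketch of one possible policy for something like foo.py: a closed pipe counts as a normal stop, any other write failure (ENOSPC, say) counts as a real error. The function name emit_lines and the exit-status mapping are my own illustration, not an established convention.

```python
import errno
import sys


def emit_lines(lines, out):
    """Write lines to `out` and return an exit status (hypothetical policy)."""
    try:
        for line in lines:
            out.write(line + "\n")
        out.flush()
    except BrokenPipeError:
        # EPIPE: the consumer closed the pipe. Treated here as "not an error".
        return 0
    except OSError as exc:
        # Any other write failure, e.g. ENOSPC, is reported as a real error.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        print(f"write failed: {name}", file=sys.stderr)
        return 1
    return 0
```

A script would end with something like sys.exit(emit_lines(my_lines, sys.stdout)). This only works as written because CPython ignores SIGPIPE at startup, so the failed write surfaces as an exception instead of killing the process. It does nothing about point (b): side effects performed before the failing write have already happened.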

FWIW, I would be very curious if anyone has formalized all of these nuances of the pipeline model and come up with a robust & systematic way to handle them. It seems like a complicated problem to me. To give just one example of a problem that I'm thinking of: should stderr and stdout behave the same way with respect to "pipe is closed"? e.g. should the program terminate if either is closed, or if both are closed? The answer is probably "it depends", but on what exactly? What if they're redirected externally? What if they're redirected internally? Is there a pattern you can follow to get it right most of the time? There's a lot of room for analysis of the issues that can come up, especially when you throw buffering/threading/etc. into the mix...

[1] Or maybe it isn't. Maybe the output (say, some archive format like ZIP) has a footer that needs to be read first, and it would be corrupt otherwise. Or maybe that's fine anyway, because the consumer should already understand you're outputting a ZIP, and it's on them if they want partial output. As always, "it depends". But I think a premature stdout closure is usually best treated as not-an-error.


> This isn't just about the filesystem being full btw. If you imagine a command like ./foo.py | head -n 10, it only makes sense for the 'head' command to close the pipe when it's done, and foo.py should be able to detect this and stop printing any more output.

The usual way of handling this is by not (explicitly) handling it. Writes to a closed pipe are special: they do not normally fail with a status that the program then all too often ignores. Instead they raise a SIGPIPE signal, whose default action is to kill the process. Extra steps are needed to not kill the process. No other kind of write error gets this special treatment that I am aware of.
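
A rough way to observe the default SIGPIPE behavior from Python, assuming a POSIX system. The child script and the helper name run_until_reader_leaves are made up for this sketch. The child has to restore the default action explicitly because CPython itself ignores SIGPIPE at startup.

```python
import signal
import subprocess
import sys

# Child program: restore the default SIGPIPE action, then write forever,
# roughly like the 'yes' program.
CHILD = (
    "import os, signal\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "while True:\n"
    "    os.write(1, b'y\\n')\n"
)


def run_until_reader_leaves():
    """Read one line from the child, close the pipe, return its exit status."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    proc.stdout.readline()  # consume one line, like `head -n 1`
    proc.stdout.close()     # the reader goes away
    return proc.wait()      # a negative value means "killed by that signal"


status = run_until_reader_leaves()
print(status == -signal.SIGPIPE)
```

The child contains no error handling at all. It stops because the kernel delivers SIGPIPE on its next write once the read end is closed.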

That would be at best a Linux extension, not some general C behavior you can assume when writing your program.

That said though, I can't even reproduce what you're saying on Linux:

  printf '%s\n' '#include <stdio.h>' 'int main() { setvbuf(stdout, NULL, _IONBF, 0); int r = fputs("Starting\n", stdout); fflush(stdout); fprintf(stderr, "%d\n", r); }' | cc -x c - && ./a.out >&-
  // Prints '0' instead of dying
That's a POSIX thing. It doesn't apply to all C implementations, but it does apply to many more than just Linux-based ones. You wouldn't see it with that command because you don't have a closed pipe, just a closed file descriptor. Try running it as

  ./a.out | :
and you will probably see it. I say probably because there is a timing aspect as well: the write may happen before the pipe gets closed, in which case it will not fail, but that is unlikely.
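
The closed-descriptor versus closed-pipe distinction can be checked in a single process. This is a sketch assuming a POSIX system and CPython's default of ignoring SIGPIPE, which is why the second case shows up as an errno rather than a killed process. The helper write_errno is hypothetical.

```python
import errno
import os


def write_errno(fd):
    """Return the errno name from writing one byte to fd, or None on success."""
    try:
        os.write(fd, b"x")
    except OSError as exc:
        return errno.errorcode[exc.errno]
    return None


# Case 1: the descriptor itself is closed (what `>&-` does to stdout).
r, w = os.pipe()
os.close(w)
closed_fd_result = write_errno(w)    # "EBADF"; no signal is involved
os.close(r)

# Case 2: the descriptor is open, but the pipe has no reader left.
r, w = os.pipe()
os.close(r)
closed_pipe_result = write_errno(w)  # "EPIPE", since SIGPIPE is ignored here
os.close(w)

print(closed_fd_result, closed_pipe_result)
```

With the default SIGPIPE action in place, as in a typical C program, the second write would terminate the process instead of returning EPIPE.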
Yeah I should've said POSIX, my bad. But yeah my point was it's not plain C behavior.

And yes on Linux I do see it with your no-op example now. Though for some reason not with 'head'... what's going on? Is it not closing the pipe when it exits?

  $ printf '%s\n' '#include <stdio.h>' '#include <unistd.h>' 'int main() { setvbuf(stdout, NULL, _IONBF, 0); int r = puts("Starting...\n"); r += fputs("First\n", stdout); fflush(stdout); usleep(1000000); fprintf(stderr, "%d\n", r); }' | cc -x c - && ./a.out | head -n 1
  Starting...
  19
Edit: D'oh, see below.
That would be the timing aspect of it. You have:

a) a.out writes line 1

b) a.out writes line 2

c) head reads line 1

d) head closes the pipe

We know that b and c both happen after a, and that d happens after c. However, we do not know whether b happens before c, between c and d, or after d. Your a.out process will only get killed by SIGPIPE if it happens after d.

On my system, running a.out under strace is enough to slow it down enough to affect the timing and see the SIGPIPE you were expecting. You may alternatively insert artificial delays in your test program such as by calling the sleep() function between the two lines of output to see the same result.
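
The ordering argument can be replayed deterministically in one process by playing both roles on a raw pipe. This sketch forces the interleaving a, b, c, d and then adds one more write after d. In a real pipeline the scheduler picks the interleaving. It assumes POSIX and CPython's default of ignoring SIGPIPE; try_write is a made-up helper.

```python
import os


def try_write(fd, data):
    """Return True if the write succeeded, False if the pipe had no reader."""
    try:
        os.write(fd, data)
        return True
    except BrokenPipeError:
        return False


r, w = os.pipe()
results = []
results.append(try_write(w, b"line 1\n"))  # (a) producer writes line 1
results.append(try_write(w, b"line 2\n"))  # (b) before (d), so it succeeds
first = os.read(r, 7)                      # (c) consumer reads line 1
os.close(r)                                # (d) consumer closes the pipe
results.append(try_write(w, b"line 3\n"))  # a write after (d) fails
os.close(w)

print(results)  # [True, True, False]
```

Only the write issued after the reader has closed fails. Line 2 was accepted into the pipe buffer even though nobody ever read it, which matches the test program printing a count instead of dying.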

Sorry, I think I edited my comment while you were replying. But I just noticed the problem in the most recent version was that I didn't write to stdout after the usleep(), so it never raised SIGPIPE. Thanks.
> it's not plain C behavior.

But pipes aren't a C thing in the first place. "unistd.h" is not a C thing, file descriptors aren't a C thing.

The C program would just be using stdio to interact with pipes.

And not every platform with pipes supports SIGPIPE with such behavior.

What the pipe does is orthogonal to what the programme should do. The problem here is that errors are not being handled. There are languages such as Rust that enforce error handling, whereby the policy on error is made explicit. The nuances you highlight are around what the errors should describe, which ultimately leads to more potential granularity in the error policy.