by anonymousiam 2292 days ago
CP/M and DOS use ^Z (0x1A) as an EOF indicator. More modern operating systems use the file length (if available). Unix/Linux will treat ^D (0x04) as EOF within a stream, but only if the source is "cooked" and not "raw". (^D is ASCII "End of Transmission", or EOT, so that seems appropriate, except in the world of Unicode.)
3 comments

Strictly speaking, as discussed elsewhere in this thread, ^D can cause a terminal device to signal an EOF condition; other kinds of Unix byte streams don't make this association.

For example,

  $ python3 -c 'print("".join(chr(c) for c in range(10)))' | python3 -c 'print(list(ord(c) for c in input()))'
will confirm that it doesn't happen in a pipe (the ASCII 4 character there is totally unrelated to EOF).
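The same point can be sketched with an anonymous pipe created from Python. This is only an illustration: the helper name `read_all_from_pipe` is made up, and the payload is assumed to be small enough to fit in the pipe buffer. The 0x04 byte arrives as ordinary data, and the reader sees end-of-file only once the write end is closed.

```python
import os


def read_all_from_pipe(payload: bytes) -> bytes:
    """Write payload into an anonymous pipe, close the write end, read to EOF.

    Hypothetical helper for illustration; assumes payload fits in the pipe
    buffer so the single write does not block.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, payload)
    finally:
        # Closing the write end is what makes the reader see EOF.
        os.close(write_fd)

    chunks = []
    with os.fdopen(read_fd, "rb") as reader:
        while True:
            chunk = reader.read(4096)
            if not chunk:  # an empty read means end-of-file
                break
            chunks.append(chunk)
    return b"".join(chunks)


data = read_all_from_pipe(b"before\x04after")
print(data)  # the 0x04 byte is still in the middle of the data
```

Nothing in the pipe inspects the bytes; the ^D handling lives in the terminal's line discipline, not in the byte stream.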
I'm pretty sure the DOS TYPE command (its version of cat) would stop at the first ^Z it encountered, even if the file was longer.

It was sometimes used to have TYPE print something human readable and stop before the remaining (binary) file data would scroll everything away.
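A rough emulation of that trick in Python, assuming TYPE behaves as described above (this is not DOS code, and `type_like` is a made-up name): everything before the first ^Z is shown, everything after it is ignored.

```python
CTRL_Z = b"\x1a"


def type_like(data: bytes) -> bytes:
    """Return the bytes before the first ^Z, or all of data if there is none.

    Hypothetical stand-in for the TYPE behaviour described in the thread.
    """
    head, _separator, _rest = data.partition(CTRL_Z)
    return head


# A readable banner, then ^Z, then binary data that should never be shown.
blob = b"Readable banner\r\n" + CTRL_Z + bytes(range(256))
print(type_like(blob).decode("ascii"))
```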

> It was sometimes used to have TYPE print something human readable and stop before the remaining (binary) file data would scroll everything away.

Notably, in the PNG file format (created back when MS-DOS was still very relevant):

"The first eight bytes of a PNG file always contain the following values: [...] The control-Z character stops file display under MS-DOS. [...]" (http://www.libpng.org/pub/png/spec/1.2/PNG-Rationale.html#R....)
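To make that concrete, here is a small sketch that spells out the eight signature bytes and locates the control-Z among them. The variable names are mine; the byte values are the ones given in the PNG specification.

```python
# The eight-byte PNG signature: 0x89, "PNG", CR, LF, ^Z, LF.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# Position of the control-Z (0x1A) byte within the signature.
ctrl_z_offset = PNG_SIGNATURE.index(0x1A)

# What a reader that stops at the first ^Z would get to see.
visible_prefix = PNG_SIGNATURE[:ctrl_z_offset]

print(ctrl_z_offset)
print(visible_prefix)
```

A reader that stops at ^Z would only ever get as far as the 0x89 byte, the letters "PNG", and a CR LF pair.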

Maybe not for DOS, but for CP/M it most certainly is true, since the length of a file in bytes is not stored anywhere, only the number of (typically 128-byte) sectors.

For binary files, you just assume there is padding at the end of the file to the end of the sector. For text files, the SUB code was used to indicate where the file ended.
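A minimal sketch of that scheme, assuming 128-byte sectors and SUB (0x1A) as the filler for text files. The function names are invented for illustration, and this models only the convention described above, not any actual CP/M code.

```python
RECORD_SIZE = 128  # assumed sector size, per the comment above
SUB = b"\x1a"      # ASCII SUB, i.e. ^Z


def pad_to_records(text: bytes) -> bytes:
    """Pad text with SUB bytes up to a whole number of records."""
    padding = (RECORD_SIZE - len(text) % RECORD_SIZE) % RECORD_SIZE
    return text + SUB * padding


def text_length(stored: bytes) -> int:
    """Recover the text length: the offset of the first SUB, if any.

    If the text fills its last record exactly there is no SUB in this
    sketch, and the end of the last record is the end of the text.
    """
    end = stored.find(SUB)
    return len(stored) if end == -1 else end


stored = pad_to_records(b"hello\r\n")
print(len(stored), text_length(stored))
```

The directory only needs to know that one record is in use; the reader finds the real end of the text by scanning for SUB.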

It’s not true; plenty of DOS programs stopped I/O operations with Ctrl+Z and exited with Ctrl+C. What you are saying is that obviously there was no physical 1A byte to demarcate the end of the file, but 1A was used pretty much everywhere. And it’s actually a non-printable character: https://en.m.wikipedia.org/wiki/Substitute_character So I’m missing the point of this article: Ctrl+Z and Ctrl+D are obviously non-printable characters, and of course they are not used anymore to demarcate the actual end of a file.
Using the "file length" as opposed to the "EOF indicator" is like how strings can either be represented as a pointer to a contiguous sequence of `char` ending with a NUL byte, or as a tuple of (length, pointer), without the need for a terminating NUL byte.

One gives the information a priori, the other a posteriori.
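The analogy can be sketched in Python with byte buffers. The function names are made up, and a 4-byte little-endian length prefix stands in for the (length, pointer) tuple, which is an assumption of this sketch rather than anything from the thread.

```python
import struct


def c_string_length(buffer: bytes) -> int:
    """Terminator style: the length is known only after scanning for NUL."""
    return buffer.index(0)


def pack_length_prefixed(payload: bytes) -> bytes:
    """Length style: store the length up front, then the raw bytes."""
    return struct.pack("<I", len(payload)) + payload


def unpack_length_prefixed(buffer: bytes) -> bytes:
    """Read the 4-byte length first, then slice out exactly that many bytes."""
    (length,) = struct.unpack_from("<I", buffer)
    return buffer[4:4 + length]


print(c_string_length(b"abc\x00junk"))
print(unpack_length_prefixed(pack_length_prefixed(b"a\x00b")))
```

As with ^Z in a file, the terminator style cannot carry the terminator byte itself as data, while the length style can hold any byte, including NUL.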