by aloisklink 596 days ago
POSIX does actually define what a "text file" is, but the definition is a bit unusual:

See https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...

> 3.387 Text File

> A file that contains characters organized into zero or more lines. The lines do not contain NUL characters and none can exceed {LINE_MAX} bytes in length, including the <newline> character.

So, if you have some non-printable characters like BEL/␇/ASCII 0x07, that's still a text file.

(And I believe which bytes count as a valid character depends on your `LC_CTYPE`.)

But the moment you have a line longer than {LINE_MAX} bytes (which can depend on which POSIX environment you have), suddenly your text file is now a binary file.
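
For illustration, here is a rough sketch of that definition as a check on raw bytes. It is not how any real tool decides "text vs. binary". The name `is_posix_text` and the `line_max` parameter are made up for this example, and it makes a few assumptions: it uses 2048 (the `_POSIX2_LINE_MAX` minimum) rather than the real `{LINE_MAX}` of your system, it ignores `LC_CTYPE` entirely, it treats an unterminated last line as disqualifying, and it accepts an empty file as "zero lines".

```python
POSIX2_LINE_MAX = 2048  # minimum value POSIX allows for LINE_MAX; the real limit may be larger


def is_posix_text(data: bytes, line_max: int = POSIX2_LINE_MAX) -> bool:
    """Hypothetical helper: strict reading of the 'Text File' definition on raw bytes."""
    if b"\x00" in data:
        return False  # lines must not contain NUL characters
    if data and not data.endswith(b"\n"):
        return False  # assumption: an unterminated last line does not count as a line
    # Drop the empty piece that split() leaves after the final <newline>.
    lines = data.split(b"\n")[:-1]
    # +1 because the length limit includes the <newline> itself.
    return all(len(line) + 1 <= line_max for line in lines)


print(is_posix_text(b"\x07ding\n"))          # True: BEL is fine
print(is_posix_text(b"a\x00b\n"))            # False: contains NUL
print(is_posix_text(b"x" * 2048 + b"\n"))    # False: 2049 bytes including <newline>
```

On a POSIX system, `getconf LINE_MAX` should report the actual limit to pass as `line_max`.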


Kind of a weird definition indeed. One edge case: the definition states the file must contain characters, so presumably zero-length files are out. But then how could you have zero lines?
POSIX defines a line as:

> 3.185 Line

> A sequence of zero or more non-<newline> characters plus a terminating <newline> character.

So a file with some characters but no <newline> anywhere is reported by `wc -l` as having zero lines, because `wc -l` counts <newline> characters.
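
A tiny sketch of that counting rule, in case it helps. `count_lines` is a made-up name for this example; it just counts <newline> bytes, which is how POSIX specifies the `wc -l` count.

```python
def count_lines(data: bytes) -> int:
    """Hypothetical helper: count <newline> bytes, like `wc -l` does."""
    return data.count(b"\n")


print(count_lines(b"abc"))     # 0: characters, but no terminating <newline>
print(count_lines(b"abc\n"))   # 1
print(count_lines(b"a\nb"))    # 1: the trailing "b" is an incomplete line
print(count_lines(b""))        # 0
```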

An empty file is not hard to make. It's just a matter of creating the file and not writing to it.
Yes, obviously. But the POSIX specification for a "text file" above says that it contains characters, which an empty file by definition does not. So an empty file cannot be a text file if you read that specification strictly, and therefore you cannot have zero lines in a text file. As soon as you have a single character there is at least one line, and the number of lines can only stay the same or grow from there.

The definition should read "one or more lines" instead, or (probably better) specify that a text file contains "zero or more characters".

Ahh I see what you're saying. I misunderstood at first.