Hacker News
by pstuart 4254 days ago
> That breaks when you have newlines in filenames, no?

That seems like an extremely pathological case.
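To make the objection concrete, here is a minimal Python sketch. It assumes a listing where each name is terminated by a newline (as plain `find` prints) and compares it with one where each name is terminated by NUL (as `find -print0` prints, for consumption by `xargs -0`). The file names are made up. Splitting on newlines turns one name into two; splitting on NUL cannot, because NUL never occurs inside a Unix file name.

```python
# Hypothetical file names; the second contains an embedded newline.
names = ["report.txt", "evil\nname.txt", "notes.md"]

# Newline-terminated listing, like the default output of `find`.
newline_listing = "".join(n + "\n" for n in names)
# NUL-terminated listing, like the output of `find -print0`.
nul_listing = "".join(n + "\0" for n in names)


def parse_lines(listing: str) -> list[str]:
    """Naive parser: one name per line (drops the empty tail after the last newline)."""
    return listing.split("\n")[:-1]


def parse_nul(listing: str) -> list[str]:
    """Robust parser: NUL can never occur inside a Unix file name."""
    return listing.split("\0")[:-1]


print(parse_lines(newline_listing))  # ['report.txt', 'evil', 'name.txt', 'notes.md']
print(parse_nul(nul_listing) == names)  # True
```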


Pathological or not, making sure code still works when pathnames contain any byte value except the 0 terminator is important to prevent surprising behaviour, which often has security implications.
The only character not allowed in Unix file names is the forward slash directory separator, so dismissing newlines as pathological is itself a mistake waiting to bite someone.

Edit: my mistake, they can't contain nulls either: https://news.ycombinator.com/item?id=8485861
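The rule described above (any byte except `/` and NUL) fits in a one-line check. This is a hypothetical helper, `is_valid_component`, that works on raw `bytes` because that is what the kernel sees. It is only a sketch: it deliberately ignores length limits such as `NAME_MAX` and the special names `.` and `..`.

```python
def is_valid_component(name: bytes) -> bool:
    """Hypothetical check: can `name` be a single Unix file name component?

    Only two bytes are forbidden: b"/" (the directory separator) and
    b"\\0" (the C string terminator). Newlines, control bytes and
    invalid UTF-8 all pass. NAME_MAX and the names "." / ".." are ignored.
    """
    return len(name) > 0 and b"/" not in name and b"\0" not in name


print(is_valid_component(b"new\nline"))  # True
print(is_valid_component(b"a/b"))        # False
```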

too pathological; didn't implement
> That seems like an extremely pathological case.

When a human is creating files by hand, I'd almost certainly agree. When a program is creating files, however, it's only a matter of time before weird characters wind their way in there.

I really wish newlines had been disallowed. (There are UI implications in addition to the parsing ones: how do you do a list view with newlines in the filename? I also wish filenames had a reliable character set instead of being just bytes.)
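Because file names are just bytes, software that wants text has to pick a decoding. A sketch of one approach is the `surrogateescape` error handler, which Python itself uses for `os.fsdecode` on POSIX systems: bytes that aren't valid UTF-8 become lone surrogate code points, so the original name can be recovered exactly. The helper names and the sample name are made up, and the sketch hard-codes UTF-8 where `os.fsdecode` would use the filesystem encoding.

```python
# A name that is valid Latin-1 ("café.txt") but not valid UTF-8.
raw_name = b"caf\xe9.txt"


def name_to_text(name: bytes) -> str:
    # Undecodable bytes map to lone surrogates U+DC80..U+DCFF; nothing is lost.
    return name.decode("utf-8", errors="surrogateescape")


def text_to_name(text: str) -> bytes:
    # The inverse mapping turns those surrogates back into the original bytes.
    return text.encode("utf-8", errors="surrogateescape")


text = name_to_text(raw_name)
print(repr(text))                      # 'caf\udce9.txt'
print(text_to_name(text) == raw_name)  # True
```

The round trip works, but the decoded string still isn't something you can show a user as-is, which is the character-set problem in a nutshell.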

I think dwheeler is trying to get this fixed and standardised in POSIX via The Open Group.
To say it's going to be an uphill battle is an understatement.

When he posted his proposal on LWN, someone replied that they had implemented a sort of home-grown database using non-UTF-8 characters in the file names.

Rube Goldberg, indeed!

> how do you do a list view with newlines in the filename?

Show them with the standard escape sequence for a newline:

    This\ filename\ncontains\ a\ newline
Same for any other characters that could be considered 'special' in output; I really wish the backslash convention for escaping was more common. Character sets and such are a UI/display issue, so I don't think there should be any special handling for them at the lower levels of the system.
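The backslash convention can be sketched in a few lines of Python. This hypothetical `escape_filename` escapes the backslash itself (so the output stays unambiguous), spaces, newlines and tabs, and falls back to numeric escapes for other non-printable characters. It is similar in spirit to GNU `ls -b`, which prints C-style escapes for nongraphic characters, but it is not a reimplementation of it.

```python
def escape_filename(name: str) -> str:
    """Hypothetical helper: render a file name on one line with backslash escapes."""
    named = {"\\": "\\\\", " ": "\\ ", "\n": "\\n", "\t": "\\t"}
    out = []
    for ch in name:
        if ch in named:
            out.append(named[ch])
        elif ch.isprintable():
            out.append(ch)
        else:
            # Any other non-printable character gets a numeric escape.
            cp = ord(ch)
            if cp < 0x100:
                out.append(f"\\x{cp:02x}")
            elif cp < 0x10000:
                out.append(f"\\u{cp:04x}")
            else:
                out.append(f"\\U{cp:08x}")
    return "".join(out)


print(escape_filename("This filename\ncontains a newline"))
# This\ filename\ncontains\ a\ newline
```

Because the backslash is itself escaped, each escaped string corresponds to exactly one original name, so a list view can show one entry per line without losing information.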
As for the UI issues: on display, format all printing elements with readable glyphs (spaces as spaces, and characters that look like whitespace but aren't spaces made visibly distinct), and use numeric stand-ins for glyphs that don't render.
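For characters that merely look like whitespace, or don't render at all, the Unicode general category is a reasonable signal. Below is a minimal sketch, assuming a hypothetical `<U+XXXX>` stand-in format: ASCII space stays a space, and every other separator (`Z*`) or control, format, private-use or unassigned character (`C*`) is shown by its code point.

```python
import unicodedata


def visible_name(name: str) -> str:
    """Hypothetical display helper: give every character a readable glyph."""
    out = []
    for ch in name:
        category = unicodedata.category(ch)  # e.g. "Lu", "Zs", "Cf"
        if ch != " " and category[0] in "CZ":
            # Separators other than ASCII space (NO-BREAK SPACE is "Zs") and
            # control/format/private/unassigned characters (ZERO WIDTH SPACE is "Cf").
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


print(visible_name("a\u00a0b\u200bc d"))  # a<U+00A0>b<U+200B>c d
```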