by fragmede 4267 days ago
> newline-separated list of unescaped file names

That breaks when you have newlines in filenames, no?

3 comments

File names often have spaces in them, but very rarely newlines. Based on xargs's current behaviour, it's clearly no problem to just not support certain characters in file names by default. I just think it would have been more useful for it to not support a smaller set of names.
You would be amazed what various broken tools can produce for filenames.
I can't decide if this is a rebuttal, or not ;) - assuming it is, note that the number of possible paths containing newlines OR spaces is larger than the number of possible paths containing newlines alone, so an xargs that didn't handle newlines by default would still be supporting more possible paths than it does in its current state!
Only the other week I managed to mouse-twitch into existence a file that rm refused to remove.
> That breaks when you have newlines in filenames, no?

That seems like an extremely pathological case.

Pathological or not, ensuring that pathnames can contain essentially any byte value except the NUL terminator, and that everything still works, is important for preventing surprising behaviour, which often has security implications.
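To make the point concrete, here is a minimal Python sketch of the difference between the two delimiters. The helper names and the sample file names are made up for illustration; the NUL-delimited stream is meant to resemble what `find -print0` emits, and the newline-delimited one a plain list of names.

```python
def split_nul_delimited(data: bytes) -> list[bytes]:
    """Split a NUL-delimited byte stream into file names."""
    # Each name is followed by a NUL, so the final split piece is empty; drop it.
    return [name for name in data.split(b"\0") if name]


def split_newline_delimited(data: bytes) -> list[bytes]:
    """Split a newline-delimited byte stream into file names (lossy)."""
    return [name for name in data.split(b"\n") if name]


# Hypothetical file names: one plain, one with a space, one with a newline.
names = [b"plain.txt", b"has space.txt", b"has\nnewline.txt"]

nul_stream = b"".join(name + b"\0" for name in names)
newline_stream = b"".join(name + b"\n" for name in names)

recovered_nul = split_nul_delimited(nul_stream)
recovered_newline = split_newline_delimited(newline_stream)
```

With NUL as the delimiter the three names come back unchanged. With newline as the delimiter the third name is silently split in two, which is the kind of surprise that can turn into a security problem.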
The only character not allowed in Unix file names is the forward slash directory separator, so even that would be a pathological mistake waiting to bite someone.

Edit: my mistake, they can't contain nulls either: https://news.ycombinator.com/item?id=8485861

too pathological; didn't implement
> That seems like an extremely pathological case.

When a human is creating files by hand, I almost certainly agree. When a program is creating files, however, it's only a matter of time before weird characters wind their way in there.

I really wish newlines had been disallowed. (There's UI implications, in addition to the parsing ones — how do you do a list view with newlines in the filename?; I also wish filenames had a reliable character set and weren't just bytes.)

I think dwheeler is trying to get this fixed/standardised in POSIX via The Open Group.
That it's going to be an uphill battle is an understatement.

Someone replied on LWN, when he posted his proposal, that he had implemented a sort of home-grown database using non-UTF8 characters for the file names.

Rube Goldberg, indeed!

> how do you do a list view with newlines in the filename?

Show them with the standard escape sequence for a newline:

    This\ filename\ncontains\ a\ newline
Same for any other characters that could be considered 'special' in output; I really wish the backslash convention for escaping was more common. Character sets and such are a UI/display issue, so I don't think there should be any special handling for them at the lower levels of the system.
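A rough Python sketch of that display convention, in case it helps. The function name is made up, and the choice to show every byte outside printable ASCII as `\xNN` is my own assumption, not an established standard; a real UI might decode the name as UTF-8 first.

```python
def escape_for_display(name: bytes) -> str:
    """Render a raw file name on one line using backslash escapes."""
    out = []
    for byte in name:
        ch = chr(byte)
        if ch == "\\":
            out.append("\\\\")      # double the backslash so output stays unambiguous
        elif ch == " ":
            out.append("\\ ")       # escaped space, as in the example above
        elif ch == "\n":
            out.append("\\n")
        elif ch == "\t":
            out.append("\\t")
        elif 0x21 <= byte <= 0x7E:
            out.append(ch)          # printable ASCII is shown as-is
        else:
            out.append(f"\\x{byte:02x}")  # assumption: everything else as hex
    return "".join(out)


shown = escape_for_display(b"This filename\ncontains a newline")
```

Because a literal backslash is doubled, a name containing the two characters `\` and `n` displays differently from a name containing a real newline, so the listing stays one line per file without losing information.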
These are UI issues: on display, render all printing elements (including spaces as spaces, and things that look like whitespace but aren't spaces) with readable glyphs, and use numeric stand-ins for non-rendering glyphs.
And \x0 separator breaks when you have \x0 in filenames. Pragmatically it's a question of rarity, but ultimately the shell should support something like prepared queries in SQL.
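For what the prepared-query idea could look like outside the shell, here is a small Python sketch: the command and its file name parameters are kept as separate list items and handed to the child process without being joined into a string and re-parsed. The `count_args` helper and the sample names are hypothetical; the child is just another Python interpreter that reports how many arguments it received.

```python
import subprocess
import sys


def count_args(filenames: list[str]) -> int:
    """Pass each file name as its own argv entry and report how many arrived."""
    # The command is the fixed "query"; the file names are its parameters.
    command = [sys.executable, "-c", "import sys; print(len(sys.argv) - 1)"]
    result = subprocess.run(
        command + filenames,   # a list, so no shell splits or expands the names
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Hypothetical awkward names: a space, a newline, and shell metacharacters.
awkward = ["has space.txt", "has\nnewline.txt", "semi;colon $(echo x).txt"]
received = count_args(awkward)
```

Three names go in and three arguments arrive, whatever bytes they contain, because no delimiter is ever parsed. The one byte that still cannot appear in an argument is NUL, which file names cannot contain anyway.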