by cetu86 4609 days ago
Very interesting discussion. Here are my 2 cents: I agree that the way the shell handles filenames and confuses them with commands is inherently broken.

I would like to focus on what filenames are supposed to be. In my understanding, filenames are supposed to be like book titles or labels on goods in the supermarket. So they are not supposed to contain random binary data; that is what a file's contents are for. Filenames should, however, be able to contain any printable character that you would expect on a label, and no fake characters like beep or linefeed. But they should be fully international. I mean, this is the 21st century! :-) In order to define printable characters you also need to define a character encoding. Unicode clearly defines two sets of control characters, C0 and C1. I would exclude these two sets, but allow any other Unicode character. I know there is an argument about which Unicode encoding is better (UTF-8, UTF-16, the way Apple encodes Unicode vs. the way everyone else does, ...). Maybe one could define the filesystem's encoding inside it and even give the kernel a translation layer between the on-disk encoding and the one visible to the user.
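To make the "label" rule concrete, here is a rough sketch in Python of what such a check could look like in user space. The function name `is_label_like` is made up for illustration, the check assumes UTF-8 as the encoding, and rejecting DEL (U+007F) is my own assumption, since DEL is not formally part of C0 or C1.

```python
def is_label_like(name: bytes) -> bool:
    """Return True if name is valid UTF-8 and contains no C0/C1 control characters."""
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Random binary data that is not valid UTF-8 is rejected outright.
        return False
    for ch in text:
        cp = ord(ch)
        # C0 controls are U+0000..U+001F, C1 controls are U+0080..U+009F.
        # U+007F (DEL) is rejected too; that is an assumption of this sketch.
        if cp <= 0x1F or cp == 0x7F or 0x80 <= cp <= 0x9F:
            return False
    return True
```

A name like "Grüße.txt" passes, while a name containing a linefeed or a stray 0xFF byte does not. This only illustrates the rule; it does nothing to enforce it at the filesystem level.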

1 comments

Why not make rule systems modular?
Of course. Then everyone can decide whether to use it or not, or even switch it on at some point in the boot sequence. Currently the Linux kernel doesn't have an interface for this. But I think it is important to do this within the kernel so that no malicious program can bypass it.
I agree, I think having a CONFIGURABLE option in the kernel where admins can decide "what is allowed" would be a big step forward. (1) Enable requiring UTF-8 encoding, and (2) list what bytes are allowed/forbidden at the beginning, the middle, and the end. Then you could have a local policy like "UTF-8 only", "no control chars", "no dash at beginning", and "no space at the end".
This also means that programmers can document what they require, and eventual standards can emerge, which would be a good thing.
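As a rough illustration of the local policy idea above, here is a user-space sketch in Python. Everything in it (`FilenamePolicy`, its field names, `violations`) is hypothetical and does not correspond to any existing kernel interface. The control character check works on raw bytes, so it only covers the C0 range plus DEL.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FilenamePolicy:
    """Hypothetical admin-chosen policy; defaults mirror the examples in the comment."""
    require_utf8: bool = True
    forbid_control: bool = True
    forbidden_first: bytes = b"-"     # "no dash at beginning"
    forbidden_anywhere: bytes = b""   # nothing extra forbidden in the middle
    forbidden_last: bytes = b" "      # "no space at the end"


def violations(name: bytes, policy: FilenamePolicy) -> List[str]:
    """Return a list of human-readable reasons why name breaks the policy."""
    problems = []
    if policy.require_utf8:
        try:
            name.decode("utf-8")
        except UnicodeDecodeError:
            problems.append("not valid UTF-8")
    # Byte-level check: C0 controls (0x00..0x1F) and DEL (0x7F).
    if policy.forbid_control and any(b < 0x20 or b == 0x7F for b in name):
        problems.append("contains a control character")
    if name and name[0] in policy.forbidden_first:
        problems.append("starts with a forbidden byte")
    if any(b in policy.forbidden_anywhere for b in name):
        problems.append("contains a forbidden byte")
    if name and name[-1] in policy.forbidden_last:
        problems.append("ends with a forbidden byte")
    return problems
```

With the default policy, `violations(b"-rf", FilenamePolicy())` reports the leading dash and `violations(b"report.txt", FilenamePolicy())` returns an empty list. Returning the reasons instead of a plain yes/no is a design choice of this sketch, so that a program could tell the user which rule was broken.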