by asveikau 2115 days ago
> Why did I pay for this machine if it weren't intended to facilitate me?

I happen to agree that a filename should be a dumb blob of bytes and that the kernel should not do case folding; it is the wrong layer for that. For example, the user can change their language, but that won't update what has already been written to disk in thousands or millions of places, so a change in the folding rules could suddenly produce a filename collision somewhere.

But, I do hope you get that refund for your Linux.

4 comments

IIRC that's how ext-based file systems have treated it until now: everything is allowed except for / and NUL bytes.
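The rule described here can be sketched as a byte-level check. This is only an illustration, assuming just the two restrictions named above; `is_allowed_component` is a hypothetical helper, and real file systems add other limits (name length, the reserved names `.` and `..`) that it does not model.

```python
def is_allowed_component(name: bytes) -> bool:
    """Hypothetical check for one path component: any non-empty byte
    string that contains neither b"/" nor a NUL byte is accepted."""
    return len(name) > 0 and b"/" not in name and b"\x00" not in name


# Invalid UTF-8 passes, because the bytes are never decoded.
print(is_allowed_component(b"foo\xff\xfe.txt"))
# A path separator or a NUL byte is rejected.
print(is_allowed_component(b"a/b"))
print(is_allowed_component(b"a\x00b"))
```

The point of the sketch is that no encoding is consulted at all: the check works on raw bytes.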
> dumb blob of bytes

Well, now your filename is invalid utf8. How should programs display it or even address such a file?

How does the UI framework act when you set a label to such a payload? How does your web browser act when it sees it in HTML? Working on apps that see a lot of usage in varied markets, I have found that as much as we wish for ideal conditions, malformed UTF-8 surfaces in the real world pretty often.
> How should programs display it

what's wrong with foo����.txt

> or even address such a file?

... by using the array of bytes?

The fact that if one has two files, say “test{invalid bytes}.txt” and “test{other invalid bytes}.txt”, both have replacement characters inserted at the same spot and would decode to the same code points.
It's ambiguous, for example.
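The ambiguity can be reproduced in a few lines. This sketch assumes two made-up filenames that differ only in one invalid byte, and it uses Python's `replace` and `surrogateescape` error handlers as stand-ins for "decode for display" and "decode reversibly"; other languages and toolkits may behave differently.

```python
# Two different byte-string filenames; 0xFF and 0xFE are never valid in UTF-8.
a = b"test\xff.txt"
b = b"test\xfe.txt"

# Decoding for display: each invalid byte becomes U+FFFD, so the names collide.
shown_a = a.decode("utf-8", errors="replace")
shown_b = b.decode("utf-8", errors="replace")
collide = shown_a == shown_b

# A reversible decoding keeps them apart and can restore the original bytes.
rt_a = a.decode("utf-8", errors="surrogateescape")
rt_b = b.decode("utf-8", errors="surrogateescape")
distinct = rt_a != rt_b
restored = rt_a.encode("utf-8", errors="surrogateescape") == a

print(collide, distinct, restored)
```

The displayed strings are identical, so a program that addresses files by the displayed name cannot tell them apart, while one that keeps the bytes (or a reversible decoding of them) can.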
so are a file named Hello.txt and another one named Нello.txt
> Well, now your filename is invalid utf8.

That's reality. An OS which can't keep up with reality is broken.

I understand that NTFS has its own case folding table which is written once when the volume is formatted. This does seem to have stood the test of time and enormous usage so maybe it is not such a poor idea.
That doesn't sound great if somebody formats your USB stick in Turkey and suddenly speakers of western European languages can observe 'i' as case sensitive.
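To make the concern concrete, here is a toy comparison of two folding rules. It is not NTFS's actual table or any Windows API: `fold_default` and `fold_turkish` are hypothetical helpers, and the Turkish rule is reduced to the two mappings that matter here (dotless `ı` and dotted `İ`).

```python
def fold_default(name: str) -> str:
    """Hypothetical locale-independent fold: plain lowercasing."""
    return name.lower()


# Turkish casing pairs I with dotless ı, and dotted İ with i.
_TURKISH = str.maketrans({"I": "ı", "İ": "i"})


def fold_turkish(name: str) -> str:
    """Hypothetical Turkish-style fold: apply the I/İ rules, then lowercase."""
    return name.translate(_TURKISH).lower()


# Under the default rule the two names refer to the same file...
print(fold_default("FILE.TXT") == fold_default("file.txt"))
# ...but under the Turkish rule they fold to different names.
print(fold_turkish("FILE.TXT"), fold_turkish("file.txt"))
```

If the table chosen at format time followed the second rule, `FILE.TXT` and `file.txt` would name two different files on that volume.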
I believe you’ve just shed light on a twelve-year-old bug in creating bootable USBs under Windows XP.
> filename should be a dumb blob of bytes

This hasn't been true since the days of CP/M

For e.g. the Linux kernel, besides path separator(s), why do you think that?

All of the wide/special-case manipulation when writing code on e.g. Windows drove me nuts.

Out of interest, what special-case manipulation? I generally treat file paths as opaque `\` separated strings (or even as a single blob if I don't need to parse it). I'm uncertain why I'd want to treat them specially. I'll leave that to the OS.
Take, for example, a file on NTFS. The filenames can be UTF-16 (they support 16-bit chars under the hood), but they also might not be valid UTF-16. When you access the file by its filename, you can use either the wide function calls (e.g. _wfopen) or the ANSI versions (e.g. fopen).
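As a rough illustration of "16-bit chars that are not valid UTF-16", the sketch below builds a made-up name containing a lone high surrogate. It does not touch NTFS or any Windows call; it only shows that such a sequence of 16-bit units is rejected by a strict UTF-16 decoder and needs a lenient mode (Python's `surrogatepass` here) to survive a round trip.

```python
# Three 16-bit units, little-endian: 'a', a lone high surrogate 0xD800, 'b'.
units = b"a\x00" + b"\x00\xd8" + b"b\x00"

# A strict decoder refuses the unpaired surrogate.
try:
    units.decode("utf-16-le")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder keeps the unit as-is, and encoding restores the bytes.
name = units.decode("utf-16-le", errors="surrogatepass")
round_trip = name.encode("utf-16-le", errors="surrogatepass")

print(strict_ok, round_trip == units)
```

Code that assumes every filename converts cleanly to another encoding would fail on a name like this, which is one reason to keep the name in its original units.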