| HN Mirror



	by pdonis 2391 days ago
> the team's insistence that file names were byte strings was the cause of lots of bugs when it came to Unicode support

File names are a different problem because Windows and Unix treat them differently: Unix treats them as bytes and Windows treats them as Unicode. So there is no single data model that will work for any language.

2 comments

hsivonen 2390 days ago

The Rust standard library has a solution for this that actually works: On Unix-like systems, file paths are sequences of bytes, and most of the time the bytes are UTF-8. On Windows, they are WTF-8, so the API user sees a sequence of bytes, and most of the time they match UTF-8.

This means that there's more overhead on Windows, but it's much better to normalize what the application programmer sees across POSIX and NT while still roundtripping all paths for both than to make the code unit size difference the application programmer's problem like the C++ file system API does.
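To make the WTF-8 idea concrete, here is a minimal Python sketch (not Rust's actual implementation). It assumes the only difference from UTF-8 that matters here is that a lone surrogate code unit gets a 3-byte encoding instead of being rejected; the function names are hypothetical, and Python's `surrogatepass` error handler stands in for the real encoder.

```python
def utf16_units_to_wtf8(units):
    """Encode 16-bit code units (as Windows stores them) into WTF-8-style bytes."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    # Valid surrogate pairs are combined into one code point;
    # lone surrogates are passed through instead of raising an error.
    text = raw.decode("utf-16-le", "surrogatepass")
    return text.encode("utf-8", "surrogatepass")


def wtf8_to_utf16_units(data):
    """Reverse the mapping so the original code units round-trip."""
    text = data.decode("utf-8", "surrogatepass")
    raw = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# A well-formed pair (U+1F600) yields ordinary 4-byte UTF-8.
print(utf16_units_to_wtf8([0xD83D, 0xDE00]))
# A lone high surrogate is not valid UTF-8, but it still gets a byte encoding.
print(utf16_units_to_wtf8([0xD800]))
```

Well-formed names come out as plain UTF-8, and ill-formed ones still survive a round trip, which is the property described above.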


pdonis 2390 days ago

> On Windows, they are WTF-8

Seems like an apt acronym for Windows... :-)

On a more serious note, Python seems to have done something fairly similar with the pathlib standard library module.
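For reference, the mechanism on the Python side is the `surrogateescape` error handler (PEP 383), which `os.fsdecode` and `os.fsencode` use on POSIX. A minimal sketch, assuming a UTF-8 filesystem encoding and using an explicit codec call so it behaves the same on any platform; the file name is made up for illustration:

```python
# Bytes that are Latin-1, not valid UTF-8 (hypothetical file name).
raw = b"caf\xe9.txt"

# Each undecodable byte 0xXX becomes the lone surrogate U+DCXX.
text = raw.decode("utf-8", "surrogateescape")
print(repr(text))

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("utf-8", "surrogateescape")
print(restored == raw)
```

The string is not valid Unicode text in the strict sense, but the file name round-trips, which is the same trade-off as the Rust approach.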


simias 2391 days ago

Not to mention case-sensitivity issues. Can you have two files, one named "FILE.txt" and the other "file.txt" in the same directory for instance?
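One way to find out for a given directory is to just try it. A small Python sketch, assuming you have write access to the temp location; `distinct_case_names_allowed` is a hypothetical helper name, and the answer depends on the filesystem and OS settings rather than on the code:

```python
import os
import tempfile


def distinct_case_names_allowed(parent=None):
    """Return True if FILE.txt and file.txt can coexist in a fresh directory."""
    with tempfile.TemporaryDirectory(dir=parent) as d:
        for name in ("FILE.txt", "file.txt"):
            with open(os.path.join(d, name), "w") as f:
                f.write(name)
        # On a case-insensitive setup the second open rewrites the first
        # file, so only one directory entry exists.
        return len(os.listdir(d)) == 2


print(distinct_case_names_allowed())
```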


SSLy 2391 days ago

On Windows? Of course you can.


edgyquant 2391 days ago

I'm certain you can on Linux as well. Only the Mac's old HFS would not allow it.


cataflam 2391 days ago

Isn't this a fairly recent change?


amaranth 2390 days ago

NTFS has always been case sensitive; the Windows API just lets you treat it as case insensitive. If you pass `FILE_FLAG_POSIX_SEMANTICS` to `CreateFile`, you can make files that differ only in case.


mathw 2390 days ago

Good luck using those in some tools which use the API differently though. Windows filenames are endless fun. What's the maximum length of the absolute path of a file? Why, that depends on which API you're using to access it!


rurban 2389 days ago

Even worse on Unix, where it depends on the mount type. Haven't seen much proper long filename support in Unix apps or libs; it's much better in Windows land. Garbage in, garbage out is also a security nightmare, as names are not identifiable anymore. You can easily spoof such names.
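For what it's worth, on POSIX systems the per-mount limits can be queried at run time instead of being hard-coded. A minimal Python sketch, assuming `os.pathconf` is available (it is not on Windows); `name_limits` is a hypothetical helper name:

```python
import os


def name_limits(directory="."):
    """Return the name and path length limits for the filesystem at `directory`."""
    if not hasattr(os, "pathconf"):
        return None  # e.g. Windows, where this call does not exist
    limits = {}
    for key in ("PC_NAME_MAX", "PC_PATH_MAX"):
        if key in os.pathconf_names:
            try:
                limits[key] = os.pathconf(directory, key)
            except OSError:
                limits[key] = None  # this filesystem does not report the limit
    return limits


print(name_limits("."))
```

Running it against directories on different mounts shows whether the limits really differ on a given machine.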
