Hacker News new | ask | show | jobs
by a1369209993 2643 days ago
How will you deal with valid paths such as:

  /tmp/[DE][AD][BE][EF].txt # ext2 / linux
  # OR
  C:\stuff\[DEED][FFFE].txt # ntfs / windows
  # where [hex] indicates a single filesystem charater with that value
1 comments

One fun thing about the capability model is that at the system call level, there are no absolute paths. All filesystem path references are relative to base directory handles. So even if an application thinks it wants something in C:\stuff, it's the job of the libraries linked into the application to map that to something that can actually be named. So there's room for the ecosystem to innovate, above the WASI syscall layer, on what "C:\" should mean in an application intending to be portable.

Concerning character encodings, and potentially case sensitivity, the current high-level idea is that paths at the WASI syscall layer will be UTF-8, and WASI implementations will perform translation under the covers as needed. Of course, that doesn't fix everything, but it's a starting point.

That’s good to know, but the parent’s examples seem to be referencing the issue of filenames that aren’t valid Unicode. The Linux example is invalid UTF-8, since Linux filenames are natively arbitrary byte sequences. The Windows example contains an unpaired surrogate followed by the reserved codepoint 0xfffe, since Windows filenames are natively UCS-2.
WTF-8 could solve the windows issue and I think for Linux it’s time to demand unicode filenames :)