by juancn
1538 days ago
You should normalize names on write; fixing it on read is very hard. You can have perfectly valid, denormalized strings whose codepoints are in different normalization forms. So if you have four possible normalizations (NFD, NFC, NFKD, NFKC) and your string has N ambiguous codepoints, each of which could be in any of the four forms, the number of possible strings you need to try is 4^N.
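A minimal sketch of what normalizing on write could look like, assuming NFC as the stored form. The helper `normalize_name` and the `store` dict are made up for illustration (the dict stands in for a directory or a table); only `unicodedata.normalize` is a real standard-library call.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Hypothetical helper: bring a name into NFC before it is stored or looked up."""
    return unicodedata.normalize("NFC", name)


# Two spellings that render identically but differ as codepoint sequences.
composed = "caf\u00e9"      # precomposed U+00E9
decomposed = "cafe\u0301"   # "e" followed by combining acute accent U+0301

# Stand-in for a directory or table keyed by name (an assumption of this sketch).
store = {}
store[normalize_name(composed)] = "contents"

# Because every name was normalized on write, a lookup only has to normalize
# its argument once; there is no need to try alternative spellings.
found = normalize_name(decomposed) in store
```

Without the write-side step, both spellings could end up as separate keys, and the reader would have to guess which variant was stored.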
|
NFD was used by HFS+, but got abandoned by the current, insecure APFS (which uses unidentifiable names). NFD is faster to produce, but NFC is complete and needs less space. With NFD you can still have reordered sequence variants, and thus non-identifiable names.
And the NFKD and NFKC hacks should only be used internally, the way Python does, because its developers didn't understand Unicode, or just read the TRs without understanding them.
It will take several more decades until filesystems find out about their wrong decisions. Maybe I'll bug them with CVEs some day.
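For anyone who wants to see the space difference and what the K forms do, here is a small sketch using Python's standard `unicodedata` module. The sample strings are my own picks, not taken from any filesystem.

```python
import unicodedata

# Space: the same "é" in NFC and NFD.
e_acute = "\u00e9"
nfc = unicodedata.normalize("NFC", e_acute)
nfd = unicodedata.normalize("NFD", e_acute)

nfc_bytes = len(nfc.encode("utf-8"))  # one codepoint, U+00E9
nfd_bytes = len(nfd.encode("utf-8"))  # two codepoints, "e" + U+0301

# Compatibility forms: the "fi" ligature U+FB01 survives NFC,
# but NFKC folds it into the two plain letters "f" and "i".
ligature = "\ufb01"
kept = unicodedata.normalize("NFC", ligature)
folded = unicodedata.normalize("NFKC", ligature)
```

Here `nfc_bytes` is 2 and `nfd_bytes` is 3, and `folded` no longer round-trips to the original ligature, which is why the K forms change what a name actually says.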