


	by huhtenberg 1000 days ago
	You are missing OP's point: this still costs you 2 extra calls. If this cost really matters (and practically speaking it never does), then, as the other commenter said, the correct solution is to use the OS-native encoding for all file system paths and names used by the program, hidden behind an abstraction layer if need be: UTF-16 on Windows, UTF-8 elsewhere.


ormax3 1000 days ago

The above manifesto argues for using UTF-8 *everywhere*, even on Windows, where the native internal representation is UTF-16 rather than UTF-8.

The conversion overhead is really negligible: https://utf8everywhere.org/#faq.cvt.perf

(Note: the two API calls per conversion come from how those specific functions work: the first call returns the size to allocate, the second does the actual conversion. You can always use another library for the UTF-8<->UTF-16 conversion in the implementation, one that might be more optimized than those Windows API functions.)
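
A rough sketch of that size-then-convert shape, in pure Python so it runs anywhere. The helper names `utf16_units_needed` and `utf8_to_utf16` are made up for illustration; this only mimics the pattern of the two calls and does not invoke the Windows API itself.

```python
import array

def utf16_units_needed(utf8: bytes) -> int:
    """First pass (hypothetical helper): count the UTF-16 code units required."""
    text = utf8.decode("utf-8")
    # Code points above U+FFFF need a surrogate pair, i.e. two code units.
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text)

def utf8_to_utf16(utf8: bytes) -> array.array:
    """Second pass (hypothetical helper): fill a buffer sized by the first pass."""
    size = utf16_units_needed(utf8)
    buf = array.array("H", [0] * size)  # 'H' = unsigned short, 16 bits on common platforms
    i = 0
    for ch in utf8.decode("utf-8"):
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split into a high and a low surrogate.
            cp -= 0x10000
            buf[i] = 0xD800 | (cp >> 10)
            buf[i + 1] = 0xDC00 | (cp & 0x3FF)
            i += 2
        else:
            buf[i] = cp
            i += 1
    return buf
```

Both passes only walk a short in-memory string, which is why the "two calls" are cheap compared with anything that touches the file system.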


donatj 1000 days ago

Especially negligible versus the trip to the file system you are setting up for.


Tempest1981 999 days ago

And AVX-512 can help:

https://lemire.me/blog/2023/09/13/transcoding-unicode-string...


jiggawatts 999 days ago

Not all API calls are for filesystem access.


donatj 999 days ago

Sure, but basically everything having to do with file paths on Windows, the topic here, relates to the file system.


ynik 1000 days ago

"2 extra calls" is a weird metric here. Some calls are vastly more expensive than others. Syscalls come with a significant cost; encoding conversion of short strings (esp. filenames) does not. Hiding just the syscalls behind an abstraction layer is vastly simpler than doing that and additionally hiding the string representation, so "UTF-8 everywhere" is IMHO the right solution.
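
A minimal sketch of what "hiding just the syscalls" could look like: the program keeps UTF-8 everywhere and converts only at the OS boundary. `to_native_path` is a hypothetical helper, and the `platform` parameter exists only so both branches can be exercised on one machine. Python's own `os` functions already handle this internally, so this just illustrates where the conversion sits.

```python
import sys

def to_native_path(path_utf8: bytes, platform: str = sys.platform) -> bytes:
    """Convert a UTF-8 path to the encoding the OS API expects (hypothetical helper)."""
    # Decoding first also rejects byte strings that are not valid UTF-8.
    text = path_utf8.decode("utf-8")
    if platform == "win32":
        # Windows wide-character APIs take UTF-16 (little-endian) strings.
        return text.encode("utf-16-le")
    # Elsewhere the UTF-8 bytes are passed through unchanged.
    return path_utf8
```

The rest of the program never sees UTF-16; only the thin wrapper around each system call does.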


Someone1234 1000 days ago

I thought the OP's point was that there are too many considerations when doing this?

Someone is suggesting a way of making it less tedious, and your response is "performance?!" even though in both scenarios you're running the same code, and the compiler would likely remove the intermediary in a release build.


huhtenberg 1000 days ago

> your response is "performance?!"

No, that's not what I said.
