by huhtenberg 1000 days ago
You are missing OP's point - this still costs you 2 extra calls.

If this cost really matters (and practically speaking it never does), then, as the other commenter said, the correct solution is to just use the OS-native encoding for all file system paths and names used by the program, hidden behind an abstraction layer if need be. UTF-16 for Windows, UTF-8 elsewhere.

3 comments

The above manifesto argues for using UTF-8 *everywhere*, even on Windows, where the internal representation is not natively UTF-8.

The conversion overhead is really negligible: https://utf8everywhere.org/#faq.cvt.perf

(Note: there are two API calls per conversion because of how those specific functions work: the first call gets the size to allocate, the second does the actual conversion. You can always use another library for the UTF-8<->UTF-16 conversion that might be better optimized than those Windows API functions.)
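To make the "size query, then convert" pattern concrete, here is a minimal portable sketch. It does not use the Windows API; `utf8_to_utf16` and `widen` are hypothetical names invented for illustration, and the converter assumes well-formed UTF-8 input with no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the two-call convention:
// pass out == nullptr to learn how many UTF-16 code units are needed,
// then call again with a buffer of that size.
// Assumption: the input is well-formed UTF-8 (no validation is done).
std::size_t utf8_to_utf16(const std::string& in, char16_t* out, std::size_t out_cap) {
    std::size_t n = 0;  // UTF-16 code units produced (or required)
    std::size_t i = 0;  // read position in the UTF-8 input
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        // The lead byte determines the sequence length and its payload bits.
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {
            // Outside the BMP: encode as a surrogate pair.
            cp -= 0x10000;
            if (out && n + 1 < out_cap) {
                out[n]     = static_cast<char16_t>(0xD800 + (cp >> 10));
                out[n + 1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
            }
            n += 2;
        } else {
            if (out && n < out_cap) out[n] = static_cast<char16_t>(cp);
            n += 1;
        }
    }
    return n;
}

// The "2 extra calls" from the thread: one to size, one to convert.
std::u16string widen(const std::string& utf8) {
    std::size_t needed = utf8_to_utf16(utf8, nullptr, 0);  // call 1: get size
    std::u16string result(needed, u'\0');
    if (needed > 0)
        utf8_to_utf16(utf8, &result[0], result.size());    // call 2: convert
    return result;
}
```

Both passes are plain loops over a short string with no syscall involved, which is why the cost tends to be small for things like file names.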

Especially negligible versus the trip to the file system you are setting up for.
Not all API calls are for filesystem access.
Sure, but basically everything having to do with file paths on Windows, the topic here, relates to the file system.
"2 extra calls" is a weird metric here. Some calls are vastly more expensive than others. Syscalls come with a significant cost, encoding conversion of short strings (esp. filenames) does not. Hiding just the syscalls behind an abstraction layer is vastly simpler than doing that and additionally hiding the string representation, so "UTF-8 everywhere" is IMHO the right solution.
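A rough sketch of what "hiding just the syscalls" could look like: the program passes UTF-8 strings around everywhere, and a thin wrapper converts only at the OS boundary. `open_utf8` and `widen_for_os` are hypothetical names; the Windows branch uses `MultiByteToWideChar` and `_wfopen`, and error handling is deliberately minimal (a failed conversion yields an empty string, so the open simply fails).

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: UTF-8 -> UTF-16 using the two-call Windows pattern.
static std::wstring widen_for_os(const std::string& s) {
    if (s.empty()) return std::wstring();
    // Call 1: ask how many wide characters are needed.
    int n = MultiByteToWideChar(CP_UTF8, 0, s.data(),
                                static_cast<int>(s.size()), nullptr, 0);
    if (n <= 0) return std::wstring();
    std::wstring w(static_cast<std::size_t>(n), L'\0');
    // Call 2: perform the conversion into the allocated buffer.
    MultiByteToWideChar(CP_UTF8, 0, s.data(),
                        static_cast<int>(s.size()), &w[0], n);
    return w;
}
#endif

// Hypothetical wrapper: callers always pass UTF-8, on every platform.
std::FILE* open_utf8(const std::string& path, const std::string& mode) {
#ifdef _WIN32
    // Convert at the boundary, then call the wide-character API.
    return _wfopen(widen_for_os(path).c_str(), widen_for_os(mode).c_str());
#else
    // Assumption: elsewhere the narrow API already takes UTF-8 paths.
    return std::fopen(path.c_str(), mode.c_str());
#endif
}
```

The rest of the program never sees `wchar_t`; only this one function knows that Windows wants UTF-16.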
I thought the OP's point was that there are too many considerations when doing this?

Someone is suggesting a way of making it less tedious, and your response is "performance?!", even though in both scenarios you're running the same code and the compiler would likely remove the intermediary in a release build.

> your response is "performance?!"

No, that's not what I said.