by barbegal 2264 days ago
Except that not all binary data is valid UTF-8, so you also need functions that check whether a binary buffer is valid UTF-8.

The decoding phase will do that, if needed. Also note that in many cases you must process the data as opaque binary, even though it should be valid UTF-8. This applies in particular to filenames on POSIX systems, because otherwise you could not access any file whose name happens to contain invalid UTF-8.
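
To make both points concrete, here is a rough Python sketch, not a definitive implementation. The helper names (is_valid_utf8, name_to_text, text_to_name) are made up for illustration. The first function lets a strict decode act as the validity check; the other two treat a filename as opaque bytes and round-trip it through the standard "surrogateescape" error handler so that invalid bytes are preserved rather than rejected.

```python
def is_valid_utf8(buf: bytes) -> bool:
    """Return True if buf decodes cleanly as strict UTF-8."""
    try:
        # The decode step itself is the validity check.
        buf.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def name_to_text(raw: bytes) -> str:
    """Decode a filename for display; invalid bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_name(text: str) -> bytes:
    """Recover the exact original bytes from a surrogate-escaped string."""
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    raw = b"report-\xff.txt"  # hypothetical filename; 0xFF never occurs in UTF-8
    print(is_valid_utf8(raw))
    print(text_to_name(name_to_text(raw)) == raw)
```

On POSIX, Python's own os.fsdecode and os.fsencode use the same error handler, and passing a bytes path to os.listdir returns the names as bytes, so a file with such a name stays reachable.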