Hacker News
by dcode 1385 days ago
Although speed is not the critical concern (asymmetric value spaces are), UTF-8 must guard against invalid byte sequences, whereas WTF-16, in which every possible value is valid as long as the byte length is a multiple of 2, need not. In practice, the UTF-8 guard is part of the copy loop across a boundary, typically from untyped memory to untyped memory, while WTF-16 can indeed just memcpy. Unless SIMD can be used, the difference is roughly that of a loop with a load and a branch per iteration versus a plain memcpy, if that helps to quantify it. Expect an additional final memcpy if the UTF-8 copy is required to fail before anything is stored to the receiver's memory.
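
To make the comparison concrete, here is a rough scalar sketch in C of the two copy paths. The function names `copy_utf8_checked` and `copy_wtf16` are made up for illustration and do not come from any particular runtime or proposal; the UTF-8 ranges follow the well-formed byte sequence table in the Unicode Standard. The checked copy may leave a partially written destination when it fails, so a version that must fail before storing anything would validate into a scratch buffer first and then do the final memcpy mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy len bytes from src to dst, validating UTF-8
 * as it goes. Returns false at the first ill-formed sequence; dst may
 * already hold the valid prefix at that point. */
bool copy_utf8_checked(uint8_t *dst, const uint8_t *src, size_t len) {
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    while (i < len) {
        uint8_t b = src[i];            /* the load */
        size_t n;                      /* continuation bytes expected */
        uint8_t lo = 0x80, hi = 0xBF;  /* allowed range of the first one */

        /* the branches: classify the lead byte */
        if (b < 0x80) {
            n = 0;
        } else if (b >= 0xC2 && b <= 0xDF) {
            n = 1;                     /* 0xC0 and 0xC1 would be overlong */
        } else if (b >= 0xE0 && b <= 0xEF) {
            n = 2;
            if (b == 0xE0) lo = 0xA0;  /* reject overlong encodings */
            if (b == 0xED) hi = 0x9F;  /* reject surrogates U+D800..U+DFFF */
        } else if (b >= 0xF0 && b <= 0xF4) {
            n = 3;
            if (b == 0xF0) lo = 0x90;  /* reject overlong encodings */
            if (b == 0xF4) hi = 0x8F;  /* reject values above U+10FFFF */
        } else {
            return false;              /* stray continuation or invalid lead */
        }

        if (len - i <= n) return false;                     /* truncated */
        if (n > 0 && (src[i + 1] < lo || src[i + 1] > hi)) return false;
        for (size_t k = 2; k <= n; k++) {
            if ((src[i + k] & 0xC0) != 0x80) return false;
        }

        /* sequence is well-formed: store it */
        for (size_t k = 0; k <= n; k++) dst[i + k] = src[i + k];
        i += n + 1;
    }
    return true;
}

/* Hypothetical helper: WTF-16 accepts every 16-bit code unit, lone
 * surrogates included, so the only check is the even byte length. */
bool copy_wtf16(uint8_t *dst, const uint8_t *src, size_t byte_len) {
    assert(dst != NULL && src != NULL);
    if (byte_len % 2 != 0) return false;
    memcpy(dst, src, byte_len);
    return true;
}
```

The first function inspects every byte before it stores it, while the second does a single length check and hands the rest to memcpy, which is the gap described above.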