> True; I was careful not to call it that

You specifically called it UTF8, repeatedly. The very comment I quoted asserts that "Utf8 would help deal with the issue [of garbage inputs]" (in its denial of the opposite assertion). You also did it in https://news.ycombinator.com/item?id=33986421

> If you have two UTF-8 strings and you want to concatenate them, you just concatenate the bytes. That's not a unicode-aware operation, it's mostly a unicode-irrelevant operation (though unicode awareness can be useful in edge cases because of special grapheme clusters, but that's very task-specific).

> But what if the strings aren't valid UTF-8?! Both of those operations work just fine even if the strings aren't valid and produce sensible, intuitive results.

If your content is not actually UTF-8, you can end up with UTF-8, thus changing the semantics of the content. You can also end up with overlong UTF-8, which also changes the semantics of the content in a worse way.
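To make both failure modes concrete, here is a minimal Python sketch. The byte values are my own illustrative picks, and `naive_decode_2byte` is a hypothetical stand-in for a lenient decoder that skips the overlong check; neither comes from the linked thread. Python's built-in decoder is strict, so it serves here as the validity check.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def naive_decode_2byte(data: bytes) -> int:
    """Hypothetical lenient decoder: unpack a 2-byte sequence into a code
    point without rejecting overlong forms."""
    lead, cont = data
    return ((lead & 0x1F) << 6) | (cont & 0x3F)


# Two fragments, neither of which is valid UTF-8 on its own:
# a truncated sequence and a stray continuation byte.
left, right = b"caf\xc3", b"\xa9"

# Concatenating the bytes yields valid UTF-8 with a new meaning ("café").
joined = left + right

# Lead byte 0xC0 followed by continuation byte 0x80 is an overlong
# encoding of U+0000. A strict decoder rejects it; a lenient one sees NUL.
overlong = b"\xc0" + b"\x80"
```

With these inputs, `is_valid_utf8` is false for `left`, `right`, and `overlong` but true for `joined`, while `naive_decode_2byte(overlong)` yields code point 0. That second case is the worse one: a NUL can slip past any filter that only looks for the byte `0x00`.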