| > A default UTF8 string-type has to be allowed to be used interchangeably with bytestrings since ASCII is a valid subset. This only works in one direction. Sure, any valid UTF-8 is a bytestring. But not every bytestring is valid UTF-8. "Use interchangeably" implies the ability to substitute in both directions, which violates LSP. > What comes in, comes in. Validate it elsewhere. I have a problem with that. It's explicitly against the fail-fast design philosophy, whereby invalid input should be detected and acted upon as early as possible. First, because failing early helps identify the true origin of the error. And second because there's a category of subtle bugs where invalid input can be combined or processed in ways that make it valid-but-nonsensical, and as a result there are no reported errors at all, just quiet incorrect behavior. Any language that has Unicode strings can handle ASCII just fine, since ASCII is a strict subset of UTF-8 - that doesn't require the encoding of the strings to be UTF-8. For languages that use a different encoding, it would mean that ASCII gets re-encoded into whatever the language uses, but this is largely an implementation detail. Of course, if you're reading a source that is not necessarily in any Unicode encoding (UTF-8 or otherwise), and that may be non-string data, and you just need to pass the data through - well then, that's exactly what bytestrings are there for. The fact that you cannot easily mix them with strings (again, even if they're UTF-8-encoded) is a benefit in this case, because such mixing only makes sense if the bytestring is itself properly encoded. If it's not, you just silently get garbage. Using two different types at the input/output boundary makes it clear what assumptions can be made about every particular bit of input. |