Hacker News
by StefanKarpinski 1282 days ago
The comment that you're quoting wasn't mine. The comment you link to says "UTF-8 by convention". If either string is valid, then the result is as expected. If you're concatenating two strings that are both invalid UTF-8, there's not much you can do that's better than just concatenating the bytes together... which is exactly what treating them as byte arrays would end up doing (but it's less convenient). If you're worried about invalid UTF-8 you can check for validity (which again, is exactly what you end up doing if you use byte arrays).
1 comment

> The comment that you're quoting wasn't mine.

The comment I originally quoted was yours. The second quote is adapted from a statement you denied. It is thus your statement. Let me put both sections together since you seem unwilling to do so:

>> Utf8 would not help with the issue in the article in any way.

> It's not at all obvious how it helps, but it does.

So you are stating, unambiguously, that "Utf8 would help deal with the issue [of garbage inputs]".

> The comment you link to says "UTF-8 by convention".

The comment I link to says:

> Treating paths as UTF-8 works very well

Which is either

1. wrong

or

2. nonsensical, given the later statement that you should not "require your UTF-8 strings to be valid", which would make them not UTF-8
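Whether a given byte string meets that bar is at least mechanical to check. A minimal sketch in Python (an assumption on my part, as neither comment names a language; `is_valid_utf8` and the sample bytes are made up for illustration):

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


valid_name = b"caf\xc3\xa9"  # "café" encoded as UTF-8
latin1_name = b"caf\xe9"     # "café" encoded as Latin-1, not valid UTF-8

print(is_valid_utf8(valid_name))
print(is_valid_utf8(latin1_name))
```

A string type that accepts `latin1_name` unchanged is holding bytes, whatever it is called.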

> If you're concatenating two strings that are both invalid UTF-8, there's not much you can do that's better than just concatenating the bytes together... which is exactly what treating them as byte arrays would end up doing

But that's the point innit? You're asserting semantics which don't hold and which you break with no regard.
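One concrete wrinkle, sketched in Python (my choice of language; the helper name and sample bytes are hypothetical): byte-wise concatenation doesn't even preserve invalidity, since two fragments that are each invalid UTF-8 can join into a valid sequence.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


left = b"\xc3"   # lead byte of a two-byte sequence, truncated on its own
right = b"\xa9"  # continuation byte, meaningless on its own

joined = left + right  # plain byte concatenation

print(is_valid_utf8(left), is_valid_utf8(right), is_valid_utf8(joined))
```

So the operation is defined on bytes, and validity is something you find out afterwards.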

> (but it's less convenient).

Is it now? Here's the concatenation of two strings:

    a + b

Here's the concatenation of two byte arrays:

    a + b

You're right, the inconvenience makes me shudder. What horror. What indignity.
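For anyone who wants it spelled out, here is how that looks in one language where both types exist. This is a sketch in Python, which is my assumption since the thread doesn't name a language; the variable names and values are made up:

```python
# Text strings: sequences of Unicode code points.
text_a = "foo"
text_b = "bar"
text_joined = text_a + text_b

# Byte arrays: no encoding implied, so a stray 0xFF byte is fine.
bytes_a = b"foo\xff"
bytes_b = b"bar"
bytes_joined = bytes_a + bytes_b

print(text_joined)
print(bytes_joined)
```

Same operator, same shape of code; the only difference is what the type claims about its contents.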