
If you want an error, throw an error. We specified that up front. The entire sub-thread you're in is about what happens if you for whatever reason can't throw an error or don't want to.

Notice that, again, if you want an error, you can detect U+FFFD and error out on that. I mean, apparently this isn't Unicode after all, right? So the only way U+FFFD got into the pipeline is because of an error you've now decided you should have caught but... didn't?
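To make that concrete, here's a rough sketch in Python of "decode lossily, then error out if U+FFFD shows up". The function names are made up for illustration, and it assumes the input was never supposed to contain a literal U+FFFD of its own (a legitimately encoded one would be rejected too).

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def decode_lossy(raw: bytes) -> str:
    # Hypothetical helper: never raises; undecodable bytes are
    # replaced with U+FFFD by the standard "replace" error handler.
    return raw.decode("utf-8", errors="replace")


def decode_or_fail(raw: bytes) -> str:
    # Hypothetical helper: same lossy decode, but treat any U+FFFD in
    # the result as the error you decided you wanted after all.
    text = decode_lossy(raw)
    if REPLACEMENT in text:
        raise ValueError("input contained U+FFFD after decoding")
    return text
```

If you wanted the error from the start, `raw.decode("utf-8")` with the default strict handler raises `UnicodeDecodeError` directly, and you never need the second step.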

Your approach randomly introduces unspecified behaviour, which is likely to lead to security vulnerabilities and who knows what other problems, because it resists "Full Recognition Before Processing".
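For anyone who hasn't run into the term, here's roughly what "Full Recognition Before Processing" looks like in practice. This is a sketch in Python under invented assumptions: the `key=value` line format, the `recognize` and `process` names, and the `apply` callback are all hypothetical, chosen only to show the shape of the idea.

```python
import re

# Hypothetical grammar for this sketch: one "key=value" pair per line.
LINE = re.compile(r"[a-z_]+=[A-Za-z0-9._-]*")


def recognize(raw: bytes) -> list[tuple[str, str]]:
    # Strict decode: invalid UTF-8 raises UnicodeDecodeError here,
    # before anything downstream has seen a single byte.
    text = raw.decode("utf-8")
    pairs = []
    for line in text.splitlines():
        if not LINE.fullmatch(line):
            raise ValueError(f"unrecognized line: {line!r}")
        key, value = line.split("=", 1)
        pairs.append((key, value))
    return pairs


def process(raw: bytes, apply) -> None:
    # The whole input is accepted or rejected first; side effects
    # only start once recognition has succeeded for all of it.
    pairs = recognize(raw)
    for key, value in pairs:
        apply(key, value)
```

The point is the ordering: a bad line at the end of the input means nothing from the start of the input was acted on either.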

Unlike treating text in an unknown encoding as UTF-8, passing it through mangled by tools that didn't actually understand it, as you've proposed, does lead to real-world vulnerabilities that can be as serious as remote code execution.