|
|
|
|
|
by luizfelberti
1068 days ago
|
|
There are also other concerns depending on your threat model: if you're parsing user-generated strings you definitely want to be able to handle corrupted unicode, for security reasons, and in these scenarios the way you handle recovery if you choose to do so may aggravate exploitation. Consider a corrupted codepoint at the end of a user generated string: will it recognize the closing quote as such, or will it assume it is part of a corrupted codepoint and try to skip over it? So many ways to shoot yourself in the foot by "abstracting away" the formal semantics of your inputs, I think it's pretty much never worth it. (An interesting search term here is LangSec) |
|
Maybe I'm misunderstanding you, but because of how UTF-8 is a superset of ASCII, I don't believe you can misrecognize ASCII characters if that's what you mean.