by jameshart
4708 days ago
That's not called 'sanitizing'; it's called 'escaping' and 'encoding'. The byte sequences I need to store to communicate the name "Kei$ha O'Shaughnessey, Jr." in a UTF-8 JSON string literal, a UTF-8 HTML attribute, a big-endian UTF-16 CSV file, or an ISO-8859 SQL parameter are going to be different, but so long as all the characters I need to pass are representable in all of those domains, all I have to do is perform the correct escaping and encoding. At no point do I need to 'sanitize' the name. It's a name; it's not dirty. If there are characters in it that I can't represent in the target domain, then I need to handle the loss of information.
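To make the point above concrete, here is a minimal Python sketch, using only the standard library, that carries the same name into several target domains. The `<input>` markup, the `users` table, and the use of SQLite are assumptions made for illustration; SQLite stores text as UTF-8 rather than ISO-8859, so it only stands in for the "SQL parameter" case.

```python
import csv
import html
import io
import json
import sqlite3

name = "Kei$ha O'Shaughnessey, Jr."

# JSON string literal, encoded as UTF-8.
json_bytes = json.dumps(name).encode("utf-8")

# HTML attribute value, encoded as UTF-8.
# quote=True also escapes the apostrophe and double quotes.
attr = html.escape(name, quote=True)
html_bytes = ('<input value="' + attr + '">').encode("utf-8")

# A single CSV field, encoded as big-endian UTF-16.
# The csv module quotes the field because it contains a comma.
buf = io.StringIO()
csv.writer(buf).writerow([name])
csv_bytes = buf.getvalue().encode("utf-16-be")

# SQL parameter: the driver handles the value, so no manual quoting is needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")  # hypothetical table
conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]

# Every domain gets different bytes, and every one decodes back to the same name.
assert json.loads(json_bytes.decode("utf-8")) == name
assert html.unescape(attr) == name
assert next(csv.reader(io.StringIO(csv_bytes.decode("utf-16-be")))) == [name]
assert stored == name
```

The name itself is never altered; only its representation changes at each boundary.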
For instance, it may be perfectly fine in my system to have a user named '<script>alert("ha!")</script>'. Are you sure that's okay in your PHP-based web forum? Really sure? Every place they've ever shown a username to the user, it's well-escaped?
And even if that's true today, what about the day when someone decides to change the web forum software to something else? What about the day when someone turns on a feature that copies certain forum threads to an internal support system, also provided by a third party?
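As a rough illustration of what "well-escaped" has to mean at every one of those output points, here is a small Python sketch. The `render_author` helper and its markup are hypothetical, not taken from any real forum software.

```python
import html

username = '<script>alert("ha!")</script>'


def render_author(username: str) -> str:
    """Hypothetical helper: render a username inside an HTML element."""
    # Escape at the point of output; the stored username stays untouched.
    return '<span class="author">' + html.escape(username) + "</span>"


rendered = render_author(username)

# The markup characters arrive as text, not as a script element.
assert "<script>" not in rendered
# Nothing is lost: unescaping recovers the original username exactly.
assert html.unescape(html.escape(username)) == username
```

A single output path that skips this step, in the forum itself or in any downstream system that receives a copy of the data, is enough to turn the username into running script.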