I assume most people would not be that sophisticated, but I understand the point. This seems like it would be an ongoing battle no matter what OpenAI does, though.
That will almost certainly preserve the invisible characters. Most invisible characters in Unicode serve some kind of in-line formatting purpose (joiners, directional marks, and the like), so it's not desirable to remove them wholesale.
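For anyone who wants to see which invisible characters are actually in a piece of text before deciding what to do with them, here is a minimal sketch. It assumes that "invisible" means the Unicode general category Cf (format), which covers zero-width spaces, joiners, and directional marks but is not an exhaustive definition. The function name `find_format_chars` is made up for this example.

```python
import unicodedata

def find_format_chars(text):
    """Return (index, code point, name) for each Unicode format (Cf) character.

    Assumption: "invisible" is approximated by general category Cf.
    """
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

# A zero-width space and a zero-width joiner hidden in ordinary text.
sample = "zero\u200bwidth and a\u200djoiner"
for index, code_point, name in find_format_chars(sample):
    print(index, code_point, name)
```

Listing the characters first, rather than deleting them, makes it easier to tell a legitimate joiner or directional mark from something that was inserted on purpose.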
I've already got a script running every 2500 milliseconds to strip leading and trailing whitespace, HTML, and non-ASCII characters, except for the non-ASCII letters our local language uses.
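Roughly, a cleanup pass like that could look like the sketch below. This is not the actual script, just an illustration under some loud assumptions: `ALLOWED_NON_ASCII` is a placeholder allowlist (German letters here, swap in your own), the tag removal is a naive regex rather than a real HTML parser, and the 2500 ms scheduling is left out entirely.

```python
import re

# Placeholder allowlist: replace with the non-ASCII letters of your language.
ALLOWED_NON_ASCII = set("äöüÄÖÜß")

# Naive tag matcher; fine for simple markup, not a substitute for an HTML parser.
TAG_RE = re.compile(r"<[^>]*>")

def sanitize(text: str) -> str:
    """Strip HTML tags, disallowed non-ASCII characters, and outer whitespace."""
    text = TAG_RE.sub("", text)
    text = "".join(
        ch for ch in text
        if ord(ch) < 128 or ch in ALLOWED_NON_ASCII
    )
    # Strip last, so whitespace exposed by the earlier steps is removed too.
    return text.strip()

print(sanitize("  <b>Grüße</b>\u200b "))
```

Note that this drops every invisible character along with everything else outside the allowlist, which is exactly the trade-off mentioned above: text that relies on joiners or directional marks would be altered.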