by blep_ 1288 days ago
If it's that trivial to find them (with code, not eyeballs), it's also trivial to remove them.

I assume most people would not be that sophisticated, but I understand the point. This seems like it would be an ongoing battle no matter what OpenAI does though.
Or paste it into notepad.exe, then copy it back into whatever you were using.

Voila!

That will almost certainly preserve the invisible characters. Most invisible characters are used for some kind of in-line formatting in Unicode, so it's not desirable to remove them.
What inline formatting in notepad.exe? It doesn't even support bolding/italics/underlining.

But I guess there are tabs and line feeds/carriage returns, so there's that.

Right-to-left/left-to-right markers. Language tags. Various invisible spaces. Homoglyphs. (all trivially filterable though)
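For what it's worth, here is a minimal sketch of the "trivially filterable" part, assuming Python and its standard unicodedata module. The function name strip_invisible is made up for illustration. It drops everything in the Unicode "Cf" (format) category, which covers the direction marks, zero-width characters and language tag characters, and turns the various Unicode spaces into plain ASCII spaces. It does not attempt homoglyphs, which need a separate confusables mapping, and it is lossy: removing joiners and direction marks can break emoji sequences and legitimate bidirectional text.

```python
import unicodedata


def strip_invisible(text: str) -> str:
    """Drop format characters and normalize unusual spaces (illustrative only)."""
    out = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Cf":
            # Format characters: LRM/RLM, zero-width space and joiners,
            # word joiner, BOM, soft hyphen, language tag characters.
            continue
        if category == "Zs":
            # Space separators: no-break space, thin space, ideographic space...
            out.append(" ")
            continue
        out.append(ch)
    return "".join(out)


sample = "he\u200bllo\u200e wor\u2009ld\U000E0041"
print(repr(strip_invisible(sample)))
```

Tabs and newlines are in category "Cc", so they pass through untouched.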
I've already got a script running every 2500 milliseconds to strip leading and trailing whitespace, HTML, and non-ASCII characters except for the UTF-8 characters of our local language.
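A rough sketch of what one pass of such a cleanup might look like, assuming Python. The allowlist ALLOWED_EXTRA (Swedish letters here) and the function name clean are placeholders, not the actual script, and the timer that reruns it is left out. The tag regex is deliberately crude and is not a substitute for a real HTML sanitizer.

```python
import html
import re

# Hypothetical allowlist: the non-ASCII letters of the local language.
ALLOWED_EXTRA = "åäöÅÄÖ"

# Crude tag matcher: anything between "<" and the next ">".
TAG_RE = re.compile(r"<[^>]*>")


def clean(text: str, allowed_extra: str = ALLOWED_EXTRA) -> str:
    text = TAG_RE.sub("", text)   # strip HTML tags
    text = html.unescape(text)    # decode entities such as &amp;
    # Keep ASCII plus the allowlisted characters; drop everything else,
    # which also removes invisible Unicode characters.
    kept = "".join(ch for ch in text if ch.isascii() or ch in allowed_extra)
    return kept.strip()           # strip leading/trailing whitespace


print(clean("  <b>Hej</b> v\u00e4rlden\u200b \u2603 "))
```

An allowlist like this sidesteps the invisible-character problem entirely, at the cost of also dropping legitimate characters such as curly quotes or names from other languages.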