by treve 3647 days ago
This is a pretty well-known problem, but mostly irrelevant if you use UTF-8.
3 comments

Or, you know, just do the right thing(tm) and use parameterized queries.
Yes. If you are building SQL by concatenating user inputs (escaped or not) you are doing it wrong.
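For anyone unsure what that looks like in practice, here is a minimal sketch using Python's standard sqlite3 module. The users table, the find_user helper, and the hostile input are all made up for illustration. The point is only that the value travels separately from the SQL text, so no escaping (and no guessing about character sets) is involved.

```python
import sqlite3


def find_user(conn, name):
    """Look up users by name; `name` is bound as a parameter, never spliced in."""
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Hypothetical in-memory database, just for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Classic injection attempt: treated as a literal string, so it matches no row.
hostile = "x' OR '1'='1"
print(find_user(conn, "alice"))
print(find_user(conn, hostile))
```

With concatenation the second lookup would have returned every row; with a bound parameter it is just an odd-looking name that nobody has.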
IMO building SQL by concatenating anything feels wrong.

I still do it, and I haven't used an ORM yet that is actually useful, but it still feels wrong.

Section 3.1 of Unicode Technical Report #36 describes a couple of similar problems specific to UTF-8: overlong sequences and ill-formed subsequences.
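A small illustration of the overlong case, as a sketch only: the two bytes C0 A7 are an overlong encoding of the apostrophe (U+0027). A byte-level filter looking for 0x27 never sees it, a permissive decoder turns it back into a quote, and a strict decoder rejects it. lax_decode_2byte below is a deliberately broken toy decoder written for this example, not the behavior of any particular library.

```python
def lax_decode_2byte(data: bytes) -> str:
    """Toy decoder (hypothetical): ASCII plus 2-byte sequences, no overlong check."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(b))
            i += 1
        elif 0xC0 <= b <= 0xDF and i + 1 < len(data) and 0x80 <= data[i + 1] <= 0xBF:
            # A conforming decoder must reject lead bytes C0 and C1 here.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            raise ValueError("byte sequence not handled by this toy decoder")
    return "".join(out)


def strict_rejects(data: bytes) -> bool:
    """Return True if Python's built-in strict UTF-8 decoder refuses the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


overlong_quote = b"\xc0\xa7"

print(b"'" in overlong_quote)                  # the byte-level filter finds no quote
print(repr(lax_decode_2byte(overlong_quote)))  # the lax decoder produces one anyway
print(strict_rejects(overlong_quote))          # the strict decoder refuses the bytes
```

If the sanitizer and the component that finally interprets the string disagree about which of these behaviors is correct, the sanitizer can be bypassed.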

Does your software conform to Unicode 3.0 and earlier, 3.1 through 5.1, or 5.2 and later? And do your server and client software agree? If you don't know the answer (and you depend on string sanitization), you may be at risk.

What is particularly fun is the history of GB2312/GBK/GB18030. I wonder how easy it would have been to change Win95 to use UTF-8, for example.