Hacker News
by seles 1579 days ago
According to the article, PHP can handle other encodings by simply treating strings as byte sequences and not caring what the encoding is. Their example:

$string = "漢字";

But if you are using, say, UTF-8 and one of the bytes of those Chinese characters has the value 34 (the ASCII value of "), then wouldn't the string terminate prematurely?

Edit: to answer my own question, a quote from Wikipedia: "ASCII bytes do not occur when encoding non-ASCII code points into UTF-8."
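
A quick way to check this for the example string (a minimal sketch, assuming the source file is saved as UTF-8; `has_ascii_byte` is a hypothetical helper name, not something from the article):

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal holds UTF-8 bytes.
$string = "漢字";

// Hypothetical helper: true if any byte of $s is in the ASCII range 0x00-0x7F.
// Without the /u modifier, preg_match works on raw bytes, not code points.
function has_ascii_byte(string $s): bool
{
    return preg_match('/[\x00-\x7F]/', $s) === 1;
}

// Hex dump of the raw bytes, two hex digits per byte.
$hex = bin2hex($string);
echo $hex, PHP_EOL; // e6bca2e5ad97

// 34 (0x22) is the ASCII double quote; look for it among the raw bytes.
$hasQuoteByte = strpos($string, chr(34)) !== false;

var_dump($hasQuoteByte);           // bool(false)
var_dump(has_ascii_byte($string)); // bool(false)
```

Every byte comes out at 0x80 or above, so a byte-oriented lexer never sees a stray quote. The guarantee is specific to UTF-8, though: the code point of 漢 is U+6F22, so a UTF-16 encoding of it would contain a 0x22 byte.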

1 comment

Also, the compiler might be treating the input file as UTF-8, while the semantics of the language may treat a string literal as the sequence of bytes of its UTF-8 encoding.
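
For what it's worth, the byte-sequence view is easy to see in PHP itself. A rough sketch, assuming the file is saved as UTF-8 (the variable names are just for illustration):

```php
<?php
// Assumption: this file is saved as UTF-8.
$literal = "漢字";

// The same six bytes written out with hex escapes.
$bytes = "\xE6\xBC\xA2\xE5\xAD\x97";

var_dump($literal === $bytes); // bool(true): the literal is just those bytes
var_dump(strlen($literal));    // int(6): strlen counts bytes, not characters
var_dump(ord($literal[0]));    // int(230), i.e. 0xE6, the first byte of 漢
```

If the file were saved in a different encoding, the literal would hold different bytes and the comparison would fail, which is the sense in which PHP doesn't care what the encoding is.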