Hacker News
by electrotype 2038 days ago
I haven't touched PHP for a long time now. How is the Unicode support now? I remember having to use some special utilities to handle characters like "œ" properly.

You cannot rely on the "basic" string manipulation functions (strcmp, strlen, etc.) because they are not Unicode-aware: they operate on bytes, not characters.

However, you have the multibyte string (mbstring) family of functions [1], which can operate on a wide range of encodings, including UTF-8, which is the default in any sane installation nowadays.

[1] https://www.php.net/manual/en/ref.mbstring.php
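A minimal sketch of the difference, assuming PHP 7+ (for the "\u{...}" escape) and that the mbstring extension is loaded. The variable names are only for illustration:

```php
<?php
// "œuvre": U+0153 ("œ") takes two bytes in UTF-8, the other letters one each.
$word = "\u{0153}uvre";

$byteCount = strlen($word);             // counts bytes
$charCount = mb_strlen($word, 'UTF-8'); // counts code points

$firstByte = substr($word, 0, 1);             // only the first byte of "œ"
$firstChar = mb_substr($word, 0, 1, 'UTF-8'); // the whole "œ"

var_dump($byteCount);                             // int(6)
var_dump($charCount);                             // int(5)
var_dump($firstChar === "\u{0153}");              // bool(true)
var_dump(mb_check_encoding($firstByte, 'UTF-8')); // bool(false): a lone lead byte is not valid UTF-8
```

Passing the encoding explicitly keeps the result independent of the mbstring internal encoding setting.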

I think I had issues even with mbstring, for some characters like "œ". But maybe I'm wrong.
œ works fine with mb_strlen(). What might have been tripping you up is combining character sequences:

https://3v4l.org/DM4pC

Handling those "correctly" with a string length function gets complicated in any language, as there isn't a 1-to-1 mapping between Unicode codepoints and visible glyphs.

In PHP, grapheme_strlen (from the intl extension) achieves what you're describing: https://3v4l.org/HPOb3
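For reference, a small sketch of the three ways of counting, assuming PHP 7+ with both the mbstring and intl extensions loaded. It is not the code behind the links above, just the same idea with an "é" built from a combining accent:

```php
<?php
// "é" as two code points: "e" (U+0065) + COMBINING ACUTE ACCENT (U+0301).
$decomposed = "e\u{0301}";
// The same visible character as one precomposed code point (U+00E9).
$precomposed = "\u{00E9}";

$decomposedCounts = [
    'bytes'     => strlen($decomposed),             // 1 + 2 bytes
    'points'    => mb_strlen($decomposed, 'UTF-8'), // 2 code points
    'graphemes' => grapheme_strlen($decomposed),    // 1 user-visible character
];
$precomposedCounts = [
    'bytes'     => strlen($precomposed),
    'points'    => mb_strlen($precomposed, 'UTF-8'),
    'graphemes' => grapheme_strlen($precomposed),
];

print_r($decomposedCounts);  // bytes => 3, points => 2, graphemes => 1
print_r($precomposedCounts); // bytes => 2, points => 1, graphemes => 1
```

Both strings render identically, but only the grapheme count agrees between them.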
Yes, I think you nailed what my issue was.