by lelanthran 1101 days ago
> 5 bytes? In what encoding?

I believe UTF-8 reserved up to six bytes for a single character.

1 comment

Yes, UTF-8 allowed up to 6 bytes per character early on. Some broken implementations, like MySQL's "utf8" charset (really utf8mb3), limit it to 3 bytes per character. The actual limit is 4, since RFC 3629 restricted UTF-8 to code points up to U+10FFFF.
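
If anyone wants to check the per-character numbers, here is a minimal sketch in Python. The helper name utf8_len and the sample characters are my own picks for illustration, not anything from the thread.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes one code point occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# One sample from each UTF-8 length class, written as escapes to avoid
# depending on the source file's own encoding.
samples = [
    "A",           # U+0041, ASCII
    "\u00e9",      # U+00E9, Latin small e with acute
    "\u20ac",      # U+20AC, euro sign
    "\U0001F600",  # U+1F600, an emoji outside the Basic Multilingual Plane
]

lengths = [utf8_len(c) for c in samples]
print(lengths)

# The highest valid code point still fits in 4 bytes.
max_len = utf8_len(chr(0x10FFFF))
print(max_len)
```

The third sample is as far as a 3-byte limit reaches; anything outside the Basic Multilingual Plane, like the emoji, needs the fourth byte.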

So what is "up to 5"?

I think with decomposed Hangul, you can end up with 6 or more bytes per character: each conjoining jamo is three bytes in UTF-8 (two in UTF-16), and a syllable decomposes into 2 or 3 of them.
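
A quick way to check the decomposed Hangul numbers, as a sketch using Python's standard unicodedata module. The function name sizes and the two sample syllables are my own choices for illustration.

```python
import unicodedata

def sizes(syllable: str) -> dict:
    """Compare a precomposed Hangul syllable with its NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return {
        "jamo": len(decomposed),                           # code points after NFD
        "utf8_composed": len(syllable.encode("utf-8")),    # bytes, precomposed
        "utf8_decomposed": len(decomposed.encode("utf-8")),
        "utf16_decomposed": len(decomposed.encode("utf-16-le")),
    }

ga = sizes("\uac00")   # U+AC00: leading consonant + vowel
han = sizes("\ud55c")  # U+D55C: leading consonant + vowel + trailing consonant

print(ga)
print(han)
```

So a single user-perceived character can take 6 or 9 bytes in UTF-8 once decomposed, even though no individual code point ever takes more than 4. The two-bytes-per-part figure only holds for UTF-16.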