by samhw 1598 days ago
> In software, I would use a Unicode string internally, then when writing out I would encode that to UTF-8.

I don't understand what distinction you're drawing here? UTF-8 is Unicode. In what way would you be modifying it at the presentation layer? (Unless you're dealing with true UI code, and are saying "I would map the characters to font glyphs according to the UTF-8 standard".)

I know UTF-8 isn't the only way of encoding Unicode codepoints, for what it's worth. I'm just struggling to see how you would be using just 'Unicode', as opposed to a particular encoding, at the storage layer. It's still just bits and bytes.

1 comment

I think it would be more accurate to say that UTF-8 is a Unicode Transformation Format, which, as the name suggests, is logically distinct from Unicode itself: Unicode assigns code points to characters, and UTF-8 is one way of serializing those code points as bytes. There are good reasons to store and process Unicode as UTF-8 internally in many cases, but UTF-32 / UCS-4 would probably take over for internal processing if it weren't for its memory cost (four bytes per code point, even for plain ASCII text).
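To make the distinction concrete, here is a minimal Python sketch, not anything from the original comments. It treats a `str` as an abstract sequence of code points and produces bytes only at the boundary, via `str.encode`. The sample string and variable names are made up for illustration, and how the interpreter actually stores a `str` in memory is a separate implementation detail that this sketch does not show.

```python
# Hypothetical sample text: ASCII, two Latin-1 letters, and one emoji
# outside the Basic Multilingual Plane. Escapes are used so the exact
# code points are unambiguous.
text = "na\u00efve caf\u00e9 \U0001F642"

# At this level the string is just a sequence of Unicode code points.
code_points = [ord(ch) for ch in text]

# Encoding happens only when bytes are needed (file, socket, database).
# The "-le" variants are used so that no byte order mark is prepended.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")

# Same code points, three different byte serializations.
sizes = {"utf-8": len(utf8), "utf-16": len(utf16), "utf-32": len(utf32)}

# Each encoding round-trips back to the same abstract string.
round_trips = all(
    data.decode(name) == text
    for name, data in (("utf-8", utf8), ("utf-16-le", utf16), ("utf-32-le", utf32))
)

print(len(code_points), sizes, round_trips)
```

For this 12-code-point string the encoded lengths are 17 bytes (UTF-8), 26 bytes (UTF-16) and 48 bytes (UTF-32), which is the memory trade-off mentioned above: UTF-32 gives fixed-width code points at the cost of four bytes each.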