Hacker News
by josteink 3214 days ago
> The library just has to treat it as a string and not worry about the encoding (i.e. not try to encode it to/from the unicode type).

If I pass a string to a library, it receives a Unicode string: bytes that have already been decoded using some encoding. The library shouldn't be able to re-decode that in any way, whatever that is supposed to mean on a technical level.

If a library receives a byte-array representing text, that is a completely different matter and talking about encodings is fully appropriate, even required.

But this matter should predominantly exist at your application's boundary, when doing IO.

If you're regularly doing encoding and decoding anywhere else, you're doing something wrong (or your language is).
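To make the boundary idea concrete, here is a minimal sketch in Python 3, where str is the Unicode type and bytes is the byte array. The function names handle_request and process are hypothetical, made up for illustration; the only point is that decode and encode each appear exactly once, at the edge.

```python
def process(text: str) -> str:
    # Interior code: works on Unicode text only, never touches encodings.
    return text.upper()


def handle_request(raw_body: bytes) -> bytes:
    # Input boundary: bytes from the wire become text exactly once.
    # Raises UnicodeDecodeError if the input is not valid UTF-8.
    text = raw_body.decode("utf-8")

    reply = process(text)

    # Output boundary: text becomes bytes exactly once.
    return reply.encode("utf-8")
```

Everything between the two boundary calls can treat text as text, which is the "you're doing something wrong otherwise" claim in code form.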

1 comment

Look back a few posts. We're discussing using UTF-8 in str and avoiding the unicode type in Python 2.

In my use case I validate strings coming in from the internet as UTF-8. Data going to and from the database is already UTF-8, so no validation is required there. Output back to the internet requires no additional steps.

Nowhere in this method is encode or decode required or desired.
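For what "validate without decoding" could look like, here is a rough sketch, not the code actually used in the setup described above. The helper name is_valid_utf8 is made up for illustration. It walks the raw bytes and checks them against the well-formed UTF-8 byte ranges, so no unicode object is ever created. It uses bytearray so the same code should behave identically on Python 2 str and Python 3 bytes.

```python
def is_valid_utf8(data):
    """Return True if data (a byte string) is well-formed UTF-8."""
    b = bytearray(data)  # indexing yields ints on both Python 2 and 3
    i, n = 0, len(b)
    while i < n:
        c = b[i]
        if c < 0x80:  # ASCII: single byte
            i += 1
            continue
        # Pick the number of continuation bytes and the allowed range
        # for the first continuation byte (lo..hi).
        if 0xC2 <= c <= 0xDF:  # 2-byte sequence; 0xC0/0xC1 are overlong
            need, lo, hi = 1, 0x80, 0xBF
        elif 0xE0 <= c <= 0xEF:  # 3-byte sequence
            need = 2
            lo = 0xA0 if c == 0xE0 else 0x80  # reject overlong forms
            hi = 0x9F if c == 0xED else 0xBF  # reject UTF-16 surrogates
        elif 0xF0 <= c <= 0xF4:  # 4-byte sequence
            need = 3
            lo = 0x90 if c == 0xF0 else 0x80  # reject overlong forms
            hi = 0x8F if c == 0xF4 else 0xBF  # reject code points > U+10FFFF
        else:
            return False  # stray continuation byte or invalid lead byte
        if i + need >= n:
            return False  # sequence is truncated
        if not (lo <= b[i + 1] <= hi):
            return False
        for k in range(2, need + 1):
            if not (0x80 <= b[i + k] <= 0xBF):
                return False
        i += need + 1
    return True
```

Input that passes the check can be stored and echoed back byte for byte, which matches the "no encode or decode" flow; whether a hand-rolled validator is preferable to a throwaway decode call is a separate judgment call.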