by JulianWasTaken 4772 days ago
> There is also an advice for encoding __repr__ and __str__ results to utf8 under Python 2.x in the article; this is fine (other approaches are not better), but it has some non-obvious consequences (like breaking REPL in some setups) that developers should be aware of, see http://kmike.ru/python-with-strings-attached/

I don't see `__repr__` mentioned there, but `__repr__` should basically always be ASCII (which, from a quick glance, your article seems to mention).

I'm fine with `__str__` returning (encoding to) `utf-8` generally, as if someone wants something else they can always encode the unicode themselves to what they want, but `.encode(locale.getpreferredencoding())` is also fine with me if you want to be even more polite.
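To make the convention concrete, here is a minimal sketch, assuming a hypothetical `Person` class that is not from the article. It keeps `__repr__` pure ASCII by escaping non-ASCII characters, and on Python 2 has `__str__` encode to one fixed encoding (`utf-8`). It is written to run on Python 3 as well, where `__str__` simply returns text.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import sys

PY2 = sys.version_info[0] == 2


class Person(object):
    """Hypothetical example class; the name and fields are illustrative."""

    def __init__(self, name):
        self.name = name  # text: unicode on Python 2, str on Python 3

    def __unicode__(self):
        return self.name

    def __str__(self):
        text = self.__unicode__()
        # Python 2 expects bytes from __str__, so pick one fixed encoding
        # instead of one that depends on the machine's locale.
        return text.encode("utf-8") if PY2 else text

    def __repr__(self):
        # Escape non-ASCII characters so the result contains only ASCII.
        escaped = self.name.encode("unicode_escape").decode("ascii")
        return "<Person %s>" % escaped


person = Person("Zo\u00eb")
repr_text = repr(person)
str_text = str(person)
```

Anyone who wants a different encoding can still call `person.__unicode__()` (or `unicode(person)` on Python 2) and encode the result themselves.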

1 comment

You're right that `__repr__` was not mentioned; my bad.

I think `.encode(locale.getpreferredencoding())` is awful because it changes the string encoding from run to run, and because `locale.getpreferredencoding()` can differ (and does differ by default, e.g. on Cyrillic Windows XP) from both `sys.stdout.encoding` (used for printing) and `sys.getdefaultencoding()` (used for implicit type conversions).
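A rough sketch of the mismatch, in Python 3 syntax: `report_encodings` and `mismatch_demo` are hypothetical helper names, and `cp1251` / `cp866` are assumed here as typical locale and console code pages on a Russian Windows machine (the comment above does not name them). The first helper shows the three values side by side; the second shows what happens when bytes produced with one encoding are read with another.

```python
import locale
import sys


def report_encodings():
    """Collect the three encodings that may disagree on one machine."""
    return {
        "preferred": locale.getpreferredencoding(),
        # stdout may be replaced by an object without an encoding attribute
        "stdout": getattr(sys.stdout, "encoding", None),
        "default": sys.getdefaultencoding(),
    }


def mismatch_demo(text, encode_with, decode_with):
    """Encode text with one codec and decode the bytes with another."""
    return text.encode(encode_with).decode(decode_with)


encodings = report_encodings()

# "privet" in Cyrillic, written with escapes to keep this file ASCII-only.
text = "\u043f\u0440\u0438\u0432\u0435\u0442"

# Assumed code pages: cp1251 for the locale, cp866 for the console.
garbled = mismatch_demo(text, "cp1251", "cp866")
intact = mismatch_demo(text, "utf-8", "utf-8")
```

With a fixed encoding such as `utf-8` on both sides the text survives the round trip; with the two assumed code pages it comes back as different characters.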

Good point. Honestly, I'm careful about calling `str` on random objects that I know are doing this. But yeah, I guess that's probably a good enough reason to pick an encoding and go with it, and `utf-8` is as good a choice as any.