by lmm 4603 days ago
It's a big, ambitious update. The biggest difference is forcing users to distinguish between strings and byte sequences; essentially programs now have to be encoding-aware (at least if they use any of the standard library functions). Which is a Good Thing, but can require a ton of work for existing codebases.
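
To make "encoding-aware" concrete, here is a minimal sketch in Python 3. The sample text and variable names are only illustrative assumptions; the point is that the same string maps to different byte sequences depending on the codec, so the program has to name one explicitly.

```python
# A str is a sequence of Unicode code points; bytes only exist once a codec is chosen.
text = "naïve café"

utf8_bytes = text.encode("utf-8")      # ï and é take two bytes each
latin1_bytes = text.encode("latin-1")  # every character takes one byte

# Same text, different byte sequences.
same_bytes = utf8_bytes == latin1_bytes

# Round-tripping works only when the codec used to decode matches the one used to encode.
round_trip = utf8_bytes.decode("utf-8")

# Decoding with the wrong codec does not fail here; it silently yields garbage.
mojibake = utf8_bytes.decode("latin-1")

print(len(text), len(utf8_bytes), len(latin1_bytes))
print(repr(mojibake))
```

The silent garbage in the last step is the kind of bug that only shows up once non-ASCII input arrives.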
2 comments

It's not that ambitious: none of the changes are particularly compelling, and none of them scream "update now".

It does, on the other hand, break backwards compatibility. Which is why hardly anyone updated.

Maybe not in the world of ASCII, but the new Unicode system's scream seems pretty loud to me.

When I decided to use pelican for a non-English blog, I thought it would be a piece of cake; just changing the theme and plugging in a calendar converter, and I would be done with it. In reality, I had to fork pelican and the calendar library (which was not well-maintained) and bang my head against the wall for three days to make them work together, all because of the whole string/unicode separation and the fact that things work automagically as long as you're just using ASCII.

Does this get easier or harder in Python 3?

I like the explicit separation that Racket has between "here is a buffer of binary data" and "here is a sequence of Unicode characters," and (looking from the outside, without having worked with it) I'm glad that Python 3 began to adopt some of that.

smnrchrds's case of fixing someone else's ASCII assumption gets easier, because the code probably would not have been written that way.

In Python 2, it's really easy to write code that confuses bytes and characters, which introduces bugs and crashes when non-ASCII characters show up.

In Python 3, they made it easier to work with Unicode, because it's the default for everything, and much harder to confuse bytes and characters, because of that separation between the data types.
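
A rough sketch of what that separation looks like in practice, assuming Python 3. The helper `join_greeting` and the sample name are made up for illustration: mixing `str` and `bytes` fails immediately with a `TypeError`, instead of working for ASCII and crashing later on other input.

```python
def join_greeting(prefix: str, raw_name: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode the bytes explicitly before combining with text."""
    return prefix + raw_name.decode(encoding)


raw = "Zoë".encode("utf-8")  # bytes, e.g. as read from a socket or a binary file

# Python 3 refuses to guess an encoding, so str + bytes is an error up front.
try:
    "Hello, " + raw
    mixed_ok = True
except TypeError:
    mixed_ok = False

# The fix is an explicit decode at the boundary.
greeting = join_greeting("Hello, ", raw)
print(mixed_ok, greeting)
```

The error appears even when the bytes happen to be pure ASCII, which is what makes the mistake hard to miss during development.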

Ding ding ding!! It was not ambitious enough to justify breaking backward compatibility.
I've seen many programmers have trouble with this. But it's essential when using UTF-8, because a string's length in characters may differ from its length in bytes. So byte != char != int (0-255). That's hard to grasp for coders who are used to all of those data types being the same.
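
A small Python 3 sketch of that point, using an arbitrary sample word as the assumption: the character count and the UTF-8 byte count differ, indexing `bytes` yields an int in 0-255, and a single character's code point can be far above 255.

```python
word = "héllo€"
encoded = word.encode("utf-8")

char_count = len(word)     # counts Unicode code points
byte_count = len(encoded)  # counts UTF-8 bytes: é takes 2, € takes 3

first_char = word[0]       # indexing a str gives a one-character str
first_byte = encoded[0]    # indexing bytes gives an int in range(256)

euro_code_point = ord("€")  # a code point, not limited to 0-255

print(char_count, byte_count)
print(repr(first_char), first_byte, euro_code_point)
```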