by ttwwmm 2289 days ago
The core problem with strings in Python 2 was implicit coercion: you could mix `b''` and `u''` strings pretty freely, and they would be silently converted as required, as long as they only contained ASCII. Once the data leaves the ASCII range, you start to see data-dependent bugs.
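
To make the data dependence concrete, here is a rough sketch that emulates the Python 2 behavior in Python 3. The helper `py2_style_concat` is a made-up name for illustration; it assumes the implicit conversion is an ASCII decode, which was the default in Python 2.

```python
def py2_style_concat(raw: bytes, text: str) -> str:
    """Hypothetical helper: emulate Python 2's `bytes + unicode`
    by implicitly decoding the bytes operand as ASCII."""
    return raw.decode("ascii") + text


# Pure-ASCII input: the implicit conversion goes unnoticed.
ok = py2_style_concat(b"abc", "def")

# Same code path, non-ASCII input: it now blows up at runtime.
try:
    py2_style_concat("café".encode("utf-8"), "def")
    failed = False
except UnicodeDecodeError:
    failed = True
```

Nothing in the code changed between the two calls, only the data, which is why these bugs tended to show up in production rather than in tests.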

To gracefully deprecate this behavior, you could start by generating a warning each time an implicit coercion is done. Next, make implicit coercion raise an exception, but provide a way to suppress it. Finally, remove the ability to suppress the exception.
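
The three stages could look roughly like this. It is only a sketch: `implicit_decode`, `ImplicitCoercionWarning`, the `stage` numbers, and the `suppress_error` switch are all invented here to illustrate the idea, not anything Python actually shipped.

```python
import warnings


class ImplicitCoercionWarning(DeprecationWarning):
    """Hypothetical warning category used during stage 1."""


def implicit_decode(raw: bytes, stage: int, suppress_error: bool = False) -> str:
    """Hypothetical stand-in for the interpreter's implicit bytes -> unicode step."""
    if stage == 1:
        # Stage 1: still works, but every implicit coercion warns.
        warnings.warn(
            "implicit bytes -> unicode coercion is deprecated",
            ImplicitCoercionWarning,
            stacklevel=2,
        )
    elif stage == 2 and not suppress_error:
        # Stage 2: raises by default, with an opt-out.
        raise TypeError("implicit coercion is disabled (suppressible)")
    elif stage >= 3:
        # Stage 3: the opt-out is gone.
        raise TypeError("implicit coercion has been removed")
    return raw.decode("ascii")
```

Each stage gives people one release cycle to find and fix the call sites that the previous stage flagged.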

As the GP suggests, you'd do well to similarly deprecate unprefixed strings.

This would all be pretty confusing to explain if you were renaming the string types at the same time, as Python 3 did. I think that's an indication that you shouldn't rename the types. You could deprecate `str` and just use the names `bytes` and `unicode`, which go nicely with the `b''` and `u''` mnemonics anyway.

Python 3 also changed the type of string used for Python identifiers. You'd need a strategy there as well.

It might be convenient to have some type-checking `dict` variants in the stdlib, but I think that's separate from the coercion issue.
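
For what it's worth, a type-checking `dict` variant doesn't need much code. Here is one possible sketch built on `collections.UserDict`; the class name `TypedKeyDict` and its `key_type` parameter are my own invention, not an existing stdlib API.

```python
from collections import UserDict


class TypedKeyDict(UserDict):
    """Hypothetical dict variant that rejects keys of the wrong type."""

    def __init__(self, key_type, *args, **kwargs):
        # Set key_type first: UserDict.__init__ routes initial items
        # through __setitem__, which needs it.
        self.key_type = key_type
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        if not isinstance(key, self.key_type):
            raise TypeError(
                f"key must be {self.key_type.__name__}, "
                f"got {type(key).__name__}"
            )
        super().__setitem__(key, value)


headers = TypedKeyDict(str, {"content-type": "text/plain"})
headers["accept"] = "*/*"  # fine: str key

try:
    headers[b"host"] = "example.com"  # bytes key is rejected
    rejected = False
except TypeError:
    rejected = True
```

I used `UserDict` rather than subclassing `dict` directly because the constructor and `update()` go through `__setitem__`, so the check can't be bypassed that way.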