| (I'll try to keep this short, since I feel this is quite off-topic. If we want to discuss this further, I suppose we could find a better venue... maybe even email?)

I assume that with "Ruby 1.9 solution" you refer to the fact that Ruby source code is by default evaluated as UTF-8, right? That's definitely a good thing, but with Python3 that wasn't the only change brought into the language.

I said "if Ruby ever decides to fix" because the need for a change is not obvious and not universally accepted: it's basically the same issue as automatic type coercion (aka weak/strong typing) and early (or late) raising/handling of exceptions.

Basically: in Python2 and Ruby you have one or two string types (runtime types; in this discussion I only care about them). Ruby has a single String type, with each string tagged with the encoding it uses internally. In Python2 there are two types: "anything goes" (binary strings, the old Python2 string) and unicode (whose actual internal encoding is an implementation detail).

The problem (if you agree that it is one) is that you can easily mix and match them, and everything will work fine only as long as the operation makes sense. When it no longer does, you'll get an exception. This is a problem when you don't completely control the type/encoding of your input (e.g. if you have an HTTP request and your string depends on the type/charset specified in the Content-Type).

A dumb example of what could happen:

a = "Ё".force_encoding "ISO-8859-5"
a2 = "®".force_encoding "ISO-8859-1"
# a + a2 will fail with Encoding::CompatibilityError
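
To make the "works only as long as the operation makes sense" point concrete, here is a minimal sketch that runs the same concatenation twice: once with ASCII-only content and once with non-ASCII bytes. The helper name `concat_outcome` is made up for illustration, and the strings use `\u` escapes so the result doesn't depend on the source file's encoding.

```ruby
# Hypothetical helper: attempt the concatenation and report what happened.
def concat_outcome(left, right)
  left + right
  :ok
rescue Encoding::CompatibilityError
  :incompatible
end

# ASCII-only content: differently tagged strings still combine fine.
ascii_a = "abc".dup.force_encoding("ISO-8859-5")
ascii_b = "def".dup.force_encoding("ISO-8859-1")
puts concat_outcome(ascii_a, ascii_b)   # prints "ok"

# Non-ASCII bytes ("\u0401" is Ё, "\u00AE" is ®): same code, now it raises.
cyr = "\u0401".dup.force_encoding("ISO-8859-5")
lat = "\u00AE".dup.force_encoding("ISO-8859-1")
puts concat_outcome(cyr, lat)           # prints "incompatible"
```

The code path is identical in both calls and only the data differs, which is why this kind of bug tends to show up late, on real-world input.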
A similar thing can happen in Python2, while Python3 will reject the same operation as soon as the two types come into contact with each other (still at runtime, but it'll be like doing `1+"1"` in Python or Ruby: you'll spot it right away).

I wrote a quite lengthy blog post about this change in Python3, but I haven't translated it into English yet; if there's some interest, I could try to do it a bit sooner.

Anyhow, I don't want to start a flame war or anything like that. I just wanted to explain why the Python3 choice was made, and why a breaking change might have had its merits.
While I prefer Python3's approach, and I'm definitely not a Ruby developer, I still appreciate these updates to Ruby. For example, I worked first hand on the internal encoding-handling code of Ruby (Rubinius) some time ago with a friend of mine: http://git.io/7kM4Gw and I can benefit from the new GC code in newer Rubies, which makes Metasploit 4 times faster to load. |
In contrast, Python 3 changed the meaning of "foo". It also supported only u"foo" in Python 2 (to opt-in to unicode strings) and only b"foo" in Python 3 (to opt-in to byte-strings) for a fairly long period of time, making it extremely, extremely awkward (at best) to write a program with shims as abstractions that let most of the program remain oblivious to the differences.
Python 3.3 and 2.7 finally landed a lot of fixes for this kind of problem, but they landed fairly late, after most of the community had already gotten a sense of how difficult it was to transition to Python 3 while maintaining support for Python 2 at the same time.
Both Ruby and JavaScript have taught me the value of a transition path to a new version that allows people to write libraries that support both the old and new version at the same time. Communities move a little at a time, especially long-term production projects. The best way to move them is via libraries that can serve as a bridge and target both the old and new version together.
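
As a rough illustration of what such a bridge can look like on the Ruby side (a sketch, not taken from any particular library): feature-detect the new API instead of checking version numbers. The helper name `as_utf8` is hypothetical.

```ruby
# Hypothetical helper: tag a string as UTF-8 where encodings exist,
# and pass it through unchanged where they don't.
def as_utf8(str)
  copy = str.dup
  # String#force_encoding exists on Ruby 1.9+; Ruby 1.8 strings are plain bytes.
  copy.respond_to?(:force_encoding) ? copy.force_encoding("UTF-8") : copy
end

raw  = "caf\xC3\xA9"   # UTF-8 bytes for "café"
text = as_utf8(raw)
puts text.encoding if text.respond_to?(:encoding)
```

Callers only ever go through `as_utf8`, so the rest of the library stays oblivious to which Ruby it is running on.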