by matthewmacleod
4195 days ago
What situation is that, out of interest? As of Ruby 1.9, there's a pretty sensible solution to this in the language, and I haven't had any encoding problems in some time. I appreciate I might have missed something though!
I assume that by the "Ruby 1.9 solution" you mean the fact that Ruby source code is evaluated as UTF-8 by default, right?
That's definitely a good thing, but it wasn't the only change that Python3 brought to the language.
I said "if Ruby ever decides to fix" because the need for a change is neither obvious nor universally accepted: it's basically the same issue as automatic type coercion (aka weak vs. strong typing) and raising and handling exceptions early versus late.
Basically: in Python2 and Ruby you have one or two string types (runtime types; they are the only ones I care about in this discussion). Ruby Strings are tagged with the encoding they use internally. In Python2 the types are the "anything goes" binary string (the old Python2 string) and unicode (whose actual internal encoding is an implementation detail).
The problem (if you agree that it is one) is that you can easily mix and match them, and everything works fine only as long as the operation makes sense. As soon as it doesn't, you get an exception.
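To make the "tagged with the encoding" part concrete, here is a minimal Ruby sketch (assuming Ruby 1.9 or later; `describe` is a hypothetical helper, not part of Ruby). The same bytes can carry different encoding tags, and the tag changes how the string is interpreted:

```ruby
# Returns [encoding tag, character count, byte count] for a string.
def describe(str)
  [str.encoding, str.length, str.bytesize]
end

text = "caf\u00e9"                        # a \u escape makes this literal UTF-8
raw  = text.dup.force_encoding("BINARY")  # same bytes, retagged as binary

p describe(text)  # UTF-8 tag: 4 characters stored in 5 bytes
p describe(raw)   # binary tag: every byte counts as one character
```

Note that `force_encoding` only changes the tag; it does not convert any bytes (that is what `encode` is for).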
This is a problem when you don't completely control the type/encoding of your input (e.g. if you have an HTTP request and your string depends on the type/charset specified in the Content-Type).
A dumb example of what could happen:
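A minimal Ruby sketch of that failure mode (an illustration only, assuming Ruby 1.9 or later; `try_concat` is a hypothetical helper and the strings are made up). Mixing a UTF-8 string with an ISO-8859-1 one works as long as one side is pure ASCII, and blows up only when both contain non-ASCII characters:

```ruby
# Concatenates two strings; returns the result, or the name of the
# error class if Ruby refuses to mix the two encodings.
def try_concat(a, b)
  a + b
rescue Encoding::CompatibilityError => e
  e.class.name
end

utf8          = "caf\u00e9"                    # UTF-8, contains a non-ASCII char
latin1_ascii  = "abc".encode("ISO-8859-1")     # Latin-1 tag, ASCII-only content
latin1_accent = "\u00e9".encode("ISO-8859-1")  # Latin-1 tag, non-ASCII content

puts try_concat(utf8, latin1_ascii)   # works: ASCII-only data is compatible
puts try_concat(utf8, latin1_accent)  # fails with Encoding::CompatibilityError
```

So the same line of code can pass every test with ASCII-only input and fail later, when the first accented character arrives.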
A similar thing can happen in Python2. Python3, by contrast, will reject the same operation as soon as the two types come into contact with each other (still at runtime, but it'll be like doing `1+"1"` in Python or Ruby: you'll spot it right away).
I wrote a fairly lengthy blog post about this change in Python3, but I haven't translated it into English yet. If there's some interest, I could try to do it a bit sooner.
Anyhow, I don't want to start a flame war or anything like that. I just wanted to explain why the Python3 choice was made, and why a breaking change might have its merits. While I prefer Python3's approach, and I'm definitely not a Ruby developer, I still appreciate these updates to Ruby. For example, I got first-hand experience with Ruby's internal encoding-handling code (in Rubinius) some time ago with a friend of mine: http://git.io/7kM4Gw and I can benefit from the new GC code in newer Rubies, which makes metasploit 4 times faster to load.