

by wycats 4195 days ago

I agree with the vast majority of what you've said.

One thing worth noting is that, in the Rails 3 era, I headed up a TREMENDOUS effort to aggressively reduce the number of encoding-related problems in Rails and to make sure that common mistakes produced clear error messages.

At the time, just as the difficulty with encodings started to heat up, I wrote two somewhat lengthy blog posts[1][2] that give a contemporary perspective on it.

One of the goals of the Rails 3 effort was to ensure that strings making their way into Rails arrived as UTF-8. That involved being very careful with templates (I wrote a bit of a novel in the docs that remains to this day[3]), figuring out how to ensure that browser forms submitted their data in UTF-8 (even in IE6[4]), and working with Brian Lopez on mysql2 to ensure that all strings coming in from MySQL were properly tagged with encodings.

I also did a lot of person-to-person evangelism to try to get C bindings to respect the `default_internal` encoding setting, which Rails sets to UTF-8.
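To make "respecting `default_internal`" concrete, here is a minimal plain-Ruby sketch of what a well-behaved binding does with the bytes it receives. `import_string` is a hypothetical helper (real bindings do this in C): it tags raw bytes with the encoding the data source reports, then transcodes to `Encoding.default_internal` when one is set, which is the setting Rails sets to UTF-8.

```ruby
# Hypothetical stand-in for an encoding-aware C binding (not a real API).
# 1. Tag the raw bytes with the encoding the data source actually uses.
# 2. Transcode to Encoding.default_internal if the application set one.
def import_string(raw_bytes, source_encoding, target: Encoding.default_internal)
  str = raw_bytes.dup.force_encoding(source_encoding)
  unless str.valid_encoding?
    raise ArgumentError, "invalid #{source_encoding} bytes: #{str.inspect}"
  end
  target ? str.encode(target) : str
end

# Latin-1 bytes for "José", as a hypothetical driver might hand them over.
raw = "Jos\xE9".b
name = import_string(raw, Encoding::ISO_8859_1, target: Encoding::UTF_8)
puts name           # a UTF-8 "José"
puts name.encoding  # UTF-8
```

The `target:` keyword defaults to `Encoding.default_internal` at call time, so an application that sets it gets UTF-8 strings everywhere without each call site opting in.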

The net effect of all of that work is that, while people still hit a certain number of encoding-related issues in Rails 3, there were dramatically fewer of them than when experimental Ruby 1.9 support was first added to Rails 2.3.

---

P.S. I completely agree that the ASCII-7 exception was critical to keeping things rolling in the early days, but I personally would have liked an opt-in setting that raised an exception when concatenating a BINARY string that happened to contain only ASCII-7 bytes with an ASCII-compatible string. In practice, this exception allowed a number of obscure C bindings to keep producing BINARY strings well into the encoding era, and, in my experience, they were responsible for a large share of weird, production-only bugs.
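A rough sketch of the ASCII-7 exception next to the kind of opt-in check described above. `strict_concat` is a hypothetical helper, not a real Ruby or Rails setting; it simply refuses any BINARY operand, which is stricter than Ruby's built-in compatibility rule.

```ruby
# Hypothetical opt-in check (not a real Ruby or Rails setting): reject any
# BINARY operand, even one whose bytes are all 7-bit ASCII.
def strict_concat(left, right)
  [left, right].each do |str|
    next unless str.encoding == Encoding::BINARY
    raise Encoding::CompatibilityError,
          "refusing to concatenate BINARY string #{str.inspect}"
  end
  left + right
end

template = "Welcome back, "   # UTF-8 literal, ASCII-only
from_c   = "Jose".b           # BINARY (ASCII-8BIT), but only 7-bit bytes

# Ruby's built-in rule: allowed, because the BINARY side is ASCII-only.
puts template + from_c

# The hypothetical strict check flags the BINARY string anyway.
begin
  strict_concat(template, from_c)
rescue Encoding::CompatibilityError => e
  puts "strict: #{e.message}"
end
```

With a check like this enabled in development, a binding that returns BINARY would fail on the first ASCII-only test value instead of waiting for a non-ASCII one in production.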

Specifically, you would have development and test environments that only tested with ASCII characters (people's names, for example). Then, in production, the occasional user would type in something like "José", producing a hard-to-reproduce encoding compatibility exception. This kind of problem is essentially eliminated by libraries that are encoding-aware at the C boundary and respect `default_internal`.
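The failure mode can be reproduced in a few lines of plain Ruby. `unaware_fetch` and `aware_fetch` are hypothetical stand-ins for a C binding, and `greeting` stands in for view code whose own text contains a non-ASCII character. The exception only fires once non-ASCII bytes in a BINARY string meet non-ASCII text in an ASCII-compatible string, which is why ASCII-only test data hides it. The exact error message varies between Ruby versions.

```ruby
# Hypothetical encoding-unaware binding: every value comes back as BINARY.
def unaware_fetch(value)
  value.b
end

# Hypothetical encoding-aware binding: bytes are tagged with their real encoding.
def aware_fetch(value)
  value.b.force_encoding(Encoding::UTF_8)
end

# Hypothetical view code; the literal itself contains a non-ASCII character.
def greeting(name)
  "Café order for " + name
end

puts greeting(unaware_fetch("Bob"))    # works: BINARY "Bob" is 7-bit only

begin
  puts greeting(unaware_fetch("José")) # BINARY with non-ASCII bytes
rescue Encoding::CompatibilityError => e
  puts "production-only bug: #{e.message}"
end

puts greeting(aware_fetch("José"))     # works: both sides are UTF-8
```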

[1]: http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer...

[2]: http://yehudakatz.com/2010/05/17/encodings-unabridged/

[3]: https://github.com/rails/rails/blob/master/actionview/lib/ac...

[4]: http://stackoverflow.com/a/3348524