Actually, I recognize that specific breakage (a-box), as I’ve had to deal with it in my game engine. The problem is that something is interpreting each byte of a UTF-8 encoded string as a separate character. That’s why some bytes show up as á and others are boxes: á isn’t actually ASCII, but it is one of the non-English characters that fits in a single byte in legacy encodings like Latin-1, while other bytes typically map to nothing printable and get drawn as boxes.
The fix is to tell your framework to decode the bytes as UTF-8. I don’t use Ruby, but in Python it’s encoding='utf-8' (for example when calling open() or bytes.decode()). In C++, one approach is to convert to wstring, then operate on wchar_t.
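Here is a rough Python sketch of what I think is going on. The sample string and variable names are just mine for illustration, and I'm assuming the wrong decoder is Latin-1 (your framework may be using a different single-byte encoding, such as Windows-1252).

```python
# A sample string with non-ASCII characters (chosen for illustration).
text = "naïve café"

# UTF-8 stores each accented letter here as two bytes.
raw = text.encode("utf-8")

# Wrong: treat every byte as its own character (Latin-1 is assumed here).
wrong = raw.decode("latin-1")

# Right: decode the same bytes as UTF-8.
right = raw.decode("utf-8")

# If all you have is the already-garbled text, reversing the bad decode
# can recover it, provided no bytes were lost along the way.
repaired = wrong.encode("latin-1").decode("utf-8")

print(wrong)
print(right)
print(repaired)
```

The garbled version has one character per byte, so it comes out longer than the original. When reading files, passing encoding="utf-8" to open() does the correct decode for you.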
Unicode problems are mysterious, but I find it quite gratifying to solve them. At least nowadays. I used to find them incredibly annoying. But it’s pretty cool seeing any language be rendered by your app.