| Thanks for the write-up. I deal with the same issue at our company, and while we work with Gettext and UTF-8 (which solves most basic issues just fine), it seems every project that does i18n cooks up its own approach, and I have not been able to find many references online. I will probably write an article describing our setup when I get around to polishing it. The consensus around Transifex in #i18n@freenode seems to be that the open source version is old, unmaintained, and should not be used. The SaaS offering is much newer and packs quite a few more features. The "good" open source offering appears to be Pootle [0]. Honestly, I would be very worried about depending on a cloud service such as Transifex for something that is so deeply embedded into our (pretty continuous) development process. This requires automation, and the time invested in integrating with release processes and continuous integration can easily go overboard. Of course, if Transifex were seamlessly integrated with project management applications out of the box, then it wouldn't be such a risky proposition. ---- An interesting point about i18n that is quite independent of the tool selection is how you write your message identifiers. You can basically use labels (i.e., an ID for the string) or use the "original" string. Here's the tradeoff: if you use an ID, translators must reference the application constantly to understand what the translation should say (and in any non-trivial application, this is a huge burden for them), and there is either no string reuse (because places with the same intended content have used different IDs), or the need for an anal curator to go around chastising developers ("the OK button should always be ACTION_BUTTON_LABEL_OK!! fix it!!").
On the other hand, if you use original strings in English, you will run into language collisions (two places where the original string in English is the same, but the translated one is not), so you end up introducing artificial differences to make them unique (e.g., "Request (verb)" and "Request (substantive)" instead of just "Request"). A hack that goes a long way, if your engineering team is based in a country that speaks a Latin (Romance) language, is to use that language instead of English for the original strings. These languages typically distinguish more word forms than English, so collisions are greatly reduced. Chances are your translation team is based in that country as well, so no harm done. ---- If you are doing branchy development, I put together a page [1] on the Mercurial wiki with a script I use to merge translation catalogs (.po) seamlessly when doing branch merges. It can easily be used with git as well. ---- Links [0] http://pootle.translatehouse.org/ [1] http://mercurial.selenic.com/wiki/MergeGettext |
> Here's the tradeoff: if you use an ID, you must reference the application constantly to understand what the translation should say (and in any non-trivial application, this is a huge burden for translators), and there is either no string reuse (because places with the same intended content have used different IDs), or the need for an anal curator to go around chastising developers ("the OK button should always be ACTION_BUTTON_LABEL_OK!! fix it!!"). On the other hand, if you use original strings in English you will find that you experience language collisions (two places where the original string in English is the same, but the translated one is not), so you end up resorting to introducing artificial differences to make them unique (e.g., "Request (verb)" and "Request (substantive)" instead of just "Request").
The PO format has a "context" field (msgctxt) to differentiate among the various uses of a word or phrase, so two entries can share the same msgid and still carry different translations. You should also add a comment for your translators in this case.
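To make the context idea concrete, here is a minimal Python sketch. It does not load a real .po/.mo file: `DictTranslations` is a hypothetical in-memory stand-in keyed by (msgctxt, msgid), and the Spanish strings and the context names "verb" and "noun" are assumptions chosen for illustration. The only standard-library pieces are `gettext.NullTranslations` and the `pgettext` method name.

```python
import gettext

# The equivalent PO entries would look roughly like this:
#
#   #. Label on the button that submits a request
#   msgctxt "verb"
#   msgid "Request"
#   msgstr "Solicitar"
#
#   #. Column title listing existing requests
#   msgctxt "noun"
#   msgid "Request"
#   msgstr "Solicitud"


class DictTranslations(gettext.NullTranslations):
    """Toy catalog keyed by (msgctxt, msgid); stands in for a compiled .po file."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = catalog

    def gettext(self, message):
        # Entries without a context are stored under the key (None, msgid).
        return self._catalog.get((None, message), message)

    def pgettext(self, context, message):
        # Same msgid, different context, different translation.
        return self._catalog.get((context, message), message)


# Hypothetical Spanish catalog with one English msgid used in two senses.
spanish = DictTranslations({
    ("verb", "Request"): "Solicitar",
    ("noun", "Request"): "Solicitud",
})

button_label = spanish.pgettext("verb", "Request")
column_title = spanish.pgettext("noun", "Request")
print(button_label, column_title)
```

With a real catalog you would call `pgettext` on the object returned by `gettext.translation(...)` instead (available in Python 3.8 and later); the source string stays plain "Request" and no "(verb)" suffix leaks into the UI.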
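Also, using IDs works against the way gettext is designed. For example, the fallback for a missing translation stops being useful: gettext returns the msgid itself, so users see the raw ID instead of readable source text.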
But there are other formats that are ID-based, like .properties in Java.
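The fallback point can be seen with a short Python sketch. The domain name "myapp", the language "es", and both message strings are assumptions for illustration; the temporary directory is deliberately empty so that no catalog is found and `gettext.translation` falls back to a `NullTranslations` object.

```python
import gettext
import tempfile

# An empty locale directory guarantees there is no catalog for "es",
# so fallback=True hands back a NullTranslations instance.
with tempfile.TemporaryDirectory() as empty_localedir:
    translations = gettext.translation(
        "myapp",                    # hypothetical domain name
        localedir=empty_localedir,
        languages=["es"],
        fallback=True,
    )

# With source strings as msgids, the untranslated fallback is still readable.
natural = translations.gettext("Save changes")

# With IDs as msgids, the fallback is the raw identifier.
by_id = translations.gettext("ACTION_BUTTON_LABEL_SAVE")

print(natural)
print(by_id)
```

The same thing happens per message when a catalog exists but lacks one entry, which is why ID-based setups usually need a complete "source language" catalog as well.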