by tenfingers 3825 days ago
Can you clarify the last point? I do not get it. Can you give an example where an extra indirection mechanism (besides interpolation) is _required_ for translation?

I generally localize for Western languages and write in many programming languages. The combination of gettext + Python format strings has worked really well for me, and generally much better than other systems I've seen and put to use. In fact, the simplicity of gettext provides a very fast turn-around, and with translators experienced with the tool I have never had problems. Python format strings also work great in this context, as I can supply an arbitrary dictionary of elements that the translator might need. The only real problem has been plural forms in complex text strings, where ngettext is not always sufficient.
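To make the gettext + format string combination concrete, here is a minimal sketch. It assumes %-style named placeholders and uses `gettext.NullTranslations` (which returns the source strings unchanged) so that it runs without a compiled catalog; the function names and message texts are made up for illustration.

```python
import gettext

# NullTranslations hands back the source string unchanged. A real project
# would load a compiled catalog with gettext.translation(...) instead.
translations = gettext.NullTranslations()
_ = translations.gettext


def file_not_found_message(name, directory):
    # Named placeholders let a translator reorder the pieces freely.
    template = _("File %(name)s was not found in %(directory)s")
    return template % {"name": name, "directory": directory}


def deleted_message(count):
    # ngettext picks the singular or plural template for the given count.
    template = translations.ngettext(
        "%(count)d file deleted", "%(count)d files deleted", count
    )
    return template % {"count": count}


print(file_not_found_message("a.txt", "/tmp"))
print(deleted_message(3))
```

The translator only ever sees the template and the placeholder names, which is the simplicity argued for above. The plural case works here because the message has a single count; strings with several counted nouns are where ngettext starts to strain.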

What method (name a project I can inspect) do you recommend as a good localization architecture?


You're looking for something like:

     l8n_context.format('file_not_found', file)

'file_not_found' is just an identifier used to look up the actual format template (likely a format string) that will be combined with the file object to render the error message.
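A rough sketch of what such a context object could look like follows. The `L10nContext` class, the `FileInfo` type, and the catalog layout are all assumptions made for illustration; the thread does not say how the lookup is implemented.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    # Hypothetical stand-in for the "file object" mentioned above.
    name: str
    path: str


class L10nContext:
    """Maps message identifiers to format templates for one locale."""

    def __init__(self, catalog):
        self.catalog = catalog

    def format(self, message_id, *args):
        # The extra level of indirection: identifier -> template -> text.
        template = self.catalog[message_id]
        return template.format(*args)


catalog_en = {"file_not_found": "File {0.name} was not found in {0.path}"}
l10n_context = L10nContext(catalog_en)

print(l10n_context.format("file_not_found", FileInfo("a.txt", "/tmp")))
```

Note that the template reaches into the object (`{0.name}`), so whoever writes it has to know the object's attributes.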
There's not much difference. In fact, you cannot expect translators to know how the underlying object is handled, or to write a data extractor for the file object.

If translators are cooperating with you, it's very easy to provide the needed elements directly in the format's dictionary (that is: you extract the translatable pieces for them). It also means you don't have to worry that they're going to fiddle with mutable state.
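As a sketch of that "extract the pieces for them" approach: the developer flattens the object into a dictionary of plain values, and the translator only works with the key names. `FileRecord` and `translatable_fields` are hypothetical names, and the template is an invented example.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    # Hypothetical application object with mutable state.
    path: str
    tags: list = field(default_factory=list)


def translatable_fields(record):
    # Build a fresh dictionary of plain values, so a template can never
    # reach (or modify) the underlying object.
    directory, _sep, name = record.path.rpartition("/")
    return {
        "name": name,
        "directory": directory or ".",
        "tag_count": len(record.tags),
    }


# The translator may use any subset of the keys, in any order.
template = "%(name)s is missing from %(directory)s"
print(template % translatable_fields(FileRecord("/tmp/a.txt")))
```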

I generally write everything in English, and do back-translation to my own locale (I also help translate external projects into my locale), so I eat my own dogfood here.

I know I do not want to deal with extra lower-level subtleties here. Translation is hard enough by itself. It's impressive how much time a good translation of a simple UI can take. If I had to inspect the object to know what I can get out of it, I would go crazy.

I'd take a pre-baked dictionary any time.

I've also already used the string-catalog approach in the past (heh, XUL), and I'd personally take gettext any day.

But do you see how that extra level of indirection basically makes arguments about the different formatting styles for localization kind of irrelevant?
I do, yes. But the translator needs to be aware of the extra indirection to produce the text he needs, and between a custom layer and a standard formatting syntax, the latter is definitely friendlier for anyone approaching translation, even when technically less powerful.

At some point the translator will have to format some string himself.

There's no reason why your "ID" can't be the template for your default language.
OK. How is that different from using functions? Conceptually, there's a function that does arbitrary computation under the hood.

(I think we already agree, only you express your point in Python-specific terms. I don't care whether you stuff your code into a context object or something else. I was talking about having the full power of the programming language available, vs. using a limited language like format strings.)
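For contrast, here is a sketch of the "full power of the programming language" side of the argument, where each message is a function per locale rather than a format string. The `MESSAGES` table, `render`, and the message functions are hypothetical and only illustrate the idea.

```python
def file_summary_en(count, folder):
    # Arbitrary logic is available: special cases, plural rules, and so on.
    if count == 0:
        return f"No files in {folder}"
    noun = "file" if count == 1 else "files"
    return f"{count} {noun} in {folder}"


def file_summary_it(count, folder):
    if count == 0:
        return f"Nessun file in {folder}"
    # The loanword "file" does not change in the plural in Italian.
    return f"{count} file in {folder}"


MESSAGES = {
    "en": {"file_summary": file_summary_en},
    "it": {"file_summary": file_summary_it},
}


def render(locale, message_id, *args):
    # The identifier selects a function instead of a template.
    return MESSAGES[locale][message_id](*args)


print(render("en", "file_summary", 1, "docs"))
print(render("it", "file_summary", 3, "docs"))
```

The trade-off discussed in the thread shows up directly: this handles cases a format string cannot, but the person writing each locale now has to write code.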

The important part is you have one more level of indirection before you figure out the layout of the message.