by ondrasej 5197 days ago

I wouldn't argue against using a formatting string in general, but the printf-style order-based formatting strings are really bad once you need to localize your app and have to support languages with different word order. In this sense, .NET or Python 3-style formatting strings with position-based formatting are orders of magnitude better, and I was surprised that Sun decided to use %* instead.

Take a simple example along the lines of "I've seen {0} {1} {2}".format("John", "eat", "an apple") ...and try localizing this message into, e.g., German.

I have a feeling that printf-style formatting is one of the reasons why the texts in localized versions of some programs are as bad as they are.
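A minimal sketch of the reordering that indexed fields make possible, using Python's `str.format`. The German template and the pre-translated arguments are illustrative assumptions, not taken from any real catalogue:

```python
# Hypothetical per-language templates. The indices refer to the same
# argument positions in every language, so only the template changes
# when the word order does.
TEMPLATES = {
    "en": "I've seen {0} {1} {2}",
    "de": "Ich habe {0} {2} {1} sehen",
}

# The arguments themselves are assumed to be translated elsewhere.
english = TEMPLATES["en"].format("John", "eat", "an apple")
german = TEMPLATES["de"].format("John", "essen", "einen Apfel")

print(english)
print(german)
```

The call site passes the arguments in the same order for both languages; the German template simply consumes them in a different order.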

2 comments

Erwin 5197 days ago

glibc's sprintf supports that with something like %2$s %1$s, which Python could potentially support too.

I don't buy the internationalization argument however. You have to explicitly select strings for i18n anyway and run them through something that retrieves the data from a catalogue based on the current language. It's not like you will override str.__mod__ to start i18n.

And if you want to do it right, there are more complex rules that need a special formatting language. For example, if you want to display "{count} error/errors", the form of "error" depends not just on whether the count is 0, 1, or more; the rules themselves vary wildly from language to language. Some languages have a special noun declension for 2, for instance. In Polish, a different form might apply when there are 5 errors: http://blogs.transparent.com/polish/cardinal-numbers/
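To make the Polish case concrete, here is a rough sketch of count-dependent form selection. The function and table names are hypothetical, and the rule is the three-form one commonly used for Polish in gettext-style catalogues; a real application would more likely rely on a localization library than on hand-written rules:

```python
def polish_plural_index(n: int) -> int:
    """Pick one of three plural forms for a non-negative count."""
    if n == 1:
        return 0  # singular: 1 błąd
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1  # "few" form: 2, 3, 4, 22, 23, 24, ...
    return 2      # "many" form: 0, 5-21, 25-31, ...


# Hypothetical message table holding the three forms of "error".
ERROR_FORMS = ("{count} błąd", "{count} błędy", "{count} błędów")


def format_errors(count: int) -> str:
    """Format an error count using the matching Polish noun form."""
    return ERROR_FORMS[polish_plural_index(count)].format(count=count)


for n in (1, 2, 5, 12, 22):
    print(format_errors(n))
```

A simple singular/plural switch cannot express this, which is why the form selection has to live next to the translated strings rather than in the calling code.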

andrewaylett 5197 days ago

Python supports `"%(name)s"`, which means you're not dependent on order at all -- in my opinion, much nicer than indexing into the parameter list:

    >>> "%(name)s has %(count)d %(thing)s" % {'thing': 'bananas', 'count': 10, 'name': "Phil"}
    'Phil has 10 bananas'

andolanra 5197 days ago

The new-style formatting allows that as well:

    >>> "{name} has {count} {thing}".format(
          thing='bananas', count=10, name='Phil')
    'Phil has 10 bananas'

It also allows attribute and index lookups inside the replacement fields, so if you need to grab data out of an object, you can do something like:

    >>> from collections import namedtuple
    >>> Sentence = namedtuple('Sentence', ['name', 'thing', 'count'])
    >>> s = Sentence(name='Phil', thing='bananas', count=10)
    >>> '{dat[0]} has {dat.count} {dat.thing}'.format(dat=s)
    'Phil has 10 bananas'

which is verbose and contrived here, but could be immensely useful in cases like

    >>> '... {numeral[5]} {item.plural}'.format(
          numeral=fr_numerals, item=animal)
    '... cinq animaux'

286c8cb04bda 5197 days ago

How do you handle it when you need to convert, e.g., "American woman" to "femme américaine"?

andolanra 5197 days ago

You use something else. It's still just a format string, not a whole localization solution. Named arguments mean it might be easier to hack together something like

    templates = {'en': '{adj} {noun}', 'fr': '{noun} {adj}'}
    print(templates['fr'].format(noun='fromage', adj='délicieux'))

but that's still a hack, and doesn't even come close to addressing cases like Chinese's "{adj} {counter_word[noun]} {noun}" or gender concord or any of the myriad other things you come across in practice.

Edit: used 'positional' instead of 'named'.

dalke 5197 days ago

I found that I often (~1 time in 10) make a mistake with that syntax, and omit the trailing 's'. I think it's because I'm not used to having anything after a closing ')'.

mturmon 5197 days ago

Same problem here. Maybe the braces approach using {name} is an improvement upon %(name)s in that respect.

DeepDuh 5197 days ago

Do you really want to keep the same format strings when localizing an application? Isn't it much cleaner to just define multiple format strings? Translations in general don't work that way: you can't just move words around and translate them individually, because usually the whole expression changes.

psquid 5197 days ago

That's the point. You can't change the format strings to a different word order if the selection of values used to fill the fields is entirely dependent on word order (i.e., how the basic printf syntax works), unless you have a separate line/block of code that executes when the word order of the current locale differs.
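A small sketch of the difference, using Python's `%` operator. The template tables and helper names are made up for illustration:

```python
# Order-based templates: values are consumed in tuple order, so the
# call site has to know each locale's word order and branch on it.
POSITIONAL = {"en": "%s %s", "fr": "%s %s"}


def render_positional(locale: str, noun: str, adj: str) -> str:
    args = (noun, adj) if locale == "fr" else (adj, noun)
    return POSITIONAL[locale] % args


# Name-based templates: the word order lives in the template itself,
# so one call site serves every locale.
NAMED = {"en": "%(adj)s %(noun)s", "fr": "%(noun)s %(adj)s"}


def render_named(locale: str, noun: str, adj: str) -> str:
    return NAMED[locale] % {"noun": noun, "adj": adj}


print(render_named("en", noun="cheese", adj="delicious"))
print(render_named("fr", noun="fromage", adj="délicieux"))
```

Both helpers produce the same strings, but only the positional one needs a per-locale branch in code.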

DeepDuh 5197 days ago

I see your point now. Yes, I can see Python 3's syntax facilitating some of that work. I still think you will need branches and/or polymorphism to do more complex localizations, especially when you need to support East Asian languages.

sausagefeet 5197 days ago

We just have a database of sentences and a translation is a lookup in the DB by some unique identifier. Trying to semi-translate a string just doesn't work very well in the long run (for us).
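Roughly, that kind of lookup could be sketched as below. An in-memory dict stands in for the database, and the message identifier, locales, and fallback behaviour are all assumptions made for the example:

```python
# Whole sentences keyed by (message id, locale); a dict stands in for
# the database table here.
CATALOGUE = {
    ("inventory.bananas", "en"): "{name} has {count} bananas",
    ("inventory.bananas", "fr"): "{name} a {count} bananes",
}


def translate(message_id: str, locale: str, default_locale: str = "en", **fields) -> str:
    """Look up a full sentence by id and fill in its named fields."""
    template = CATALOGUE.get((message_id, locale))
    if template is None:
        # Assumed policy: fall back to the default locale when a
        # translation is missing.
        template = CATALOGUE[(message_id, default_locale)]
    return template.format(**fields)


print(translate("inventory.bananas", "fr", name="Phil", count=10))
```

Each translation is a complete sentence, so translators are free to reorder or rephrase it however their language requires.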
