Hacker News
by ondrasej 5150 days ago
I wouldn't argue against using a formatting string in general, but printf-style, order-based formatting strings are really bad once you need to localize your app and support languages with different word order. In this sense, .NET- or Python 3-style formatting strings with position-based fields are orders of magnitude better, and I was surprised that Sun decided to use %* instead.

Take a simple example along the lines of "I've seen {0} {1} {2}".format("John", "eat", "an apple") ...and try localizing this message into e.g. German.
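To make the word-order point concrete, here is a minimal sketch. The German template and words are my own illustrative translation (an assumption, not taken from any real catalogue); the only point is that indexed fields let the translator move the verb without the argument order changing.

```python
# Same argument order, different field order per language.
# The German strings are an illustrative translation (assumption).
args_en = ("John", "eat", "an apple")
args_de = ("John", "essen", "einen Apfel")

template_en = "I've seen {0} {1} {2}"
template_de = "Ich habe {0} {2} {1} sehen"  # the object precedes the verb here

english = template_en.format(*args_en)
german = template_de.format(*args_de)

print(english)
print(german)
```

With plain %s placeholders the German template would have no way to say "use the third argument second".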

I have a feeling that printf-style is one of the reasons why the texts in localized versions of some programs are as bad as they are.

2 comments

glibc sprintf supports that as something like %2$s %1$s which Python could potentially support.

I don't buy the internationalization argument however. You have to explicitly select strings for i18n anyway and run them through something that retrieves the data from a catalogue based on the current language. It's not like you will override str.__mod__ to start i18n.

And if you want to do it right, there are more complex rules that need a special formatting language. For example, if you want to display "{count} error/errors", the choice between "error" and "errors" does not depend only on whether the count is 0, 1, or more; the rules themselves vary wildly between languages. Some languages have a special noun declension for 2, for instance. In Polish, a different form might be needed when there are 5 errors: http://blogs.transparent.com/polish/cardinal-numbers/
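A rough sketch of what such a rule looks like for Polish, assuming the three-form split (one / few / many) that gettext catalogues commonly use for that language. The helper names `polish_plural_index` and `format_errors` are hypothetical, made up for this illustration.

```python
# Hypothetical helper: pick one of three Polish noun forms for a count.
# Assumes the one / few / many rule commonly used for Polish in gettext.
def polish_plural_index(n):
    if n == 1:
        return 0  # singular
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return 1  # "few": 2-4, 22-24, ... but not 12-14
    return 2      # "many": everything else, including 0 and 5

ERROR_FORMS = ("błąd", "błędy", "błędów")  # singular, few, many

def format_errors(n):
    noun = ERROR_FORMS[polish_plural_index(n)]
    return "{count} {noun}".format(count=n, noun=noun)

for n in (1, 2, 5, 12, 22):
    print(format_errors(n))
```

Note that the format string itself stays trivial; all the language-specific logic lives outside it, which is the point being made above.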

Python supports `"%(name)s"`, which means you're not dependent on order at all -- in my opinion, much nicer than indexing into the parameter list:

  >>> "%(name)s has %(count)d %(thing)s" % {'thing': 'bananas', 'count': 10, 'name': "Phil"}
  'Phil has 10 bananas'
The new-style formatting allows that as well:

    >>> "{name} has {count} {thing}".format(
          thing='bananas', count=10, name='Phil')
    'Phil has 10 bananas'
It also allows for more complex expressions inside of format specifiers, so if you need to grab data out of an object, you can do something like:

    >>> from collections import namedtuple
    >>> Sentence = namedtuple('Sentence', ['name', 'thing', 'count'])
    >>> s = Sentence(name='Phil', thing='bananas', count=10)
    >>> '{dat[0]} has {dat.count} {dat.thing}'.format(dat=s)
    'Phil has 10 bananas'
which is verbose and contrived here, but could be immensely useful in cases like

    >>> '... {numeral[5]} {item.plural}'.format(
          numeral=fr_numerals, item=animal)
    '... cinq animaux'
How do you handle it when you need to convert, e.g., "American woman" to "femme américaine"?
You use something else. It's still just a format string, not a whole localization solution. Named arguments mean it might be easier to hack together something like

    templates = {'en': '{adj} {noun}', 'fr': '{noun} {adj}'}
    print(templates['fr'].format(noun='fromage', adj='délicieux'))
but that's still a hack, and doesn't even come close to addressing cases like Chinese's "{adj} {counter_word[noun]} {noun}" or gender concord or any of the myriad other things you come across in practice.

Edit: used 'positional' instead of 'named'.

I found that I often (~1 time in 10) make a mistake with that syntax, and omit the trailing 's'. I think it's because I'm not used to having anything after a closing ')'.
Same problem here. Maybe the braces approach using {name} is an improvement upon %(name)s in that respect.
Do you really want to keep the same format strings when localizing an application? Isn't it much cleaner to just define multiple format strings? Translations in general don't work that way: you can't just move words around and translate them individually, because usually the whole expression changes.
That's the point. You can't change the format strings to a different word order if the selection of values used to fill the fields is entirely dependent on word order (i.e., how the basic printf syntax works), unless you have a separate line/block of code that executes when the word order of the current locale differs.
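A small sketch of that failure mode, using made-up strings: with basic %-style fields the values are consumed in argument order, so a reordered template breaks the unchanged call, while a named template does not.

```python
args = ("Phil", 10, "bananas")

# printf-style: fields are filled strictly in argument order.
original = "%s has %d %s" % args

# A translated template that puts the number first breaks the same call,
# because %d is now handed the string "Phil".
try:
    reordered = "%d %s belong to %s" % args
except TypeError:
    reordered = None

# Named fields: the template can be reordered, the call site stays the same.
values = {"name": "Phil", "count": 10, "thing": "bananas"}
named = "{count} {thing} belong to {name}".format(**values)

print(original)
print(reordered)
print(named)
```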
I see your point now. Yes, I can see Python 3's syntax facilitating some of that work. I still think you will need branches and/or polymorphism to do more complex localizations, especially when you need to support East Asian languages.
We just have a database of sentences and a translation is a lookup in the DB by some unique identifier. Trying to semi-translate a string just doesn't work very well in the long run (for us).
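Roughly, and only as a sketch: a dict stands in for the database here, and the `CATALOG` layout, the message id, and the `translate` helper are all hypothetical names for illustration, not a description of any actual system.

```python
# Hypothetical in-memory stand-in for the sentence database:
# message id -> locale -> complete sentence template.
CATALOG = {
    "fruit.count": {
        "en": "{name} has {count} {thing}",
        "fr": "{name} a {count} {thing}",
    },
}

def translate(message_id, locale, default_locale="en", **values):
    """Look up a whole sentence by id and fill in its named fields."""
    entries = CATALOG[message_id]
    # Fall back to the default locale when no translation exists.
    template = entries.get(locale, entries[default_locale])
    return template.format(**values)

print(translate("fruit.count", "fr", name="Phil", count=10, thing="bananes"))
print(translate("fruit.count", "de", name="Phil", count=10, thing="bananas"))
```

Each entry is a full sentence owned by the translator, so nothing in the code assumes a particular word order.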