Hacker News
by verdverm 1452 days ago
The problem I see with this is that every language would need to replicate the code & logic.

With data / config, the translations are recorded in one place and all consumers can get the update without code changes.
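To make the data / config point concrete, here is a minimal sketch of a catalog kept as plain JSON that any consumer could load. The catalog contents, the key names, and the `translate` helper are all made up for illustration; nothing here comes from a specific library.

```python
import json

# Hypothetical shared catalog. In practice this would be a file or service
# that every consumer reads, so a translation fix needs no code change.
CATALOG_JSON = """
{
  "greeting": {"en": "Hello", "de": "Hallo", "fr": "Bonjour"},
  "farewell": {"en": "Goodbye", "de": "Auf Wiedersehen"}
}
"""


def load_catalog(text):
    """Parse the JSON catalog into a dict of key -> {locale: text}."""
    return json.loads(text)


def translate(catalog, key, locale, fallback="en"):
    """Look up a key, falling back to another locale, then to the key itself."""
    entry = catalog.get(key, {})
    return entry.get(locale, entry.get(fallback, key))


catalog = load_catalog(CATALOG_JSON)
print(translate(catalog, "greeting", "de"))
print(translate(catalog, "farewell", "fr"))
```

The only logic each language would have to reimplement is the lookup and the fallback, which is a few lines.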

The big thing I've been wondering / looking for is a shared, open source translation database. Anyone have links?

3 comments

Since I am working on an open source localization solution (one that makes localization of software effortless), having an open source "translation memory" database makes sense. I will keep this idea in mind! :)
Context-less translation can be done quite successfully these days with online services. You could simply make a few hundred calls to something like Google Translate and get good quality translations in multiple languages.

This is built into some of the top software translation platforms to "seed" the initial translation: a bulk kickstart that human translators can optionally refine later.
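A rough sketch of that seeding step, assuming you wrap whatever translation service you use in a function. `machine_translate` is a hypothetical callable and `fake_translate` is a stand-in so the example runs offline; neither is a real API. Every seeded entry is flagged as unreviewed so a human translator can refine it later.

```python
def seed_translations(source_strings, locales, machine_translate):
    """Fill a catalog with machine output, flagged for later human review."""
    catalog = {}
    for key, text in source_strings.items():
        catalog[key] = {}
        for locale in locales:
            catalog[key][locale] = {
                "text": machine_translate(text, locale),
                "reviewed": False,  # a human has not checked this yet
            }
    return catalog


def fake_translate(text, locale):
    """Stand-in for a real translation service call (assumption)."""
    return f"[{locale}] {text}"


seeded = seed_translations({"save": "Save"}, ["de", "fr"], fake_translate)
print(seeded["save"]["de"])
```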

As someone in the localization business, let me assure you that, with the current state of the art, using machine translation without any kind of human post-editing for UI is a terrible idea.

That the UI is not in English does not mean that a non-English person will be able to understand it and use it successfully.

You can only do it if you do not have any kind of support for those international users and if those users are not your real customers but merely statistics in the usage dashboard of a free product.

Are there any off-the-shelf, easy-to-use translation services where you just send in your XLIFF files or something and get them translated, with pricing based on translation size/time taken by the translator? At my job, everything is pretty automated between translators and devs, but I imagine that's not very simple or easy for small devs/open source projects. I'm thinking XLIFF in particular would be the easiest format to work with, since there are so many tools available for it.
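For anyone who hasn't handled XLIFF, here is a small sketch of pulling the still-untranslated units out of an XLIFF 1.2 document with Python's standard library, plus a word count as a crude measure of "translation size". The sample document and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Made-up sample file with one translated and one untranslated unit.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="app" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="save"><source>Save changes</source></trans-unit>
      <trans-unit id="quit"><source>Quit</source><target>Beenden</target></trans-unit>
    </body>
  </file>
</xliff>
"""


def untranslated_units(xliff_text):
    """Return {unit id: source text} for units with no (or empty) target."""
    root = ET.fromstring(xliff_text)
    pending = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is None or not (target.text or "").strip():
            pending[unit.get("id")] = unit.find("x:source", NS).text
    return pending


def word_count(units):
    """Crude size estimate: whitespace-separated words across all sources."""
    return sum(len(text.split()) for text in units.values())


pending = untranslated_units(SAMPLE)
print(pending, word_count(pending))
```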

Obviously, Google Translate is going to produce suboptimal translations, but it does have the big advantage of being easy to automate.

> The big thing I've been wondering / looking for is a shared, open source translation database. Anyone have links?

That's a neat idea. It'll be super useful for the 80% of cases where context isn't that important. But for the remaining 20%, the context in which the translation will be used is as important as the word itself, so you can't always reuse the same translation in a different context without it sounding unnatural.

Still, if there were an easy way to switch between different options for a translation, a shared open source translation database for projects to use would be very valuable.
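One way that could look, as a sketch only: store several candidate translations per source string, each tagged with a context, and let the project pick. The database layout, the context labels, and the `candidates` function are assumptions made up for this example.

```python
# Hypothetical shared database: (source text, locale) -> candidate translations.
SHARED_DB = {
    ("Open", "de"): [
        {"context": "button", "text": "Öffnen"},  # verb: open a file
        {"context": "status", "text": "Offen"},   # adjective: ticket is open
    ],
    ("Cancel", "de"): [
        {"context": None, "text": "Abbrechen"},   # context rarely matters here
    ],
}


def candidates(source, locale, context=None):
    """Return translations matching the context, or every option if none match."""
    entries = SHARED_DB.get((source, locale), [])
    if context is not None:
        exact = [e["text"] for e in entries if e["context"] == context]
        if exact:
            return exact
    return [e["text"] for e in entries]


print(candidates("Open", "de", context="status"))
print(candidates("Open", "de"))
```

When no context is given, or none matches, the caller gets all options back and can choose between them, which covers the "change between different options" part.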

The (surmountable) problem is tree-shaking, so that you only include the translations you use.
If I can manage to store all the data from HN comments and submissions in 99 GB (31993925 "items", in a very naive way), we should be able to have a DB with most common translations for most web apps way below that, closer to 1GB, if some clever people do it :)
I'm talking about when I ship my frontend: it needs to be super minimal for CI to handle. It might make sense to have packages / modules.
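A build-time sketch of that tree-shaking idea: scan the frontend source for translation calls and keep only those keys from the big catalog. It assumes keys appear as string literals in calls to a function named `t(...)`, which is a hypothetical convention; keys built dynamically at runtime would be missed.

```python
import re

# Matches t("key") or t('key') with a literal string argument (assumed convention).
KEY_CALL = re.compile(r"""\bt\(\s*["']([^"']+)["']\s*\)""")


def used_keys(source_code):
    """Collect every literal key passed to t(...) in the given source text."""
    return set(KEY_CALL.findall(source_code))


def prune_catalog(catalog, keys):
    """Keep only the catalog entries whose key is actually used."""
    return {k: v for k, v in catalog.items() if k in keys}


# Made-up frontend snippet and shared catalog.
FRONTEND = "button.label = t(\"save\"); title = t('greeting');"
FULL_CATALOG = {
    "save": {"de": "Speichern"},
    "greeting": {"de": "Hallo"},
    "unused_banner": {"de": "Willkommen"},
}

shipped = prune_catalog(FULL_CATALOG, used_keys(FRONTEND))
print(sorted(shipped))
```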