Hacker News
Unbabel (YC W14) Launches A Human-Edited Machine Translation Service (techcrunch.com)
45 points by vasco_ 4465 days ago
5 comments

Being able to try it for free is pretty cool. I just took a snippet from a foreign news article to try it out and the process was easy.

Have you thought at all about making a more B2C product? Obviously there's less money there, but I often come across articles in Korean that I can read but have a hard time understanding, and that I'd love to have translated (and not by Google Translate). I could imagine having a few bucks in my Unbabel account and, any time I come across an article I really want to read, clicking a Chrome extension/bookmarklet that sends the article to Unbabel and deducts the cost from my account balance. Then I can just read it at a later time. I guess this could be used for B2B as well, but I'm not sure how useful it'd be in that context.

Also, any timeline for other languages?

edit: Just realized you have an API, maybe I'll build it :)

Hello,

We thought about having such a widget; it's on our roadmap, but we're not really sure when we will do it. There are a lot of cool integrations we want to make. Meanwhile, if you decide to build it using the API, let me know. I will be super happy to help you with the API.

Joao (I'm the CTO)

The grammatical error on their infographic does not bode well for the quality of their translations:

http://tctechcrunch2011.files.wordpress.com/2014/03/screen-s...

I can see two errors:

* “2. Its made into micro-tasks” should read “2. It’s made into micro-tasks”. If that mistake weren’t so common, and so commonly decried online, it would be more excusable.

* Not sure what language “Olá!” is, but if it’s Spanish it needs to be “¡Olá!”

Not sure about the overall tone, either.

"Olá" is "Hello" in Portuguese; in Spanish it's "Hola".

I am curious which part is translated automatically and which is human-edited. Current translation algorithms produce an estimate of how likely their output is, but I doubt that estimate translates into a quality metric that can efficiently separate proper translations from improper ones. Maybe there are layers in the Amazon Turk treatment.

In the same vein, but further along: the Amazon Turk part of the service isn't visible on the website. I’m assuming that anyone who wants to make money would have to connect to the Amazon service directly, but then why would TechCrunch mention this as the innovation? Related question: Amazon Turk doesn’t have great language coverage, so will they need to develop their own workforce to cover Finnish and Polish? I also wonder whether a competitor could extract confidential information that way, or more simply disrupt the service.

You can think about our process as a chain. A text is first machine translated, then passed to an editor. When the editor is done, we pass their output to another editor. The process continues until we are confident the quality is good.

We don't use Amazon Turk. We have our own community, which works on our site or in our mobile apps. This gives our editors a much better experience. We are dedicated to improving our editing interface to simplify the editors' work.

João
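
For readers who want to picture the chain described above, here is a minimal sketch in Python. It is not Unbabel's implementation: the `Draft` type, the `run_chain` function, the numeric confidence score, and the 0.9 threshold are all assumptions made for illustration, since the comment does not say how confidence is measured.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Draft:
    """A translation in progress (hypothetical type, not from Unbabel)."""
    text: str
    confidence: float  # assumed quality estimate in [0, 1]


def run_chain(source: str,
              machine_translate: Callable[[str], Draft],
              editors: Iterable[Callable[[Draft], Draft]],
              threshold: float = 0.9) -> Tuple[Draft, int]:
    """Machine translate, then hand the draft from editor to editor.

    Stops as soon as the draft's confidence reaches the threshold, or
    when the editors run out. Returns the final draft and the number
    of editing passes that were used.
    """
    draft = machine_translate(source)
    passes = 0
    for edit in editors:
        if draft.confidence >= threshold:
            break
        draft = edit(draft)
        passes += 1
    return draft, passes


# Toy stand-ins so the sketch runs; real MT and human editors differ.
def toy_mt(source: str) -> Draft:
    return Draft(text=f"[mt] {source}", confidence=0.5)


def toy_editor(draft: Draft) -> Draft:
    return Draft(text=draft.text + " [edited]",
                 confidence=min(1.0, draft.confidence + 0.25))


final, passes = run_chain("Olá!", toy_mt, [toy_editor] * 5)
print(passes, final.text)
```

With these toy stand-ins the draft starts at 0.5 and gains 0.25 per pass, so the chain stops after two editors even though five are available.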

Thanks for that!

Out of curiosity: let’s say I’d like to join (I have free time, lack of motivation to do something significant, and I speak four languages fluently) how would I do that?

Just go to our website and join as an editor: www.unbabel.com/editor

The first startup I ever worked for was something similar. We connected websites directly to a service that created automatic translations for each language version within a minute of new content being posted on the page, and then scheduled a machine-aided translation by an experienced translator. Unfortunately, neither the translators nor the computational linguistics geeks knew how to sell it well.

I was able to dig up the remains of our website: http://web.archive.org/web/20100225094103/http://globalizato...

This isn't mentioned in the description, but do they feed the edited text back into the machine translator?

Hi, I am Joao, the CTO of Unbabel. That is definitely on our roadmap as we improve our MT systems. We have very useful data for building better MT systems and hence improving the overall system.