The Best Online Machine Translation

by Translation Guy on November 2, 2011
18 comments

The word-witch-doctors over at Bab.la have cooked up a big machine translation showdown to test which is the best: PROMT, Systran, Google or Bing.

The challenge: 500 sentences spread across 10 language pairs, English to and from French, German, Spanish, Italian and Portuguese. Not back translation, either; different sentences were used for each direction.

Results were scored from 0 to 3: 0 for incomprehensible, 1 for educated guess, 2 for good gist but bad grammar, and 3 for good ‘nuff. (I renamed these categories, since bab.la’s were pretty clunky, and I’m thinking of using this renamed system myself.)

Each translation batch had 5 sentences from each of 10 domains or subject areas: advertising, business, finance, food, law, literature, medicine, religion, slang and, of course, Tweets.
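To make the arithmetic concrete, here’s a minimal sketch (in Python) of how a scorecard like this could be tallied. The engine names are real, but the scores, records and helper function are invented for illustration; they are not numbers from the bab.la study.

```python
from statistics import mean

# Scoring scale from the post: 0 = incomprehensible, 1 = educated guess,
# 2 = good gist / bad grammar, 3 = good 'nuff.
# Each record is (engine, language_pair, domain, score). The scores below are
# invented placeholders, not results from the bab.la study.
ratings = [
    ("Google", "en-fr", "law", 3),
    ("Google", "en-fr", "slang", 1),
    ("Bing", "en-fr", "law", 2),
    ("Bing", "en-fr", "slang", 1),
]

def average_by_engine(records):
    """Mean 0-3 score per engine across all rated sentences."""
    by_engine = {}
    for engine, _pair, _domain, score in records:
        by_engine.setdefault(engine, []).append(score)
    return {engine: mean(scores) for engine, scores in by_engine.items()}

print(average_by_engine(ratings))  # -> {'Google': 2, 'Bing': 1.5}
```

With 5 sentences per domain and 10 domains, each engine would accumulate 50 scores per language pair, which is the kind of per-pair average the bar graph presumably shows.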

And the winner is… Google! Google wins, Bing places, Systran shows, and PROMT brings up the rear. Well, like they say, “the guy with the biggest servers wins.”

Look at the bar graph. Those bars make it look all sliced and diced, scientific like. But we aren’t there yet. First off, why is the Spanish machine translation (MT) half as good as the other Romance languages? Seems like a pretty big deviation for four linguistic variants of what is basically Latin. Does a small sample size mean that a single reviewer attached at the hip to the Real Academia Española dictionary can skew the odds that much? What it does confirm is that the categories used by bab.la are subjective. Nothing wrong with that, but the criteria the testers used in the bab.la study are probably quite different from the criteria actually applied by users of MT surfing online.

Next question: why FIGS? Things start to get interesting in MT once you leave Western Europe behind. East Asian machine translation quality is a critical problem, and results can be opaque, since bilingualism between English and those languages is much lower than among European languages, so translation problems are harder for users to detect. The common Western European languages are transparent to far more users, so bab.la’s evaluation of other language pairs would be more enlightening. Still, I’m not sure that bab.la’s testing is really all that relevant to how machine translation is actually used on the Web.

Because I’m not sure either how MT is actually used on the Web. Even after selling machine translation, and giving it away, for over a decade, I still have not figured it out, which I guess says more about me than about the quality of bab.la’s testing.

Great to see bab.la’s work on this, because right now it is front-burner for me: it just so happens that we are bringing a free translation service back to 1-800-Translate.com.

Years ago, we offered a free translation feature on the website, and we still get a lot of incoming traffic from people looking for that old page through legacy links that are still out there. So to respond to that demand for machine translation, we’ve looked at a lot of different systems and also worked up a few versions of our own for user testing. It’s been very interesting, because we’ve found some clues that suggest everything you know about machine translation is wrong.

Whoops. Sorry, I misspoke. I mean everything we (at 1-800-Translate) know about MT is wrong. For one thing, we think monoglots are very unlikely to use online MT. Successful translation is like crack: if it’s good, once you start, you won’t stop. That’s because translation via MT, as in any communication, is like a tennis volley. If the MT tool keeps dropping the ball on your message, there will be no answer back, and so no volley. Only successful users are repeat users.

We think the most active users are bilinguals using MT to speed up or improve their bilingual communication efforts, because they have the linguistic resources available to handle the high rate of error. Also, people are using these tools for the oddest reasons and in the oddest ways, but that’s another post.

The most interesting thing about machine translation is the nature of translation error. Even the best MT is often wrong, which means that some sentences do better in one tool than another, depending not on the tool but on the sentence under translation. So results from the engines vary widely, and even an engine that usually produces superior results will fall short a good part of the time. To us it looks as if the problems with machine translation accuracy are not caused by deficiencies in the actual software, but by a user interface that stops users from getting to the best translation.
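Here’s a rough sketch of what a “let the user get to the best translation” interface might do under the hood: ask every engine for the same sentence and show the candidates side by side, since the best engine changes from sentence to sentence. The translate functions below are hypothetical placeholders, not real engine APIs.

```python
# A sketch of per-sentence engine comparison, assuming hypothetical translate
# functions; real MT services (Google, Bing, Systran, PROMT) would be wired in
# behind the same interface.

def translate_with_engine_a(text: str, source: str, target: str) -> str:
    raise NotImplementedError  # placeholder for a real MT API call

def translate_with_engine_b(text: str, source: str, target: str) -> str:
    raise NotImplementedError  # placeholder for a real MT API call

ENGINES = {
    "engine_a": translate_with_engine_a,
    "engine_b": translate_with_engine_b,
}

def compare(text: str, source: str = "en", target: str = "fr") -> dict:
    """Return every engine's output for one sentence so the user (or a
    bilingual reviewer) can pick the best candidate."""
    results = {}
    for name, fn in ENGINES.items():
        try:
            results[name] = fn(text, source, target)
        except Exception as err:  # one engine being down shouldn't hide the others
            results[name] = f"<error: {err}>"
    return results
```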

So, in order to learn more about these issues, we are going to go live soon with a new iteration of our free translation page, this time called the Free Translation Challenge, which will allow users to share their MT experience with other users. It’s an attempt to look at machine translation quality from an ISO 9001 perspective, which requires that quality be defined by the customer and no one else. No panels of experts need apply. Just us chickens, or should I say, just you chickens. Cluck, cluck.
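For the curious, here is a toy sketch of the “quality is defined by the customer” idea: rank engines purely by the ratings visitors leave on the outputs they actually received. The field names and feedback records are made up for illustration, not a description of how our page will actually work.

```python
from collections import defaultdict

# Hypothetical user feedback records: each visitor says whether the output
# they received was useful, and engines are ranked by those votes alone,
# with no expert panel involved.
feedback = [
    {"engine": "engine_a", "useful": True},
    {"engine": "engine_a", "useful": False},
    {"engine": "engine_b", "useful": True},
]

def customer_defined_quality(records):
    """Share of 'useful' votes per engine, as rated by the customers themselves."""
    votes = defaultdict(lambda: {"useful": 0, "total": 0})
    for r in records:
        votes[r["engine"]]["total"] += 1
        votes[r["engine"]]["useful"] += int(r["useful"])
    return {e: v["useful"] / v["total"] for e, v in votes.items()}

print(customer_defined_quality(feedback))  # -> {'engine_a': 0.5, 'engine_b': 1.0}
```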

18 Comments

  1. Douglas Zhao says:

    If the experiment continued with, say, 5000 sentences, would the results still be the same? Was the sample too small?

    • Ken says:

      Too small a sample, perhaps, but I’d be surprised if the results weren’t the same with a larger one.

  2. Ken, is that you arm wrestling the robot? Great guns!

    • Ken says:

      Good eye, Kristen. My build to a T, but for insurance purposes we had to use my stunt double.

  3. Heather Lamb says:

    Looking forward to your freebie. Can’t wait to try it.

    • Ken says:

      Me too. Webmaster and I had a rather animated discussion on just that subject a little while ago. I’ll keep you posted.

  4. Maisie says:

    I like how you renamed them…or shall I say translated them. Easily understood.

  5. Hugh Wise says:

    I am not surprised Google won. If you got the money you can have the best product.

  6. The comment on the Spanish translation being half as good caught my eye. I wonder why that is.

  7. ConnieLingus says:

    Experiment looks good. Many things were considered, and it just goes to show that (just as you put it) the bigger server wins.

  8. Yvonne Nixon says:

    Nice analogy with the tennis volley. And you are right, if I didn’t get a good MT translation, I wouldn’t go back to it either.

  9. Kind of sad that the MT engines vary so much. I guess you just cross your fingers and hope for the best if that’s the way you choose to go.

  10. Marc Coley says:

    I can see how the east Asian quality could be a problem. Languages like Chinese are tonal, and depending on how a word is pronounced, it can have a dramatically different meaning.

  11. Joyce Kaplan says:

    So the MT software isn’t to blame, but the users are?

    • Ken says:

      Always.

  12. DangerRoss says:

    I would love to share my MT experience with you! Just tell me when your free site is up and running.

  13. I’m not surprised Google won, but by how much? Was it overwhelming? Did the other MT engines just get zeros and ones?

  14. When we ran a similar experiment for Greek using SYSTRAN, Bing Translator and Google, Google also came out on top, but I don’t think it is to be attributed to “may the best server win.” I think it has to do with the ability of the user to change the translation should they realize that it is wrong. This correction, as far as we learned from reading similar articles, is kept so that the same mistake can be avoided in the future. So despite the fact that Google itself is not really interactive, it does let the end user in ever so slightly. You can read a summary presentation (Machine translation for under-resourced languages) of our findings for Greek.

    We basically took 200-word texts in English and translated them into Greek, because we knew the MT services are really lacking in that direction, given the rich morphology of Greek.
