The largest translation effort in the world services 200 million users a month, translating the textual equivalent of 1 million books a day. Can you guess who that might be? With numbers like that, you know it's gotta be Google.
Franz Och, Distinguished Research Scientist at Google Translate (his title, not mine), marked six years of progress in a recent post on the Google Translate Blog.
In just one day, Google translates as much as all the professional translators in the world translate in a year, says Google. A million books multiplied by 100,000 words per book is 100 billion words.
Only Google has the rack space to run this kind of volume, and the technology to do it efficiently. When they started out, the distinguished Och reports, it took 40 hours and 1,000 machines to translate 1,000 sentences, which for those of you lacking a technical background means that it sucked. “So we focused on speed, and a year later our system could translate a sentence in under a second, and with better quality. In early 2006, we rolled out our first languages: Chinese, then Arabic.” Now Google offers 62 languages in total and has a pretty good kit for the less commonly translated languages, so expect many more to come.
While those human translators billed billions for their billions of words, Google just gives the translations away. It does now cost money to license the app, at something like a penny or two per page, which by my best guess is about 0.03% of the cost of human translation. (Note: that’s 3/100 of a percent, not 3%.)
And a tip of the hat (from me, not the distinguished Och) to all of you who spend your lives translating words for a few pennies each. Thanks to you for providing all the good, paid translation that Google then scoops up in order to turn into bad, free translation. As the distinguished Och notes, “we believe that as machine translation encourages people to speak their own languages more and carry on more global conversations, translation experts will be more crucial than ever.” Note that he said “translation experts,” not “translators.”
But dollars and cents really don’t give a sense of what a transformative technology machine translation has become thanks to Google’s reach. Google Translate reports that 92% of queries originate outside the United States, and that use of Google Translate on mobiles is increasing at four times the rate of desk-bound systems. So Google Translate will become even more ubiquitous in the future than it is now.
Google has taken their current statistical approach as far as it can go, say I. For years statistical machine translation guys have been asking for more data. But Google Translate reached critical mass a long time ago, and even vast new stores of data have had only the most incremental impact on the quality of their machine translation.
Alexis Madrigal argues in The Atlantic that Google is going to have to come up with some new tricks to make better use of the data they already have. “Google (or any other translation software) will have to start understanding (in some way) the semantic content of the words it is arranging.” Which strikes me as a kind of knuckle-headed comment that at least demonstrates that his blogging heart is in the right place.
Ilia Kaufman of NoBable was developing AI algorithms to identify domain (subject area) in text as a means to refine machine translation output. So that when the computer translates “the server is down” it will be translated one way for an IT text, i.e., “the computer has failed,” and another way for the hospitality industry, as in, “the waiter is injured.” That’s where I would put my money. But NoBable went out of business last week.
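For the curious, here is roughly what domain-aware disambiguation could look like in a few lines of Python. To be clear, this is a toy sketch of the general idea and not NoBable’s actual method, which I have not seen. The keyword lists, the readings table, and the function names `detect_domain` and `disambiguate` are all made up for illustration; a real system would presumably learn the domains from data rather than from hand-picked words.

```python
from collections import Counter

# Hypothetical keyword lists per domain (assumption: hand-picked for this toy example).
DOMAIN_KEYWORDS = {
    "it": {"server", "network", "reboot", "database", "login", "outage"},
    "hospitality": {"waiter", "table", "menu", "restaurant", "guests", "kitchen"},
}

# Hypothetical domain-specific readings of an ambiguous sentence.
DOMAIN_READINGS = {
    "the server is down": {
        "it": "the computer has failed",
        "hospitality": "the waiter is injured",
    }
}


def detect_domain(context: str) -> str:
    """Guess the domain by counting keyword hits in the surrounding text."""
    words = context.lower().replace(".", " ").replace(",", " ").split()
    counts = Counter()
    for domain, keywords in DOMAIN_KEYWORDS.items():
        counts[domain] = sum(1 for w in words if w in keywords)
    domain, score = counts.most_common(1)[0]
    # With no keyword hits at all, fall back to a generic domain.
    return domain if score > 0 else "general"


def disambiguate(sentence: str, context: str) -> str:
    """Pick the reading that matches the detected domain, else return the sentence unchanged."""
    domain = detect_domain(context)
    readings = DOMAIN_READINGS.get(sentence.lower().rstrip("."), {})
    return readings.get(domain, sentence)


print(disambiguate("The server is down.", "Reboot the database after the network outage."))
print(disambiguate("The server is down.", "The guests at table four are still waiting for a menu."))
```

The point of the sketch is only that the surrounding text, not the sentence itself, decides which reading wins; when the context gives no hint, the sentence is passed through untouched.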
You can see Google Translate in action against Bing and BabelFish at my Free Translation Challenge. I thought this page would be a big hit, but it’s not as easy to give away free translation as it used to be. Thanks to Google Translate.