Translation Guy Blog
Google Translate is all over. Sorry to have to tell you. Well, not really. TranslationGuy loves to lower his voice, put his arm around your shoulder, and act all compassionate and stuff, when he has the inside scoop. But I’d better explain.
I don’t mean that Google Translate is all over in that no one is ever going to use it again. I mean it is all over, but I mean that in the sense that it’s everywhere and everybody’s using it. In fact, since we have been assimilated by Google as a human translation vendor, let me take this opportunity to inform you that resistance is futile: you will use Google Translate for all your machine translation needs.
What I mean is that Google Translate is finished in the sense that it is completed. Done. The dramatic improvements that we have seen over the last few years have brought this wonderful tool to fruition. And it’s not going to get any better unless Google radically rethinks its approach.
I figured this out from reading Tim Adams of the Guardian. Now, before you click away, a warning. Tim doesn’t know shit about machine translation. But that ol’ news hound sure knows how to do an interview, because he got Google to spill the beans. This piece is like Gibbon on the Decline and Fall of the Statistical MT Empire.
As he reports, the recent rapid improvement in MT is the result of the use of statistical engines rather than the older rules-based systems.
In the early 1990s, IBM produced a model that abandoned any effort to have the computer ‘understand’ what was being fed into it, and instead loaded the engine with as much translation as they could shovel in and then did a statistical analysis. This was the preferred approach of Frederick Jelinek at IBM, who didn’t think much of the rules-based systems pioneered in the 1970s. Jelinek once said, “Whenever I fire a linguist, the performance of our system improves.” (Speaking as someone who has fired plenty of linguists myself, it’s usually the bad ones that get fired, ergo…)
Anyway, Google’s ability to bring MT along so far and so fast is less the result of any great breakthroughs in the statistical algorithms than the result of the ability of their spiders to lift millions of man-hours’ worth of human translation from the Web, dump it into the vast digital hoppers of their translation pigs (aka MT servers), and spit it out on demand, for free, sort of.
“This technology can make the language barrier go away,” says Franz Och, who leads Google’s machine translation team. “It will allow anyone to communicate with anyone else.”
This is true, to a point. But what point? I’ve been selling MT and giving it away for the last 10 years, and while it’s much better than it used to be, the utility of it comes from ease of use. It’s all in the interface. If it isn’t fast, free and easy, better doesn’t matter.
But here’s the shocker: better isn’t even in the MT cards, statistically speaking.
Google Translate guy Andreas Zollmann admits that more is not enough. “We are now at this limit where there isn’t that much more data in the world that we can use,” he says. The MT databases are already so large that adding more barely makes the engines any better. To improve output quality by even 0.5%, you’ve got to double the size of the database.
And there aren’t that many doublings left, if any. I can’t say how much text Google has assimilated into their machine translation databases, but it’s been reported that they have scanned about 11% of all printed content ever published. So double that, and double it again, and once more, shoveling all that into the translation hopper, and pretty soon you get the sum of all human knowledge, which means a whopping 1.5% improvement in the quality of the engines when everything has been analyzed. That’s what we’ve got to look forward to, at best, since Google spiders regularly surf the Web, which in its vastness dwarfs all previously published content. So to all intents and purposes, the statistical machine translation tools of Google are done. Outstanding job, Googlers. Thanks.
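For anyone who wants to check the back-of-the-envelope math, here is a rough sketch. It rests on a few assumptions of mine, not Google's: that the 11% scanned figure is a fair stand-in for how much of the available text is already in the engines, that each doubling buys the same fixed gain, and that the gain is about half a percentage point per doubling, which is the figure that makes the 1.5% total work out. The function names are made up for illustration.

```python
import math


def doublings_left(fraction_already_used: float) -> float:
    """How many times the data can double before hitting 100% of what exists."""
    if not 0 < fraction_already_used <= 1:
        raise ValueError("fraction_already_used must be in (0, 1]")
    return math.log2(1.0 / fraction_already_used)


def total_gain(fraction_already_used: float, gain_per_doubling: float) -> float:
    """Total quality gain (in percentage points) if every doubling adds the same amount."""
    return doublings_left(fraction_already_used) * gain_per_doubling


if __name__ == "__main__":
    # Assumed inputs: 11% of content already used, 0.5 points gained per doubling.
    print(f"doublings left: {doublings_left(0.11):.2f}")
    print(f"total gain:     {total_gain(0.11, 0.5):.2f} percentage points")
```

With those inputs you get a little over three doublings and a total gain of roughly one and a half points, which is the same ballpark as the figure above. Change the assumed gain per doubling and the ceiling moves with it, but it stays small as long as the returns are logarithmic.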
I’ve also got to thank Z, the translator formerly known as Jost Zetzsche, who pointed this out to me in a recent discussion. Well, actually it wasn’t Z personally, but Jeromebot, during the hand-puppet part of Z’s presentation. (This guy really knows how to get a point across to a CEO.)
PS I highly recommend his Punch and Judy version of the Common Sense Advisory Translation Industry Competitive Analysis. But if you can’t find that on YouTube, at least subscribe to his invaluable newsletter, Translator’s Toolkit.