Google Translate Trashes the Web
June 20, 2011

The Internet is definitely worse. And it’s all because of the reckless and unsafe abuse of Google Translate.

Have you noticed the scum in your searches? Despite the best efforts of the Borg-like algorithms of Google’s Skynet collective consciousness, a whole class of content keeps floating to the surface. You’ve read it by mistake… acres of stupid, mindless content stretching on and on half-coherently to the digital horizon. A waste of your time, but delicious Google spider-bait. It’s written for the spiders. In fact, almost everything you read on the Web is Google spider-bait, since you wouldn’t be able to find it otherwise.

Sometimes that’s good. For example, when I do it. Like the birds and the bees. What my marketing guys make me do is write this blog as a kind of honey trap. I get you readers to click here (God bless each and every one of you) and read this stuff, maybe you tell your friends, send a link or a tweet, and with every action you also exude a little drop of honey-like link juice, which draws the Google spiders to weave their magic web around my site and boost us in the page rankings, higher and higher, towards the top three positions in the search engines. This tiny miracle is repeated billions of times a day in the busy beehive that is the Web.

But writing decent content for people goes, I figure, at about 300 words per hour, so a dedicated writer can produce about 10 pages a day, tops. It’s much easier to copy the content from elsewhere on the Web, and now a lot of the Web is just a copy of other parts of the Web. But Google spiders are wise. They pass over duplicate content now and will punish you if they find you doing too much of it. So to outsmart the spiders, savvy programmers have been violating Google Translate’s API rules and doing translation party tricks to hide that kind of cut-and-paste plagiarism. By translating the content into another language and back again, black hats can automate the creation of “unique” content: a link-juice bonanza for the spiders, and the bane of human searchers, who recognize its uselessness at a glance.
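To see why the round trip works, here is a rough sketch of how a duplicate check might behave. This is not Google's actual method, which isn't public; it assumes a simple textbook approach (overlapping three-word "shingles" compared by Jaccard similarity). The function names and sample sentences are made up for illustration, and the "round-tripped" sentence is a hand-written stand-in for what back-translation tends to produce, not real Google Translate output.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts count as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample texts (assumptions, not real crawled content).
original = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog near the river bank"
round_tripped = "a fast brown fox leaps above the idle dog close to the river shore"

print(jaccard(original, copied))         # 1.0: a straight copy is trivially caught
print(jaccard(original, round_tripped))  # 0.0: same idea, no shared three-word run
```

The straight copy scores as a perfect match, while the reworded version shares no three-word sequence with the original and sails through, even though any human reader would spot it as the same sentence in a bad disguise.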

And that’s just one particularly nefarious way in which Google Translate trashes the Web. Consider that the same productivity rules apply to translators and translation automation. All you busy translators here have been typing away today at about the same rate of 300 unique words per hour, a microscopic trickle compared to the Niagara Falls-like capacity of Google’s vast server farms.

Now even in my Lord of the Flies-like Boy Scout Troop, we knew enough to drink upstream from where we peed. But Google Translate learns from the translations it finds on the Web, and a growing share of those are machine output, and Google can’t yet tell which way is downhill when it comes to translation. Good, bad, or incomprehensible, it’s all the same to the spiders. Thus the Web is trashed, and Google Translate did it.

I’ve already posted on this, and I plan to post more. The implications are far-reaching, and Google’s radical decision to “deprecate” the Google Translate API is only the start of a fundamental debate about Google’s role in organizing the Web. Just goes to show you that “Don’t be evil” is a lot harder than it sounds.

Thanks to commenters Prevedi and Mark and Kirti Vashtee and Dion Wiggins of Asia Online for your thoughts on these issues. For more on this, see Tim Carmody of Fast Company (if you can stand all the Joycean name-dropping) and the Atlantic’s James Fallows, the guy who always gets it.

