The “s” in World Wide Webs is no typo. You’re in the English World Wide Web right now, since I’m writing in English. And if you are reading this in some other language, well, then you must be in another Web. The connections among these different languages were the recent focus of Google researchers Daniel Ford and Josh Batson.
In a recent post, we explored how English, the Web’s lingua franca, was connected to all the other languages on the Web. Interesting, but the drumbeat of English-language triumphalism tends to obscure the less obvious connections among other languages. The researchers found plenty of connections among languages other than English, using cyberspace to illuminate the bonds among peoples carved in history with silver and steel.
From a professional perspective, I’ve often wondered about the translation business outside of English. I can count on one hand the jobs we do each year in which English isn’t part of the language pair, and other than for major European languages, it’s extremely difficult to find qualified linguists to work outside of English. Even in a language as commercially important as Japanese, we are often in a position where we provide Japan-based Language Service Providers with all the non-English target translations from an English relay document originally translated from Japanese. The Google research is good evidence that translation between non-English languages remains a small business.
Obviously, the language map looks a lot like the real world. Neighbors can be seen talking over backyard language barriers on the Scandinavian and Iberian peninsulas. Not only are these languages neighbors, they are also quite similar linguistically, so it’s easy for speakers to jump back and forth. Europe hangs together, and connections among different languages are robust, with the Poles bridging east and west. The legacy of failed unions is also clear to see, with Czechs checking in on Slovaks (remember Czechoslovakia?) and Croats, Serbs, Macedonians and Bulgarians doing much the same.
The ghostly outlines of the old Soviet Union are still evident in language links across Eastern Europe, and the cultural and linguistic reach of Arabic and Persian is mapped. Not so strangely absent are the links among Asian languages, where immigration has been limited and linguistic divides are wider. Malay and Indonesian are the delightful exceptions to that Asian rule: the nation-building efforts of both countries have centered on a trade language that swept the region, accompanied by kris and grapeshot, hundreds of years ago.
The researchers found a fairly close correlation between the number of sites in a language and the number of links that language sends out to other languages. To identify special relationships between languages, they generated another map that showed only links at least 50 times more numerous than that size/link correlation would predict. The researchers were surprised by “links from Hindi to Ukrainian, Kurdish to Swedish, Swahili to Tagalog and Bengali, and Esperanto to Polish,” which I don’t find surprising at all, since these language connections follow so closely the human connections forged in the modern age. Here’s that chart.
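For anyone curious how a "50 times greater than predicted" filter might work in practice, here is a rough sketch in Python. It is not the researchers' actual method, which isn't described in detail here. I'm assuming a simple model in which a language's outgoing links should be spread across other languages in proportion to how many sites each one has. The function name, the placeholder languages A through D, and all of the numbers are made up purely for illustration.

```python
from typing import Dict, List, Tuple


def overrepresented_links(
    sites: Dict[str, int],
    links: Dict[Tuple[str, str], int],
    threshold: float = 50.0,
) -> List[Tuple[str, str, float]]:
    """Flag (source, target) pairs whose observed link count is at least
    `threshold` times what a size-proportional model would expect.

    Assumed model (not taken from the research): a source language's
    outgoing links are shared among the other languages in proportion
    to each target language's number of sites.
    """
    flagged = []
    for src in sites:
        # Total cross-language links leaving this source language.
        out_total = sum(n for (s, _), n in links.items() if s == src)
        # Combined size of every language except the source.
        other_sites = sum(n for lang, n in sites.items() if lang != src)
        if out_total == 0 or other_sites == 0:
            continue
        for (s, tgt), observed in links.items():
            if s != src:
                continue
            expected = out_total * sites[tgt] / other_sites
            ratio = observed / expected
            if ratio >= threshold:
                flagged.append((src, tgt, ratio))
    # Strongest "special relationships" first.
    return sorted(flagged, key=lambda item: -item[2])


# Invented toy data: A is a huge Web, B is mid-sized, C and D are tiny.
sites = {"A": 1_000_000, "B": 10_000, "C": 100, "D": 100}
links = {("C", "A"): 900, ("C", "B"): 40, ("C", "D"): 60}

flagged = overrepresented_links(sites, links)
for src, tgt, ratio in flagged:
    print(f"{src} -> {tgt}: {ratio:.0f}x the expected number of links")
```

In this toy data, most of C's links go to the giant A, which is about what its size predicts. The 60 links to tiny D are the ones that stand out, because the model expects almost none, and that is the kind of pairing the second map was built to surface.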