Translation and Interpreting in 150+ Languages
Connecting to English on the Web
July 14, 2011 - In: Multilingual Web

Google researchers Daniel Ford and Josh Batson took a look at the links among languages on the Web. Their fascinating findings invite careful study by those with longer attention spans than I, so a quick gloss will have to do.

“Most web pages link to other pages on the same web site, and the few off-site links they have are almost always to other pages in the same language. It’s as if each language has its own web which is loosely linked to the webs of other languages. However, there are a small but significant number of off-site links between languages.”

The two Googlers found that most of those links went to English-language sites, no surprise considering English’s lingua-franca status and the sheer amount of content posted online in English. The webs of other languages typically send between 60 and 80 percent of their out-language links to English pages. Only about 40 percent of the content in the survey was English, yet it attracted 79 percent of all out-language links, so quality presumably trumps quantity in this little surfer equation.
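To put a rough number on that imbalance, here is a minimal Python sketch that divides English's share of out-language links by its share of content, using the two figures quoted above. The function name `attraction_ratio` is my own invention, not anything from Ford and Batson, and the ratio is only a back-of-the-envelope comparison, not a measure the researchers report.

```python
def attraction_ratio(link_share, content_share):
    """Return how many times larger a language's share of incoming
    out-language links is than its share of content.

    Both arguments are fractions between 0 and 1. This helper is a
    hypothetical illustration, not part of the original study.
    """
    if content_share <= 0:
        raise ValueError("content_share must be positive")
    return link_share / content_share


# Figures quoted in the post: English is about 40 percent of the
# surveyed content but receives 79 percent of out-language links.
english_ratio = attraction_ratio(0.79, 0.40)
print(f"English attracts about {english_ratio:.1f}x its content share")
```

A ratio near 1 would mean links simply track the volume of content. By this crude measure English draws roughly twice what its volume alone would suggest.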

Not so surprisingly, this linguistic give-and-take involves a lot more take than give: out-of-language links from English pages are minuscule by comparison. Chinese and Japanese are the exceptions to the rush toward English, as relatively few links go from pages in these languages to pages in English. This is despite the fact that Japanese and Chinese sites are the most popular non-English sites for English sites to link to. The likely reason is size: the bigger the language, the more sites it has, and the lower the likelihood that readers will need to go through the headaches of turning to another language for information.

“The number of sites in a language is a strong predictor of its ‘introversion’, or fraction of off-site links to pages in the same language. Taking this into account shows that Chinese and Japanese webs are not unusually introverted given their size.”
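The definition of introversion in that quote is simple enough to sketch in a few lines of Python. The sketch below assumes each off-site link is recorded as a (source language, target language) pair. The function name, the data layout, and the sample links are all made up for illustration and do not come from the Google study.

```python
from collections import Counter


def introversion(offsite_links, language):
    """Fraction of off-site links from pages in `language` that point
    to pages in the same language.

    `offsite_links` is an iterable of (source_lang, target_lang) pairs.
    This layout is an assumption made for the example.
    """
    targets = Counter(dst for src, dst in offsite_links if src == language)
    total = sum(targets.values())
    if total == 0:
        raise ValueError(f"no off-site links recorded for {language!r}")
    return targets[language] / total


# Invented sample data: four off-site links from German pages,
# two of which stay within German.
sample_links = [
    ("de", "de"), ("de", "de"), ("de", "en"), ("de", "fr"),
    ("en", "en"), ("en", "ja"),
]

german_introversion = introversion(sample_links, "de")  # 2 of 4 links
print(german_introversion)
```

A score close to 1 marks a web that mostly links to itself, and a score close to 0 marks one that mostly links outward.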

Students of language will note that language introversion is a pattern easily seen in languages in the real world too, and is a good general indicator of the relative economic and cultural power housed under each linguistic roof. Introversion corresponds to size, mostly.

English is the grand outlier here, in a category all its own as the greatest of the lingua francas (I’ll cover what Ford and Batson learned about the lesser lingua francas in another post). Only 45 percent of off-site links from English pages are to other English pages, making English the most extroverted web language given its size, which strikes me as unbelievable and makes me think I missed something important in the explanation. But like I said, this is only a gloss. More to come.
