Connecting to English on the Web

by Translation Guy on July 14, 2011

Google researchers Daniel Ford and Josh Batson took a look at the links among languages on the Web. Their fascinating findings invite careful study by those with longer attention spans than I, so a quick gloss will have to do.

“Most web pages link to other pages on the same web site, and the few off-site links they have are almost always to other pages in the same language. It’s as if each language has its own web which is loosely linked to the webs of other languages. However, there are a small but significant number of off-site links between languages.”

The two Googlers found that most of those links went to English-language sites, no surprise considering English's lingua-franca status and the sheer amount of content posted online in English. The webs of other languages typically send between 60 and 80 percent of their out-language links to English pages. Only about 40 percent of the content in their survey was English, yet it attracted 79 percent of all out-language links, so quality presumably trumps quantity in this little surfer equation.

Not so surprisingly, this linguistic give-and-take involves a lot more take than give. Out-of-language links back from English are minuscule by comparison. Chinese and Japanese are the exception: relatively few links go from pages in these languages to pages in English, even though Japanese and Chinese sites are the most popular non-English sites for English sites to link to. The reason is size: the bigger the language, the more sites it has, and the lower the likelihood that readers will need to go through the headaches of turning to another language for information.

“The number of sites in a language is a strong predictor of its ‘introversion’, or fraction of off-site links to pages in the same language. Taking this into account shows that Chinese and Japanese webs are not unusually introverted given their size.”
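For readers who like to see the arithmetic, here is a minimal Python sketch of the two ratios discussed above: "introversion" as quoted (the fraction of off-site links that stay in the same language) and the share of out-language links that go to one particular language. The function names, the language codes, and every number in `toy_counts` are my own invented illustrations, not the researchers' data or method.

```python
# Toy link counts keyed by (source_language, target_language).
# ASSUMPTION: these numbers are invented for illustration only;
# they do not come from the Ford and Batson survey.
toy_counts = {
    ("ja", "ja"): 90, ("ja", "en"): 8, ("ja", "zh"): 2,
    ("de", "de"): 70, ("de", "en"): 24, ("de", "fr"): 6,
}


def introversion(counts, lang):
    """Fraction of off-site links from `lang` pages that stay in `lang`."""
    total = sum(n for (src, _dst), n in counts.items() if src == lang)
    if total == 0:
        raise ValueError(f"no links recorded for {lang!r}")
    return counts.get((lang, lang), 0) / total


def out_language_share(counts, lang, target):
    """Of the links that leave `lang`, the fraction landing on `target` pages."""
    if target == lang:
        raise ValueError("target must differ from the source language")
    leaving = sum(
        n for (src, dst), n in counts.items() if src == lang and dst != lang
    )
    if leaving == 0:
        raise ValueError(f"no out-language links recorded for {lang!r}")
    return counts.get((lang, target), 0) / leaving


if __name__ == "__main__":
    for lang in ("ja", "de"):
        print(
            lang,
            "introversion:", round(introversion(toy_counts, lang), 2),
            "share of out-language links to English:",
            round(out_language_share(toy_counts, lang, "en"), 2),
        )
```

The point of keeping the two functions separate is that they answer different questions: a language can be highly introverted overall and still send most of its few outbound links to English.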

Students of language will note that language introversion is a pattern easily seen in languages in the real world too, and is a good general indicator of the relative economic and cultural power housed under each linguistic roof. Introversion corresponds to size, mostly.

English is the grand outlier here, in a category all its own as the greatest of the lingua francas (I’ll cover what Ford and Batson learned about the lesser lingua francas in another post). Only 45 percent of off-site links from English pages are to other English pages, making English the most extroverted web language given its size, which strikes me as unbelievable and makes me think I missed something important in the explanation. But like I said, this is only a gloss. More to come.


  1. Very interesting – it confirms what I had observed informally, that I VERY rarely end up on non-English sites when browsing normally. There is rarely a reason for an English-language site to link to a foreign-language site. That of course says as much about the average language skills of native English speakers as anything else…

    • Ken says:

      So take an enormous data set, then look at a very rare behavior (links to other language sites) which, because of the original size of the data sample, is measurable and representative, and this is what you get. Interesting for sure, but I’m still not sure what it means. Wish I knew statistics. More to your point, English is radically more “extroverted” than the other widely spoken languages, perhaps because so many English speakers have learned this language as their second one, boosting the scores despite the monolingual habits of most English speakers.

  2. English the most extroverted web language given its size? Wow, I also find that amazing, are we sure here? What’s the source of this?

    • Ken says:

      As cited in the post, Phyllis. Keep in mind that this is only about the vanishingly small number of site-links that actually go to another language, and think of the vast amount of English content on the web produced by non-native English speakers who link back to their native languages.

  3. Hmmm, this is very relevant to SEO, thanks.

  4. Yep, this is a nice, quick gloss of the topic. Thanks again Ken.

    • Ken says:

      400 words or bust, Heather!

  5. Erin Marsh says:


  6. This is a great post. I think you really hit the nail on the head with your statement that the bigger the language and the more sites it has, the lower the likelihood that readers will need to turn to other languages for information.

    I’m also interested in how it’s not as simple as directly translating from one language to another, because different languages have different ways of getting the same point across (Spanish with its “por la noche” vs. “in the night” or “at night” in English). It’s something I’ve been thinking about a lot lately and plan on writing about quite a bit in my upcoming ‘proper’ blog, especially on prepositions.

    • Ken says:

      Sounds interesting.
