Most European Languages Doomed to Digital Extinction

October 3, 2012 - In: Language

It used to be that all you needed to have a language was an army. Monopolies of violence and linguistic autonomy have always gone together like chocolate and peanut butter. Dialect is the consolation prize for losers.

But in the digital age, it takes more than artillery to keep a language alive, according to a recent study. Researchers at the Multilingual Europe Technology Alliance (META) concluded that at least 21 European languages face digital extinction in the near future because they lack the data and digital support they need to survive online.

The study examines four areas META believes are essential for online language success: automatic translation, speech interaction, text analysis and the availability of other language resources. Without these tools, tasks such as searching and translating web pages, talking to digital devices and checking word processing documents must be performed in other languages online or not at all.

“The results of our study are most alarming. The majority of European languages are severely under-resourced and some are almost completely neglected. In this sense, many of our languages are not yet future-proof,” says Prof. Hans Uszkoreit, coordinator of META-NET.

“There are dramatic differences in language technology support between the various European languages and technology areas. The gap between ‘big’ and ‘small’ languages still keeps widening. We have to make sure that we equip all smaller and under-resourced languages with the needed base technologies, otherwise these languages are doomed to digital extinction,” says co-author Dr. Georg Rehm.

Icelandic, Irish, Latvian and Maltese are at the top of the extinction list, but even more commonly spoken languages, such as Greek, Portuguese and Swedish, lack the digital support necessary for survival on the Web, the authors claim.

Because automated translation and voice recognition rely on statistical analysis, words by the hopperful are needed to stoke the fires of digital language engines. Languages with fewer speakers just can’t shovel the same amount of data into the linguistic furnace, so the little language engines that formerly could now cannot, and they are falling further and further behind in the data race.

That these scrappy little languages have endured for centuries without the benefit of a digital assist is unaddressed by the authors, at least as far as I read (which wasn’t very far). Since META’s mission (and funding) is “dedicated to building the technological foundations of a multilingual European information society,” it seems unlikely that their report would conclude with a “job well done” since that would imply that their job was, well, done. Note that META awarded English, that ubiquitous digital language monster, a “good” rather than an “excellent.”

Self-serving as the report may be, there’s no denying that the linguistic digital divide is real and growing. Is it a handicap that will drive users to seek opportunities in other languages, relegating these less commonly spoken languages to the circle of family and neighbors? History shows us that languages that fall out of commercial use usually decline in prestige, consigned to the mumbling of the marginalized, the sad end-game of once-great languages.

Below is the state of language technology in four categories for 30 European languages. Draw your own conclusions, and comment accordingly.

Machine Translation

Excellent support: (none)

Good support: English

Moderate support: French, Spanish

Fragmentary support: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian

Weak/no support: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian (Bokmål, Nynorsk), Portuguese, Serbian, Slovak, Slovene, Swedish

Speech Processing

Excellent support: (none)

Good support: English

Moderate support: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish

Fragmentary support: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian (Bokmål, Nynorsk), Polish, Serbian, Slovak, Slovene, Swedish

Weak/no support: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian

Text Analysis

Excellent support: (none)

Good support: English

Moderate support: Dutch, French, German, Italian, Spanish

Fragmentary support: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian (Bokmål, Nynorsk), Polish, Portuguese, Romanian, Slovak, Slovene, Swedish

Weak/no support: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian

Speech and Text Resources

Excellent support: (none)

Good support: English

Moderate support: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish

Fragmentary support: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian (Bokmål, Nynorsk), Portuguese, Romanian, Serbian, Slovak, Slovene

Weak/no support: Icelandic, Irish, Latvian, Lithuanian, Maltese