DARPA and Darmok Act 2: SpeechTrans Test

by Translation Guy on April 28, 2011

The Defense Advanced Research Projects Agency is spending $50 million a year in search of a universal translator just like the kind you can buy for $19.95 at the iTunes store. DARPA's linguistic skunkworks has been searching for a universal translator ever since they saw those flip phones on Star Trek re-runs. But as we know from Captain Picard's epic translation fail in the "Darmok" episode, when an alien encounter ends with a stabbing, even if it's only the alien that ends up dying in the dust, bleed-outs are never a good outcome for a machine-translation-fueled encounter.

Spencer Ackerman questions the research costs for such a tool in Psst, Military: There's Already a Universal Translator in the App Store, recommending the SpeechTrans app that's available for the iPhone. So I figured I would put the tool to the test at the TranslationGuy Translation Tool Torture Test Track, kind of as a public service to do my bit, but more in the hopes that if I can save the gov'mint $50 million, I might get a bigger tax refund.

Now, since we don't get any of that DARPA money, our budget was pretty constrained ($19.95 for the app), so to create the most realistic combat situation possible, testing was conducted via a conversation with my wife. We compared it with the Google Translate app, also available at the App Store for free. If I understand correctly, SpeechTrans is basically a mash-up of the Nuance speech recognition engine and the Google Translate app, so the contest came down to interface and voice recognition.

So I walked in on the missus cold, whipped out my G4, opened the app, selected my language pair and started talking: "I'm going to have to leave lunch early because I have a conference call at 1:30." And presto, my iPhone began speaking in Japanese. Barely audible, true, and absolute nonsense, but recognizably Japanese. You've got to watch the text display of the speech recognition pretty closely, because if it gets even one word wrong, the machine translation will be wacky as hell, as we learned through repeated efforts to get it right.

The most irritating feature is that the Nuance voice recognition engine cannot tell time. "One-thirty" is recognized as "130" instead of "1:30," and with that missing colon you are going to miss lunch. Curiously, "half past one" worked just fine, but no one has spoken that way since watches went digital. So despite five minutes of back and forth, we were never quite able to firm up our lunch plans. "Lunch" was not translating very well, either. It took multiple recordings and recast sentences to get a phrase the translation engine could recognize, and then a lot of puzzling on the part of the listener to get the gist of the translation. For some inexplicable reason, the translated audio is all but inaudible. And there is no language default: your language pair must be selected afresh for every translation.
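The missing-colon problem is really a post-processing bug rather than a recognition one: once the engine has emitted "130," a few lines of code downstream could turn it back into a clock time before it hits the translator. Here's a minimal sketch of that idea (my own illustration, not anything actually in SpeechTrans or Google Translate):

```python
import re

def normalize_spoken_time(text):
    """Insert a colon into bare 3- or 4-digit numbers that plausibly
    read as clock times, e.g. "130" -> "1:30", "1230" -> "12:30".
    Rough heuristic: hour must be 1-12 and minutes 00-59; anything
    else (like a year) is left untouched."""
    def fix(match):
        digits = match.group(0)
        hour, minutes = digits[:-2], digits[-2:]
        if 1 <= int(hour) <= 12 and int(minutes) <= 59:
            return f"{int(hour)}:{minutes}"
        return digits
    return re.sub(r"\b\d{3,4}\b", fix, text)

print(normalize_spoken_time("conference call at 130"))  # -> conference call at 1:30
print(normalize_spoken_time("re-runs from 2011"))       # -> re-runs from 2011 (unchanged)
```

Of course a heuristic this crude would need context (you'd want "130" in "130 pounds" left alone), which is exactly the kind of thing that separates a demo from a $50 million research program.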

None of these problems exist with the Google Translate app. I've never been a fan of the Nuance speech recognition engine, I guess from all those hours spent in frustration with old versions of Dragon NaturallySpeaking, but I've been impressed by the high accuracy of Google's voice recognition system from the first time I heard it. But like any real-time translation system, success or failure is defined by the user interface. Google Translate defaults to previous language pairs, displays translation history, and toggles easily between languages on a single input screen. Record and translate with a single click. Very nice, even if it doesn't translate very well. It took two or three passes to get to the message, but it was doable.

SpeechTrans also does machine translation for Twitter and Facebook, which strikes me as such a useless feature that I can’t be bothered to test it.

Look. I’m an enthusiast. We’ve been trying to make money off of machine translation for years, and the web-based stuff is incredibly useful, but I don’t get it with the hand-held apps. They are certainly the coolest props on the Star Trek set, but I just can’t see yet how they could be better than gesturing for most encounters. Maybe there’s a learning curve or some way of using these tools that I don’t get just because I’m so close to the business.

So next step is to take some of these tools off the Translation Tool Torture Test Track and out into the mean streets of NYC and go talk with the tourists.  If any readers would like to join me some Saturday afternoon, we can take out a video camera and see what we get. Comments from users are of course welcome, particularly those in uniform.

In the meantime, my official recommendation to DARPA: Keep spending.

Post slots and my attention span permitting, I’ll be taking a look at other MT tools in the days ahead. I’m thinking of devoting a page to just that subject as a way to help me and my readers stay on top of a quickly changing technology landscape.


  1. Yk299 says:

    Dear Translation Guy,

    Thank you for the great review. A few differentiating factors between SpeechTrans and Google Translate that were not mentioned in the article are the following:

    1. Google Translate only lets you record for 15 seconds; SpeechTrans allows 55 seconds.
    2. Google Translate only works online. SpeechTrans can play back any previous recordings offline, and all previous translations are saved automatically.
    3. SpeechTrans allows you to dictate punctuation and even new paragraphs by using voice commands such as "question mark," "period," and "colon," and allows those translations to be emailed. Google does not.
    4. SpeechTrans allows global communication with anyone across the world (or the dinner table) with real-time voice translation telecommunication chat.
    5. Nuance speech recognition accuracy: 90%; Google: 60–70%.
    6. Lastly, SpeechTrans Lite is available for free in the App Store.

    • Ken says:

      Thanks for the critique of my incomplete analysis. Assuming you are a representative of SpeechTrans, here are my counter-questions.

      1. What are the advantages of a longer recording time? Quality of transcription/translation of 150 words would be even more problematic than the 35 words possible in a 15-second passage. I’d be interested in learning more about SpeechTrans’ decision to permit longer utterances from the user’s perspective.
      4. Aside from the fact that SpeechTrans is inaudible on speaker due to volume setting issues (easily corrected), how does the ability to use this tool in a variety of settings differ from the Google app?
      5. Citation on superiority of Nuance Speech engine over Google, please?
      Please feel free to contact me directly. I’m happy to help you set the record on SpeechTrans straight.

      Thanks, Ken

  2. Machine translation for Twitter and Facebook? You're right, Ken, what a joke. How can you translate "RT" and "@nickname"?

  3. Sugardip says:

    Cloud-based speech recognition engine… cool! Go Nuance.

  4. frenchophile says:

    No wonder they opened up the code for the Nuance Mobile Developer Program to developers — their stuff pretty much sucks.

  5. RealHigh says:

    Thanks for the public service announcement, Ken.

  6. Chocoboy says:

    $50 million per year is peanuts to this department, but it would sure as #@$! go a long way for some WORTHY programs. Thanks for making my day and getting my heart pumping about spending 😉

  7. Kamikaze says:

    Funny how the SpeechTrans app can't recognize time. On the other hand, how could it decipher between a number and a time?
