Microsoft Demos A Star Trek-Style Universal Translator
Microsoft has shown off research that takes us significantly closer to a Star Trek-style universal translator: natural language translation, in real time, in the user's own voice.
The demonstration by Microsoft chief research officer Rick Rashid (see embedded video below) at Microsoft Research Asia's 21st Century Computing event was part of a speech to about 2,000 students in China on Oct. 25, and doesn't actually represent a product in the works. "This work is in the pure research stages to push the boundaries of what's currently possible," a Microsoft spokeswoman said in an email.
The potential value of such a capability is enormous, and obvious. On Star Trek, the universal translator made alien relations possible. For business travelers and tourists, speaking even a few words of the native tongue, let alone speaking it fluently, can make a big difference. For immigrants, learning the language of their new country is often the biggest barrier to assimilation. That's why Microsoft and competitors such as Google, among others, have worked for years to develop real-time translation systems.
Rashid's demonstration shows a real-time speech-to-text translation engine, with a similarly real-time assessment of its accuracy. (Microsoft didn't say how it generated the accuracy measurement.) According to Rashid, however, the accuracy has improved by more than 30% compared to previous generations, with a current error rate as low as one in seven or eight words, or roughly 13% to 14%. (Disclosing the error rate is significant, as competitors like Nuance usually compare recognition rates against their own products.)
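For readers who want to see how a figure like "one word in seven or eight" turns into a percentage, here is a minimal sketch of word error rate, the measure commonly used for speech recognition. Microsoft did not say how it computed its number, so treating it as word error rate is an assumption; the function name and the sample sentences are invented for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: this is the standard textbook definition, not necessarily
    the measurement Microsoft used in the demonstration.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one wrong word out of eight, then one out of seven.
one_in_eight = word_error_rate(
    "the quick brown fox jumps over the lazy",
    "the quick brown fox jumps over a lazy",
)
one_in_seven = word_error_rate(
    "we are closer to a universal translator",
    "we are closer to a universal translate",
)
print(f"{one_in_eight:.1%} {one_in_seven:.1%}")
```

One error in eight words works out to 12.5%, and one in seven to about 14.3%, which brackets the 13% to 14% range cited above.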
Translation Needs Big Data
Microsoft is no stranger to automated translation; on Halloween, the company announced that it would be working with researchers in Central America to build a version of the Microsoft Translator Hub to preserve the Mayan language. The Hub lets users create a model, add language data, then use Microsoft's Windows Azure cloud service to power the automated translation. The idea, as Microsoft took pains to explain, was to preserve the dying language through the next baktun, the calendar cycle that ends this December, prompting waves of end-of-the-world predictions, including movies like 2012.
As Microsoft's Translator Hub suggests, translation is predicated upon big data. The calculations are immensely complicated, not just dealing with the phonemes that make up each word, but also working out how thoughts are organized into proper grammar, as well as other elements like the genders of certain nouns, honorifics, and other cultural nuances. Microsoft built speech-to-text tools into Windows XP, as Rashid points out, but the technology suffered arbitrary speech errors of about one in every four words. Although speech-to-text (and text-to-speech) has remained inside Microsoft's software as an accessibility tool, it hasn't yet served as a general replacement for the keyboard, even with some of the top minds in the industry powering technologies like Apple's Siri, as Scott Forstall's departure from Apple demonstrates.
Typically, speech recognition is improved through training, as the software learns how a user pronounces various phonemes and generally becomes familiar with how the user says individual words.
His Master's Voice
Rashid's demonstration went a step further, however. The software not only learned what Rashid was saying, but also parsed the meaning, reorganizing it into Chinese. It also took his voice and recast the Chinese phonology in Rashid's natural voice. How? By using a few hours of speech from a native Chinese speaker and properties of Rashid's own voice taken from about one hour of pre-recorded English data: recordings of previous speeches he had made.
Real-time voice translation isn't exactly new. In the mobile space, both Microsoft and Google, for example, have released apps that can translate text that a smartphone camera sees. And both offer conversational modes that are actually more akin to a CB radio: one person talks, taps "stop," the phone translates and plays back a recorded voice, the other person speaks, and so on. What Rashid's demonstration showed off was a much more conversational, continuous, natural means of translation.
And as Rashid's blog post and the video highlight, the crowd applauded nearly every line. That's the type of response every business traveler and tourist wouldn't mind when trying to make herself understood.
Mark Hachman, readwrite.com