Sunday, September 9, 2007

Microsoft Launches Translation Service


Microsoft has launched an automatic translation service called Windows Live Translator. The site lets you translate a text of up to 500 words, or an entire web page, from English to German, Dutch, French, Spanish, Portuguese, Italian, Korean, Chinese, Japanese, and Russian.
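
If you need to translate something longer than the 500-word limit, one workaround is to split the text into pieces and submit them one at a time. Here's a minimal sketch of that idea in Python. The function name `split_into_chunks` and its `max_words` parameter are my own invention for illustration, not part of any Microsoft API, and the code counts whitespace-separated words, which may not match how the site counts them.

```python
def split_into_chunks(text, max_words=500):
    """Split text into pieces of at most max_words whitespace-separated words.

    Hypothetical helper: the 500-word default mirrors the limit mentioned
    in the post, but how the service itself counts words is an assumption.
    """
    words = text.split()
    # Step through the word list in strides of max_words and rejoin each slice.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Note that this collapses the original whitespace and may cut a sentence in half at a chunk boundary, which can hurt translation quality. Splitting at sentence or paragraph boundaries would be a better choice for real use.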

Microsoft uses Systran to produce most of the translations, but it also offers an option to translate computer-related texts using a machine translation system developed in-house. Microsoft's translation technology has already been used to translate technical materials, including the MSDN Library.

"Recent research in Machine Translation (MT) has focused on data-driven systems. Such systems are self-customizing in the sense that they can learn the translations of terminology and even stylistic phrasing from already translated materials. Microsoft Research MT (MSR-MT) system is such a data-driven system, and it has been customized to translate Microsoft technical materials through the automatic processing of hundreds of thousands of sentences from Microsoft product documentation and support articles, together with their corresponding translations."

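To get a feel for how a system can "learn the translations of terminology" from already translated sentences, here's a toy sketch in Python. It uses IBM Model 1, a classic textbook word-alignment model trained with expectation-maximization. This is not how MSR-MT works internally (the quote doesn't say), and the function names are made up for this example.

```python
from collections import defaultdict


def train_word_translations(pairs, iterations=10):
    """Estimate P(target_word | source_word) from parallel sentences.

    pairs: list of (source_sentence, target_sentence) strings.
    Illustrative IBM Model 1 (without the NULL word), not MSR-MT's method.
    """
    corpus = [(s.lower().split(), t.lower().split()) for s, t in pairs]
    target_vocab = {w for _, tgt in corpus for w in tgt}

    # Start with every target word equally likely for every source word.
    uniform = 1.0 / len(target_vocab)
    t_prob = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) co-translations
        total = defaultdict(float)  # expected translations per source word

        # E-step: spread each target word over the source words in its sentence.
        for src, tgt in corpus:
            for tw in tgt:
                norm = sum(t_prob[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t_prob[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac

        # M-step: re-estimate the probabilities from the expected counts.
        for (tw, sw), c in count.items():
            t_prob[(tw, sw)] = c / total[sw]

    return t_prob


def best_translation(t_prob, source_word):
    """Return the most probable target word for source_word."""
    candidates = {tw: p for (tw, sw), p in t_prob.items() if sw == source_word}
    return max(candidates, key=candidates.get)


# Invented three-sentence English-German corpus, for illustration only.
toy_corpus = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]
model = train_word_translations(toy_corpus)
print(best_translation(model, "book"))
```

Nobody tells the program that "book" means "buch". It works that out because the two words keep showing up together across sentence pairs, which is why these systems need so much already translated text.
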
Microsoft intends to integrate this service into Live Search, adding a feature that other search engines have offered for a long time. Windows Live Translator's presentation is particularly interesting: the default view shows the original page and the translation side by side in two vertical frames. If you hover over a sentence in one of the pages, the sentence is highlighted in both pages. If you scroll in one of the pages, the other page scrolls along with it. This approach is especially useful for those who speak both languages fairly well or want to learn a new language. Unfortunately, it's difficult to read a page that requires horizontal scrolling.


Google also has a translation service powered by Systran. The translations are identical to the ones returned by Babel Fish, but they're different from Windows Live's translations, so Microsoft might use an updated version of Systran's software.

Google has also developed its own machine translation system, but it's available to the public for only three languages: Arabic, Chinese and Russian. To expand this system to other languages, it's important to have a lot of parallel texts. "Rather than argue about whether this algorithm is better than that algorithm, all you have to do is get ten times more training data. And now all of a sudden, the worst algorithm is performing better than the best algorithm on less training data," explained Peter Norvig, Director of Research at Google.

While machine translation is not yet a replacement for human translation in most cases, it's a great way to get the approximate gist of a text in a foreign language. One of the most important problems is that machine translation doesn't always produce coherent phrases and doesn't understand the subtleties of language, so don't use it to translate poetry or to send important emails.
