Sunday, January 10, 2010

Google's Sensitive Translation Service

Eric Baković from Language Log noticed a subtle feature of Google Translate: Google's machine translation system can produce radically different results when you change only the punctuation or the capitalization of a text.

Here's an example where capitalizing the first letter of a text improves Google's translation:

[Spanish] tu hija que te quiso tanto y no supo demostrarlo - perdoname.
[English] your daughter that you loved so much and she could not prove it - pardon me.

[Spanish] Tu hija que te quiso tanto y no supo demostrarlo - perdoname.
[English] Your daughter who loved you so much and failed to prove it - pardon me.

"I don't pretend to know anything about Google's translation algorithm(s), but I do find it interesting that what seem like very minor manipulations like those shown above can lead to both bizarrely different results as well as to subtle improvements," notes Eric.

Jim Regan offers a possible explanation: "Google uses statistical machine translation, so algorithms have little to do with it - the translation is created by matching all the translations available for the different parts of the sentence, and then ranked against an n-gram language model of the target language to see how likely it is that those particular phrases go together, to assemble the translation. As case can be significant - acronyms are usually all upper case, proper names use an initial capital, etc. - it makes sense that it affects the translation."
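
To make Jim's explanation a bit more concrete, here's a toy sketch in Python of the two steps he describes: look up candidate translations for each piece of the sentence, then rank the combinations with an n-gram (here, bigram) model of the target language. This is not Google's system. The phrase table, the bigram counts and the function names are all invented for illustration, and the lookup goes word by word rather than phrase by phrase. The only point is that a case-sensitive lookup sends "tu" and "Tu" down different paths.

```python
import itertools
import math

# Hypothetical phrase table, written by hand for this example.
# Keys are case-sensitive on purpose: "tu" and "Tu" are separate entries.
PHRASE_TABLE = {
    "tu": ["your", "you"],
    "Tu": ["Your"],
    "hija": ["daughter"],
}

# Hypothetical bigram counts standing in for a target-language model.
# "<s>" marks the start of the sentence.
BIGRAM_COUNTS = {
    ("<s>", "your"): 1,
    ("<s>", "you"): 4,
    ("<s>", "Your"): 5,
    ("your", "daughter"): 3,
    ("Your", "daughter"): 3,
}


def lm_score(words):
    """Sum of add-one smoothed log counts over consecutive word pairs.

    This is an unnormalised score, not a true probability; it is only
    used to compare candidates of the same length.
    """
    score = 0.0
    previous = "<s>"
    for word in words:
        score += math.log(BIGRAM_COUNTS.get((previous, word), 0) + 1)
        previous = word
    return score


def translate(source):
    """Pick the candidate word sequence that the bigram model likes best."""
    tokens = source.split()
    # Unknown tokens are passed through unchanged.
    options = [PHRASE_TABLE.get(token, [token]) for token in tokens]
    candidates = itertools.product(*options)
    return " ".join(max(candidates, key=lm_score))


print(translate("tu hija"))
print(translate("Tu hija"))
```

With these made-up numbers, the lowercase input has two candidates to choose from and the language model settles the choice, while the capitalized input matches a different table entry altogether. A real system works with far larger tables and longer phrases, so the effect of one changed character can ripple through the whole sentence.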
