Pages in topic: < [1 2 3 4]
How big of a threat is Google Translate?
Topic starter: Tim Drayton
Jeff Allen  Identity Verified
France
Multiple languages
SMT likes it when there isn't word/phrase chunking
Feb 19, 2009

Tim Drayton wrote:
I have a hypothesis, which I articulated at the beginning of this thread, that purely statistical machine translation will be far more successful in languages with similar syntactic structures, such that words tend to bunch together in similar groups in both the source and target languages and any statistical matches will be highly significant. I will watch developments with interest, but I remain sceptical as to whether statistical machine translation will ever be able to provide satisfactory results between languages with very different structures such as English and Turkish. A detailed investigation of this topic goes well beyond the scope of a thread like this.


Tim, this is actually the main difference between Translation Memory (sometimes referred to as example-based MT) and statistical MT.
The phenomenon of chunking (at the lexical or phrase level) is what TMs thrive on. If you set the fuzzy-match threshold low, you can use it to pull up translated segments based on terminology chunks. The side effect of setting the threshold too low, however, is that you can also end up with a lot of noise: translated parts of segments that don't correspond to what you are translating.
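To make the threshold trade-off concrete, here is a minimal sketch of TM fuzzy-match retrieval. Everything in it is an assumption for illustration: the TM entries are hypothetical, `fuzzy_matches` is an invented helper, and Python's `difflib.SequenceMatcher` ratio over word tokens stands in for whatever scoring formula a real CAT tool uses.

```python
from difflib import SequenceMatcher

# Hypothetical English->French TM: (source segment, stored translation).
TM = [
    ("Click the Save button to save the file.",
     "Cliquez sur le bouton Enregistrer pour enregistrer le fichier."),
    ("Click the Print button to print the file.",
     "Cliquez sur le bouton Imprimer pour imprimer le fichier."),
    ("The file could not be opened.", "Le fichier n'a pas pu être ouvert."),
    ("Click the button to continue.", "Cliquez sur le bouton pour continuer."),
]


def fuzzy_matches(query, tm, threshold):
    """Return (score, source, target) for TM entries scoring >= threshold, best first.

    The score is difflib's ratio over word tokens (2 * matched words / total words),
    an assumed stand-in for a CAT tool's own fuzzy-match formula.
    """
    query_words = query.lower().split()
    hits = []
    for source, target in tm:
        score = SequenceMatcher(None, query_words, source.lower().split()).ratio()
        if score >= threshold:
            hits.append((score, source, target))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


query = "Click the Save button to close the file."
for threshold in (0.8, 0.5):
    print(f"threshold={threshold}")
    for score, source, target in fuzzy_matches(query, TM, threshold):
        print(f"  {score:.2f}  {source}  ->  {target}")
```

With a threshold of 0.8 only the Save segment comes back (score 0.875). At 0.5 the Print segment (0.75) and the "continue" segment (about 0.62) appear as well; the last one shares the chunks "click the" and "button to" but its translation does not correspond to the query, which is the noise described above.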

However, Statistical MT takes a different approach, and I've explained it with some examples at:
statistical MT approach + TMs
http://www.proz.com/forum/translator_resources/100328-machine_translation:_your_experience_with_the_various_mt_programmes_state_of_play-page2.html#998639

SMT will of course handle such chunking well, but its benefit becomes more interesting and useful when the words are far apart (or rather the character or symbol strings, since these are usually based on sequences of bigrams and trigrams, i.e. 2 and 3 characters). SMT lets you capitalize on content that does not have a high level of matched chunks, and that's why a lot of big multinational corporations are interested in SMT. Not all of their content is perfectly aligned or highly chunked; it's very diverse.
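As a small illustration of the character-string idea in the paragraph above, the sketch below compares two Turkish word forms, *evlerimiz* ("our houses") and *evlerimizden* ("from our houses"). A whole-word comparison finds no match at all, while a character-trigram Dice similarity still scores the pair highly. This is only an illustrative similarity measure with hypothetical helper names, not an SMT system; many SMT systems in fact work with word n-grams rather than character n-grams.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams, with spaces padding the word boundaries."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b, n=3):
    """Dice coefficient over character n-gram sets: 2|A & B| / (|A| + |B|)."""
    x, y = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(x & y) / (len(x) + len(y))


def word_overlap(a, b):
    """The same Dice coefficient, but over whole words (exact word matches only)."""
    x, y = set(a.lower().split()), set(b.lower().split())
    return 2 * len(x & y) / (len(x) + len(y))


print(word_overlap("evlerimizden", "evlerimiz"))    # 0.0
print(round(dice("evlerimizden", "evlerimiz"), 2))  # 0.76
print(dice("evlerimizden", "kitap"))                # 0.0
```

The two inflected forms share 8 of their trigrams, so they register as close even though no whole word matches, while an unrelated word such as *kitap* ("book") scores zero.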

It's much easier to explain this visually, with a list of sentences, by showing how the SMT system actually processes everything in parallel rather than sentence by sentence.
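The "everything in parallel" point can be sketched with the textbook IBM Model 1, trained by expectation maximization (EM) on a three-sentence German-English toy corpus of the kind used in teaching. This is a swapped-in, word-based example, not the system described in the post, and the helper names are hypothetical. No single sentence pair tells the model that *haus* means *house*; that emerges only because, across the whole corpus, *das* is better explained by *the*, so each EM pass gathers counts from all pairs before updating any probability.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=10):
    """Estimate t(f|e) with EM, IBM Model 1 style (no NULL word, toy scale)."""
    pairs = [(f.split(), e.split()) for f, e in corpus]
    foreign_vocab = {w for fs, _ in pairs for w in fs}
    # Uniform start: every foreign word is equally likely for every English word.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected number of links between f and e
        total = defaultdict(float)  # expected number of links from e overall
        # E-step: fractional counts are gathered over ALL sentence pairs
        # before any probability is updated.
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: turn the expected counts into new probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_translation(t, e):
    """Foreign word with the highest t(f|e) for English word e."""
    candidates = {f: p for (f, e2), p in t.items() if e2 == e}
    return max(candidates, key=candidates.get)


# Toy German-English parallel corpus (foreign sentence, English sentence).
corpus = [("das haus", "the house"), ("das buch", "the book"), ("ein buch", "a book")]
t = train_ibm_model1(corpus)
for e in ["the", "house", "book", "a"]:
    print(e, "->", best_translation(t, e))
```

Running it prints `the -> das`, `house -> haus`, `book -> buch` and `a -> ein`. After the first pass *house* is still split evenly between *das* and *haus*; the tie is broken on later passes only because the other sentence pairs pull *das* toward *the*.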

Jeff

