My previous post on Post-Editing Machine Translation (PEMT) generated more reactions than I would have thought. Some were positive, some less so. More specifically, some perceived my text as a rejection of Machine Translation (MT) and its recent progress as a whole. I would like to clarify a number of points (no, I’m not a dinosaur) and reiterate my lack of interest in commercial PEMT projects. I will explain in further detail why they don’t make sense to me, regardless of how much MT has improved over time.
No, I have nothing against MT
In fact, I use it in my translation tools! It’s a great way to generate lists of words for predictive typing, which indeed saves me a bit of time and reduces the strain on my fingers. As I have said a few times, MT can also be great for research purposes, especially for individuals. For these reasons, I am happy to see MT improve, both as a translator and as a human being.
Yes, statistical MT engines are improving
MT engines of today are smarter and keep learning from bilingual texts. It means that compared to legacy systems, they have a much better understanding of context and can string together pretty decent sentences if the right references are available. I have no doubt they will keep on improving, as long as there are people to train them.
No, I’m still not interested in PEMT projects (for publication purposes)
In my previous post, I made the mistake of referring to PEMT projects as if they were all identical. Post-editing machine translation for training purposes is something I would have no problem with. I would charge an hourly fee for this and all would be good.
The projects I mentioned (or meant to mention) are those for which we, translators, are asked to provide a discount to edit texts that went through some MT engine. Usually, this will be Google Translate or a very slightly customized version of it. The goal of people offering such projects is to get a translation ready for publication at a reduced price, which has nothing to do with MT engine training.
Here, we have two possible cases:
-The text was processed through a generic system, say Google/Bing Translate. These engines use bilingual texts from sources that are so different in terminology, style, quality, etc. that the resulting text is generally very inconsistent and poorly written. Why would one want to give a discount for a text that needs to be rewritten from scratch?
-The text was processed through an industry-specific, well-trained engine. Let’s go even further and suppose the target text comes out near-perfect. It still doesn’t cut it for me. When all I have is the text generated by the MT provider, I still have to look very carefully at the source and target segments and make sure everything is there and properly translated. Again, it can be very, very easy to let a mistake slip through (say, a missing negation) because the MT engine used its “best match” and failed to translate the part it had no reference for.
If the generated translation is good, it means a very close translation already exists somewhere and the engine knows about it. In this case, a translation memory (TM) would work better, as it would allow me to see exactly where the new text is different and where I have to be careful. If the TM is approved and no proofreading is required for the existing part, the changes can be made very quickly, much more quickly than if I had to check the whole segment.
To put it simply, a good MT output means there is a good TM available out there (maybe not in a TM format as such, but something that could be converted easily), and I’d rather work with the TM in question for practical purposes.
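To illustrate why a TM makes the differences easier to spot, here is a minimal Python sketch of a word-level fuzzy match. It is only a rough approximation of the idea, not how any particular translation tool works: the function name `fuzzy_match` and the two example segments are made up for illustration, and the similarity score from the standard library's `difflib` is an assumption standing in for a real TM matching algorithm.

```python
import difflib


def fuzzy_match(new_source, tm_source):
    """Compare a new source segment with a TM source segment, word by word.

    Returns a similarity ratio between 0 and 1 and the list of changes
    as (operation, text in the TM segment, text in the new segment).
    """
    tm_words = tm_source.split()
    new_words = new_source.split()
    matcher = difflib.SequenceMatcher(a=tm_words, b=new_words, autojunk=False)
    changes = [
        (tag, " ".join(tm_words[i1:i2]), " ".join(new_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the parts that differ
    ]
    return matcher.ratio(), changes


# Hypothetical segments: the new text drops the negation found in the TM.
tm_source = "Do not open the cover while the device is running."
new_source = "Open the cover while the device is running."

score, changes = fuzzy_match(new_source, tm_source)
print(f"Match: {score:.0%}")
for operation, old_text, new_text in changes:
    print(f"{operation}: '{old_text}' -> '{new_text}'")
```

With these two segments, the only change reported is at the start of the sentence, where the negation disappeared. That is exactly the kind of difference that is easy to miss when all I have is a fluent MT output and nothing to compare it against.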
To sum it up:
– MT is useful
– PEMT for training purposes is fine
– PEMT to produce a professional translation is a waste of both the translator’s and the client’s money