A translation robot for each translator? A comparative study of manual translation and post-editing of machine translations: process, quality and translator attitude

Daems, J
Ghent University. Faculty of Arts and Philosophy (Ghent, Belgium)


To keep up with the growing need for translation in today's globalised society, post-editing of machine translation is increasingly being used as an alternative to regular human translation. Although post-editing is presumably faster than human translation, it remains unclear whether the quality of a post-edited text is comparable to that of a human translation, especially for general text types. In addition, the post-editing process, the effort it involves, and translators' attitudes towards it are still poorly understood.

This dissertation presents a comparative analysis of post-editing and human translation by students and professional translators for general text types from English into Dutch. We study the process, the product, and translators' attitudes in detail.

We first conducted two pretests with student translators to test possible experimental setups and to develop a translation quality assessment approach suitable for a fine-grained comparative analysis of machine-translated texts, post-edited texts, and human translations. In the main experiment, we examined students and professional translators using a combination of keystroke logging tools, eye tracking, and surveys. We combined qualitative analyses with advanced statistical analyses (mixed effects models) to obtain a multifaceted picture.

For the process analysis, we looked at translation speed, cognitive processing as measured by eye fixations, and the usage of external resources and its impact on overall time. For the product analysis, we looked at overall quality, frequent error types, and the impact of using external resources on quality. The attitude analysis covered the perceived usefulness, speed, and quality of machine translation and post-editing, and which translation method was perceived as least tiring. Two surveys were conducted, one before and one after the experiment, so that we could detect changes in attitude after participation. In two more detailed analyses, we studied the impact of machine translation quality on various types of post-editing effort indicators and on the post-editing of multi-word units.

We found that post-editing is faster than human translation and that both translation methods lead to products of comparable overall quality. The more detailed error analysis showed that post-editing leads to somewhat better results regarding adequacy, whereas human translation leads to better results regarding acceptability. The most common errors for both translation methods were meaning shifts, logical problems, and wrong collocations. Fixation data indicated that post-editing was cognitively less demanding than human translation, and that more attention was devoted to the target text than to the source text. Fewer resources were consulted during post-editing than during human translation, although the overall time spent in external resources was comparable. The most frequently used external resources were Google Search, concordancers, and dictionaries. Spending more time in external resources, however, did not lead to an increase in quality.

Translators indicated that they found machine translation useful, but they preferred human translation and found it more rewarding. Perceptions of speed and quality were mixed: most participants believed post-editing to be at least as fast and as good as human translation, but rarely better.

We further discovered that different types of post-editing effort indicators were affected by different types of machine translation errors, with coherence issues, meaning shifts, and grammatical and structural issues having the greatest effect. HTER, though commonly used, does not correlate well with more process-oriented post-editing effort indicators. Regarding the post-editing of multi-word units, we suggest 'contrast with the target language' as a useful new way of classifying multi-word units, as contrastive multi-word units were much harder to post-edit. In addition, we noticed that the research strategies used when post-editing multi-word units were inefficient: consulting external resources did lead to higher quality of post-edited multi-word units, but a lot of time was spent in external resources when this was not necessary.

Interestingly, the differences between human translation and post-editing usually outweighed the differences between students and professionals. Students did, however, process texts differently at the cognitive level: they had longer fixation durations on the source text during human translation and more fixations on the target text during post-editing, whereas professional translators' fixation behaviour remained constant. Regarding the usage of external resources, only the time spent in dictionaries was higher for students than for professional translators; the usage of other resources was comparable. Overall quality was comparable for students and professionals, but professionals made fewer adequacy errors. Deletions were more noticeable for students than for professional translators in both translation methods, and word sense issues were more noticeable for professional translators than for students when translating from scratch. Surprisingly, professional translators were often more positive about post-editing than students, believing they could produce products of comparable quality with both translation methods. Students in particular struggled with the cognitive processing of meaning shifts, and they spent more time in pauses than professional translators.

Key contributions of this dissertation to the field of translation studies include the comparison of students and professional translators, the development of a fine-grained translation quality assessment approach, and the combined use of state-of-the-art logging tools and advanced statistical methods. As the effects of experience in our study were limited, we suggest that future work look at specialisation and translator confidence. Our guidelines for translation quality assessment can be found in the appendix and contain practical instructions for use with brat, an open-source annotation tool. The experiment described in this dissertation is also the first to integrate Inputlog and CASMACAT, making it possible to include information on external resources in the CASMACAT logging files, which can be added to the CRITT Translation Process Research Database.

Beyond these methodological contributions, our findings can be integrated into translation teaching, machine translation system development, and translation tool development. Translators need hands-on post-editing experience to become acquainted with common machine translation errors, and students in particular need to be taught successful strategies to spot and solve adequacy issues. Post-editors would greatly benefit from machine translation systems that made fewer coherence errors, meaning shift errors, and grammatical and structural errors. If visual cues are included in a translation tool (e.g., marking potentially problematic passages or polysemous words), they should be added to the target text. Tools could further benefit from integration with commonly used external resources, such as dictionaries.

In the future, we wish to study the translation and post-editing process in even more detail, taking pause behaviour and regressions into account, and to look at the passages participants perceived as the most difficult to translate and post-edit. We further wish to gain a better understanding of the usage of external resources by looking at the types of queries and by linking queries back to source and target text words.

While our findings are limited to the post-editing and human translation of general text types from English into Dutch, we believe our methodology can be applied to other settings and language pairs. Only by studying both processes in many different situations and comparing the findings will we be able to develop tools and create courses that better suit translators' needs. This, in turn, will make for better, and happier, future generations of translators.