Automatic Classification of Human Translation and Machine Translation: a Study from the Perspective of Lexical Diversity
EasyChair Preprint 5487
9 pages • Date: May 8, 2021

Abstract
By using a trigram model and fine-tuning a pretrained BERT model for sequence classification, we show that machine translation and human translation can be classified with an accuracy above chance level, which suggests that machine translation and human translation differ in a systematic way. The classification accuracy for machine translation is much higher than that for human translation. We show that this may be explained by the difference in lexical diversity between machine translation and human translation. If machine translation has patterns independent of human translation, automatic metrics that measure the deviation of machine translation from human translation may conflate difference with quality. Our experiment with two different types of automatic metrics shows a correlation with the results of the classification task. Therefore, we suggest that the difference in lexical diversity between machine translation and human translation be given more attention in machine translation evaluation.

Keyphrases: BLEU, lexical diversity, machine translation, machine translation evaluation, translation varieties
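The abstract does not say which lexical diversity measure the study uses. Purely as an illustration, here is a minimal sketch assuming the simple type-token ratio (TTR: distinct tokens divided by total tokens), plus a moving-average variant (MATTR) that reduces TTR's known sensitivity to text length. The lowercase-and-split tokenizer, the function names, and the example sentences are all hypothetical and are not taken from the paper.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    # Hypothetical preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def type_token_ratio(tokens: List[str]) -> float:
    """Number of distinct tokens (types) divided by the total number of tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def moving_average_ttr(tokens: List[str], window: int = 50) -> float:
    """Mean TTR over every contiguous window of fixed length (MATTR).

    Falls back to plain TTR when the text is shorter than the window.
    """
    if len(tokens) < window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[i:i + window])
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical toy sentences, not data from the study.
    sentence_a = "the committee approved the plan after a long and heated debate"
    sentence_b = "the committee approved the plan after the debate and the vote"
    for label, text in [("sentence_a", sentence_a), ("sentence_b", sentence_b)]:
        toks = tokenize(text)
        print(label, round(type_token_ratio(toks), 3))
```

Both toy sentences have 11 tokens, but the second repeats "the" more often, so it prints a lower TTR (0.727 vs. 0.909). Raw TTR falls as texts get longer, so comparing corpora of different sizes calls for a length-controlled variant such as MATTR.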