The potential of ChatGPT in translation evaluation: A case study of Chinese-Portuguese machine translation
DOI:
https://doi.org/10.5007/2175-7968.2024.e98613

Keywords:
ChatGPT, machine translation (MT), automatic scoring, human assessment, evaluation metric

Abstract
The integration of artificial intelligence (AI) into translation assessment represents a significant evolution in the field, moving beyond traditional human-only scoring approaches. This study examines the role of ChatGPT, a multilingual, transformer-based large language model developed by OpenAI, in the automated evaluation of machine translations between Portuguese and Mandarin. Despite ChatGPT's growing reputation for advanced natural language processing (NLP) capabilities, its application to translation evaluation, particularly for this language pair, remains largely unexplored. To fill this gap, we used three prevalent machine translation tools to translate a set of twenty sentences from Chinese into Portuguese. Target-text versions produced by professional Chinese-Portuguese translators were also included to gauge whether the machine-translated texts reach a degree of human parity. We then assessed these translations with two GPT models (ChatGPT 3.5 and ChatGPT 4.0) and five human raters to provide a comprehensive scoring analysis. The findings reveal that ChatGPT, particularly ChatGPT 4.0, shows substantial promise in evaluating translations across varied text types, although this potential is tempered by notable inconsistencies and limitations in its performance. Combining quantitative analysis with qualitative insights, the study highlights the synergy between ChatGPT's automated scoring and traditional human assessment and points to three key contributions of this automated approach: (1) demonstrating the viability of automated translation evaluation, particularly for the Chinese-Portuguese language pair; (2) providing a critical supplement to human evaluation; and (3) revealing the remarkable capability of cutting-edge machine translation tools for this language pair. Our findings contribute to a more detailed understanding of ChatGPT's role in translation assessment and underscore the need for a balanced approach that leverages both human expertise and AI capabilities.
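The abstract does not detail how the GPT-based scoring was operationalized. Purely as an illustration of what such an automated scoring step could look like, the minimal Python sketch below asks an OpenAI chat model to rate a Chinese-Portuguese sentence pair; the prompt wording, the 0-100 scale, the model identifier, and the score_translation helper are assumptions made for this example, not the protocol reported in the study.

# Illustrative sketch only (not the study's actual protocol): the prompt wording,
# 0-100 scale, and model identifier are assumptions made for this example.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

PROMPT_TEMPLATE = (
    "You are an experienced Chinese-Portuguese translation evaluator. "
    "Rate the Portuguese translation of the Chinese source sentence from 0 to 100, "
    "considering accuracy and fluency. Reply with the number only.\n\n"
    "Source (zh): {src}\n"
    "Translation (pt): {tgt}"
)

def score_translation(src: str, tgt: str, model: str = "gpt-4") -> float:
    """Request a single numeric quality score from a GPT chat model."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # keep the score as reproducible as the API allows
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(src=src, tgt=tgt)}],
    )
    return float(response.choices[0].message.content.strip())

# Hypothetical usage:
# print(score_translation("今天天气很好。", "O tempo está muito bom hoje."))

In a setup of the kind the abstract describes, such a call would be repeated for each of the twenty sentences and for each machine-translated or human version, and the resulting scores compared with those assigned by the human raters.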
License
Copyright (c) 2024 Cadernos de Tradução
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution 4.0 International License (CC BY), which allows the work to be shared with acknowledgement of authorship and initial publication in this journal.
Authors may enter into additional, separate contracts for the non-exclusive distribution of the version of the work published in this journal (e.g., publishing it in an institutional repository or as a book chapter), with acknowledgement of authorship and initial publication in this journal.