Large language models in translation quality assessment: The feasibility of human-AI collaboration

Chengxu Wang

doi:10.5007/2175-7968.2025.e108395

Authors

Chengxu Wang Nankai University https://orcid.org/0000-0002-8456-9426

DOI:

https://doi.org/10.5007/2175-7968.2025.e108395

Keywords:

translation quality assessment, LLMs, human-AI collaboration, translation of Chinese academic works, prompt engineering

Abstract

This research explores the potential application of Large Language Models (LLMs) to translation quality assessment within the Chinese Academic Translation Project (CATP), from a human-AI collaboration perspective. The study integrates the LISA QA Model and the Chinese standard GB/T 19682-2005 to develop a multidimensional translation quality assessment system, including error typologies and weights specific to Chinese academic works. Using this system, three LLMs (GPT-4, Claude-3.7, and DeepSeek-R1) evaluated the Portuguese version of the work Introduction to Qing Dynasty Academic Thought. Their performance was analyzed and compared with an assessment conducted by human experts, in order to explore the feasibility of a collaborative model between humans and AI. Based on the experimental results, the research proposes a hierarchical assessment process of “AI screening-refined human judgment” and an inter-linguistic assessment mechanism of “Chinese prompt-multilingual verification”, which together form a translation quality assessment framework based on human-AI collaboration for the CATP. The study brings technological innovation into traditional translation quality assessment and offers a new technical support pathway for the strategy of “internationalization” of Chinese academic knowledge.
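The weighted-error scoring and the “AI screening-refined human judgment” process described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the error categories, their weights, the per-100-words normalization, the threshold, and the names `ERROR_WEIGHTS`, `Segment`, `penalty_per_100_words`, and `triage` are hypothetical and are not taken from the paper, the LISA QA Model, or GB/T 19682-2005.

```python
from dataclasses import dataclass

# Hypothetical error categories and weights, for illustration only.
# The paper's actual typology and weights are not given in the abstract.
ERROR_WEIGHTS = {
    "mistranslation": 5.0,
    "omission": 4.0,
    "terminology": 3.0,
    "style": 1.0,
}


@dataclass
class Segment:
    segment_id: int
    word_count: int
    error_counts: dict  # category -> number of errors flagged by an LLM


def penalty_per_100_words(segment):
    """Weighted error penalty, normalized to 100 words of text."""
    total = sum(ERROR_WEIGHTS[cat] * n for cat, n in segment.error_counts.items())
    return 100.0 * total / segment.word_count


def triage(segments, threshold=2.0):
    """Split segment ids into those passed by AI screening and those
    routed to a human expert (the threshold is an assumed value)."""
    passed, needs_human = [], []
    for seg in segments:
        if penalty_per_100_words(seg) > threshold:
            needs_human.append(seg.segment_id)
        else:
            passed.append(seg.segment_id)
    return passed, needs_human


segments = [
    Segment(1, 200, {"style": 1}),
    Segment(2, 100, {"mistranslation": 1, "terminology": 1}),
]
passed, needs_human = triage(segments)
```

In this sketch the LLM only produces the error counts; segments whose weighted penalty exceeds the threshold go to a human expert, which is one plausible reading of a hierarchical screening step rather than the author's documented procedure.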


ISSN: 2175-7968