The role of metrics to assess the quality of British teenage language translation into Spanish and Italian using machine translation tools
DOI:
https://doi.org/10.5007/2175-7968.2025.e101329
Keywords:
machine translation, teenage language, quality assessment, BLEU, METEOR
Abstract
The rapid evolution of adolescent language, characterized by slang and idiomatic expressions, presents a significant challenge for machine translation (MT) systems. Existing research has extensively covered the translation of language in general; however, there remains a gap in understanding how these systems perform when faced with adolescent language. This study aims to (i) evaluate and compare the accuracy of translations of colloquial language by Bing Translator, DeepL and HelsinkiNLP from English into Spanish and Italian, (ii) examine the validity and reliability of two metrics (BLEU and METEOR) for assessing the accuracy and quality of MT tools with informal language, and (iii) analyse how specific features of teenage slang influence the ability of online tools to generate precise and comprehensible translations. To this end, 1000-character excerpts from the Linguistic Innovators Corpus were translated into Spanish and Italian using DeepL, Bing Translator, and HelsinkiNLP, and the output was assessed using BLEU and METEOR to verify their quality and reliability. Our findings show that teenage slang poses challenges for all tools, particularly with phrasal verbs and idioms. Our results also suggest that METEOR is more reliable for assessing translations of British teenage language into Spanish and Italian.
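To make concrete what an overlap-based metric such as BLEU measures, the following is a minimal sketch in Python, not the study's actual tooling, which the abstract does not describe. It implements a simplified sentence-level BLEU (clipped n-gram precision up to bigrams, combined with a brevity penalty) against a single reference. The function names and the Spanish sentences are hypothetical illustrations and are not taken from the Linguistic Innovators Corpus.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Modified n-gram precision: candidate counts are clipped to reference counts."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of clipped precisions times a brevity penalty.

    Assumption: one reference translation and no smoothing, so any n-gram
    order with zero overlap yields a score of 0.0.
    """
    precisions = [clipped_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Penalize candidates that are shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return brevity_penalty * math.exp(log_mean)


# Hypothetical reference translation and a more literal MT-style output.
reference = "eso es una locura tío".split()
literal_output = "eso es loco hombre".split()

print(simple_bleu(reference, reference))
print(simple_bleu(literal_output, reference))
```

Because this kind of score rewards only exact surface matches, a fluent rendering of slang that uses different wording from the reference is penalized as heavily as a wrong one. METEOR additionally credits stem and synonym matches and weighs recall, which is one plausible reason the two metrics can diverge on informal language.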
License
Copyright (c) 2025 Cadernos de Tradução

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution 4.0 International License (CC BY), which allows the work to be shared with acknowledgement of authorship and initial publication in this journal.
Authors are permitted to enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the work published in this journal (e.g., posting it in an institutional repository or publishing it as a book chapter), with acknowledgement of authorship and initial publication in this journal.