The role of metrics to assess the quality of British teenage language translation into Spanish and Italian using machine translation tools

Andrés Canga Alonso; Maria Cira Napoletano

doi:10.5007/2175-7968.2025.e101329

Authors

Andrés Canga Alonso Universidad de La Rioja https://orcid.org/0000-0002-1578-1626
Maria Cira Napoletano Universidad de La Rioja https://orcid.org/0009-0003-6644-8000

DOI:

https://doi.org/10.5007/2175-7968.2025.e101329

Keywords:

machine translation, teenage language, quality assessment, BLEU, METEOR

Abstract

The rapid evolution of adolescent language, characterized by slang and idiomatic expressions, presents a significant challenge for machine translation (MT) systems. Existing research has extensively covered machine translation in general; however, there remains a gap in understanding how these systems perform when faced with adolescent language. This study aims to (i) evaluate and compare the accuracy with which Bing Translator, DeepL and HelsinkiNLP translate colloquial language from English into Spanish and Italian, (ii) examine the validity and reliability of two metrics (BLEU and METEOR) for assessing the accuracy and quality of MT tools with informal language, and (iii) analyse how specific features of teenage slang influence the ability of online tools to generate precise and comprehensible translations. To this end, 1000-character excerpts from the Linguistic Innovators Corpus were translated into Spanish and Italian using DeepL, Bing Translator, and HelsinkiNLP, and the output was assessed with the BLEU and METEOR metrics to verify its quality and reliability. Our findings show that teenage slang poses challenges for all tools, particularly with phrasal verbs and idioms. Our results also suggest that METEOR is the more reliable of the two metrics for assessing translations of British teenage language into Spanish and Italian.
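To illustrate the kind of score the abstract refers to, the following is a minimal sketch of sentence-level BLEU (clipped n-gram precision combined with a brevity penalty). It is not the authors' pipeline: the whitespace tokenization, the single reference, the absence of smoothing, the function names, and the Spanish example sentences are all assumptions made here for illustration. METEOR is not sketched because it additionally relies on stemming and synonym resources.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram is credited
    at most as many times as it occurs in the reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Unsmoothed single-reference BLEU with uniform n-gram weights."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    # Hypothetical sentences, invented for this sketch (not corpus data).
    reference = "no tengo ni idea de lo que dices".split()
    mt_output = "no tengo idea de lo que dices".split()
    print(f"BLEU: {bleu(mt_output, reference):.3f}")
```

Because unsmoothed BLEU rewards only exact n-gram overlap, a slang term rendered with a valid but different target-language expression earns no credit. That limitation is consistent with the abstract's observation that a metric with more flexible matching appears more reliable for informal language.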


ISSN: 2175-7968