Chinese-Portuguese legal bilingualism in Macao: AI-powered corpus analysis and alignment

Authors

DOI:

https://doi.org/10.5007/2175-7968.2025.e108423

Keywords:

Legal bilingualism, Macau, Legal translation, Corpus linguistics, Linguistic calques

Abstract

This article examines legal bilingualism in Macau, a Special Administrative Region of China governed under the “one country, two systems” principle. It focuses on the coexistence of Chinese and Portuguese in local legislation, with attention to the hierarchy between the two languages and the challenges of legal translation. The study compiles and annotates parallel corpora of legal texts in Portuguese and Chinese, using artificial intelligence tools such as Bertalign for automated alignment and spaCy for linguistic annotation. Because the two languages use distinct writing systems, alignment required AI-based solutions to overcome the limitations of conventional aligners developed for Romance languages. The analysis shows a significant influence of Portuguese on Chinese legal terminology, with a high prevalence of linguistic calques. The study also addresses the challenges of Chinese tokenization and machine translation, proposing practical solutions and comparing the performance of different translation tools, including large language models (LLMs) and neural machine translation (NMT) systems. The exploration of the corpora, including n-gram analyses and syntactic patterns, offers insights for corpus linguistics and legal translation. Practical examples extracted from the aligned corpus show how the computational tools of Sketch Engine support the study of specific aspects of Macanese legal translation. The article contributes to corpus linguistics, especially in multilingual legal contexts, and offers methodological and analytical resources for researchers and legal translation professionals.
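The tokenization challenge mentioned in the abstract can be illustrated with a small sketch. Written Chinese does not separate words with spaces, so whitespace splitting returns a whole sentence as one token, while character n-grams can be counted without any word segmentation. The code below is an illustration only and is not the article's pipeline, which relies on Bertalign, spaCy, and Sketch Engine. The function names `is_cjk` and `char_ngrams` are hypothetical, and the sample sentence was invented for this example rather than taken from the corpus.

```python
from collections import Counter


def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def char_ngrams(text, n):
    """Return contiguous n-character sequences made only of CJK ideographs.

    Punctuation, digits, and Latin letters act as boundaries, so no n-gram
    spans across them.
    """
    grams = []
    run = []
    # The trailing space is a sentinel that flushes the final run.
    for ch in text + " ":
        if is_cjk(ch):
            run.append(ch)
        else:
            grams.extend("".join(run[i:i + n]) for i in range(len(run) - n + 1))
            run = []
    return grams


# Invented sample sentence (assumption for illustration, not corpus data).
SAMPLE = "澳門特別行政區依法保障澳門特別行政區居民的權利。"

# Whitespace splitting yields a single "token" for the whole sentence.
whitespace_tokens = SAMPLE.split()

# Character 4-grams expose repeated sequences without segmenting words.
four_gram_counts = Counter(char_ngrams(SAMPLE, 4))

if __name__ == "__main__":
    print(len(whitespace_tokens))
    for gram, count in four_gram_counts.most_common(4):
        print(gram, count)
```

Character n-grams are a segmentation-free approximation. A dedicated Chinese word segmenter would normally be preferred when word-level units such as legal terms are the object of study.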

References

Anthony, L. (2024a). AntConc (Version 4.0) [Software].

Anthony, L. (2024b). AntPConc (Version 1.2.1) [Software].

Anthony, L. (2024c). TagAnt [Software].

bfsujason. (2022). Bertalign (Version 0.1.0) [Software]. GitHub.

Boletim Oficial (B.O.) de Macau. (1988). Declaração Conjunta do Governo da República Portuguesa e do Governo da República Popular da China sobre a Questão de Macau [com Anexos I e II]. https://bo.io.gov.mo

Boletim Oficial (B.O.) de Macau. (1993). Lei Básica da Região Administrativa Especial de Macau da República Popular da China, de 31 de março de 1993, Promulgada pelo Decreto n.º 3 do Presidente da República Popular da China. https://bo.io.gov.mo

Cheng, L., & Sun, Y. (2021). Terminology translation in socio-legal contexts: A corpus-based exploration. In S. Li & W. Hope (Eds.), Terminology Translation in Chinese Contexts: Theory and Practice (pp. 27–39). Routledge.

Claude AI. (2024). Claude AI [Software].

DeepSeek. (2023). DeepSeek [Software].

Gao, Z.-M. (2021). Automatically compiling bilingual legal glossaries based on Chinese-English parallel corpora. In S. Li & W. Hope (Eds.), Terminology Translation in Chinese Contexts: Theory and Practice (pp. 164–179). Routledge. http://doi.org/10.4324/9781003006688-14

Imprensa Oficial (IO) do Governo da Região Administrativa Especial de Macau. (2025). Página inicial. io.gov.mo.

Lefer, M.-A. (2020). Parallel corpora. In M. Paquot & S. T. Gries (Eds.), A Practical Handbook of Corpus Linguistics (pp. 257–282). Springer Nature.

Leong, S. M. (2012). Divergências linguísticas e interpretação correcta da Lei Básica. Revista de Estudos de “Um País, Dois Sistemas”, 4, 183–193.

Liu, L., & Zhu, M. (2022). Bertalign: Improved word embedding-based sentence alignment for Chinese–English parallel corpora of literary texts. Digital Scholarship in the Humanities, 38(2), 621–634. https://doi.org/10.1093/llc/fqac089

Miroir, J.-C. (2024a). Compilação e exploração de material de apoio à tradução de textos jurídicos normativos: o caso da versão do português para o francês (AntPconc). In F. C. C. L. Arraes, A. R. de Oliveira Harden & C. Roscoe-Bessa (Eds.), Tradução em contextos específicos: conhecimentos e habilidades (pp. 13–49). Pontes Editores.

Miroir, J.-C. (2024b). Processamento de linguagem natural multilíngue com spaCy e análises avançadas de corpora anotados com Antconc (versão 4). In Encontro de Linguística de Corpus & Escola Brasileira de Linguística Computacional ELC/EBRALC, Universidade de Brasília, 21–24 de Outubro, 2024. [Workshop handout]. Departamento de Línguas Estrangeiras e Tradução, Instituto de Letras, Universidade de Brasília. https://doi.org/10.13140/RG.2.2.24082.67520

Miroir, J.-C. (2025). Tradução jurídica em contexto (TraJeC): Bilinguismo jurídico chinês-português em Macau [Data set]. Figshare Datacite. https://figshare.com/projects/TraJeC

Newman, J., & Cox, C. (2020). Corpus Annotation. In M. Paquot & S. T. Gries (Eds.), A Practical Handbook of Corpus Linguistics (pp. 24–48). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-030-46216-1_2

Paquot, M., & Gries, S. T. (Eds.). (2020). A Practical Handbook of Corpus Linguistics. Springer Nature Switzerland AG.

Sardinha, T. B. (2000). Lingüística de Corpus: histórico e problemática. DELTA, 16(2). https://doi.org/10.1590/S0102-44502000000200005

Published

2025-09-30

How to Cite

Miroir, J.-C. (2025). Chinese-Portuguese legal bilingualism in Macao: AI-powered corpus analysis and alignment. Cadernos De Tradução, 45(esp. 3), 1–30. https://doi.org/10.5007/2175-7968.2025.e108423
