Chinese-Portuguese legal bilingualism in Macao: AI-powered corpus analysis and alignment
DOI:
https://doi.org/10.5007/2175-7968.2025.e108423
Keywords:
Legal bilingualism, Macau, Legal translation, Corpus linguistics, Linguistic calques
Abstract
This article examines legal bilingualism in Macau, a Special Administrative Region of China governed under the “one country, two systems” principle. The analysis focuses on the coexistence of Chinese and Portuguese in local legislation, highlighting the hierarchy between the two languages and the challenges of legal translation. The study compiles and annotates parallel corpora of legal texts in Portuguese and Chinese, using artificial intelligence tools such as Bertalign for automated alignment and spaCy for linguistic annotation. Because the two languages use different writing systems, alignment required AI-based solutions to overcome the limitations of conventional aligners developed for Romance languages. The analysis points to a significant influence of Portuguese on Chinese legal terminology, with a high prevalence of linguistic calques. The study also addresses the challenges of Chinese tokenization and machine translation, proposing practical solutions and comparing the performance of different translation tools, such as large language models (LLMs) and neural machine translation (NMT) systems. The exploration of the corpora, including n-gram analyses and syntactic patterns, offers insights for corpus linguistics and legal translation. Practical examples extracted from the aligned corpus show how the computational tools of Sketch Engine support the study of specific aspects of Macanese legal translation. The article contributes to corpus linguistics, especially in multilingual legal contexts, and offers methodological and analytical resources for researchers and legal translation professionals.
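The abstract mentions n-gram analysis and the difficulty of tokenizing Chinese. As a minimal sketch of why the two languages need different treatment, the Python below counts word n-grams for Portuguese and falls back to character n-grams for Chinese. This is not the study's pipeline (which the abstract says relies on spaCy, Bertalign, and Sketch Engine). The function names and the two sample strings are hypothetical illustrations, not corpus data.

```python
from collections import Counter
import re


def word_ngrams(text, n):
    """Word n-grams for a whitespace-delimited language such as Portuguese."""
    tokens = re.findall(r"\w+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Character n-grams over CJK ideographs only.

    Assumption: a crude stand-in for real Chinese word segmentation.
    Punctuation is dropped, so n-grams may span sentence boundaries.
    """
    chars = re.findall(r"[\u4e00-\u9fff]", text)
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


# Illustrative strings written for this example (not taken from the corpus).
pt_sample = (
    "Região Administrativa Especial de Macau. "
    "A Região Administrativa Especial de Macau é parte da China."
)
zh_sample = "澳門特別行政區是中華人民共和國的一部分。澳門特別行政區實行高度自治。"

pt_counts = Counter(word_ngrams(pt_sample, 3))
zh_counts = Counter(char_ngrams(zh_sample, 4))

if __name__ == "__main__":
    print(pt_counts.most_common(3))
    print(zh_counts.most_common(3))
```

Character n-grams avoid any dependence on a segmenter, but they also produce fragments that are not words, which is one reason a dedicated tokenizer is normally preferred for Chinese.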
References
Anthony, L. (2024a). Antconc (Version 4.0) [Software].
Anthony, L. (2024b). AntPconc (Version 1.2.1) [Software].
Anthony, L. (2024c). TagAnt [Software].
bfsujason. (2022). Bertalign (Version 0.1.0) [Software]. GitHub.
Boletim Oficial (B.O.) de Macau. (1988). Declaração Conjunta do Governo da República Portuguesa e do Governo da República Popular da China sobre a Questão de Macau [com Anexos I e II]. https://bo.io.gov.mo
Boletim Oficial (B.O.) de Macau. (1993). Lei Básica da Região Administrativa Especial de Macau da República Popular da China, de 31 de março de 1993, Promulgada pelo Decreto n.º 3 do Presidente da República Popular da China. https://bo.io.gov.mo
Cheng, L., & Sun, Y. (2021). Terminology translation in socio-legal contexts: A corpus-based exploration. In S. Li & W. Hope (Eds.), Terminology Translation in Chinese Contexts: Theory and Practice (pp. 27–39). Routledge.
Claude AI. (2024). Claude AI [Software].
Deepseek. (2023). Deepseek [Software].
Gao, Z.-M. (2021). Automatically compiling bilingual legal glossaries based on Chinese-English parallel corpora. In S. Li & W. Hope (Eds.), Terminology Translation in Chinese Contexts: Theory and Practice (pp. 164–179). Routledge. http://doi.org/10.4324/9781003006688-14
Imprensa Oficial (IO) do Governo da Região Administrativa Especial de Macau. (2025). Página inicial. io.gov.mo.
Lefer, M.-A. (2020). Parallel corpora. In M. Paquot & S. T. Gries (Eds.), A Practical Handbook of Corpus Linguistics (pp. 257–282). Springer Nature.
Leong, S. M. (2012). Divergências linguísticas e interpretação correcta da Lei Básica. Revista de Estudos de “Um País, Dois Sistemas”, 4, 183–193.
Liu, L., & Zhu, M. (2022). Bertalign: Improved word embedding-based sentence alignment for Chinese–English parallel corpora of literary texts. Digital Scholarship in the Humanities, 38(2), 621–634. https://doi.org/10.1093/llc/fqac089
Miroir, J.-C. (2024a). Compilação e exploração de material de apoio à tradução de textos jurídicos normativos: o caso da versão do português para o francês (AntPconc). In F. C. C. L. Arraes, A. R. de Oliveira Harden & C. Roscoe-Bessa (Eds.), Tradução em contextos específicos: conhecimentos e habilidades (pp. 13–49). Pontes Editores.
Miroir, J.-C. (2024b). Processamento de linguagem natural multilíngue com spaCy e análises avançadas de corpora anotados com Antconc (versão 4). In Encontro de Linguística de Corpus & Escola Brasileira de Linguística Computacional ELC/EBRALC, Universidade de Brasília, 21–24 de Outubro, 2024. [Workshop handout]. Departamento de Línguas Estrangeiras e Tradução, Instituto de Letras, Universidade de Brasília. https://doi.org/10.13140/RG.2.2.24082.67520
Miroir, J.-C. (2025). Tradução jurídica em contexto (TraJeC): Bilinguismo jurídico chinês-português em Macau [Data set]. Figshare Datacite. https://figshare.com/projects/TraJeC
Newman, J., & Cox, C. (2020). Corpus Annotation. In M. Paquot & S. T. Gries (Eds.), A Practical Handbook of Corpus Linguistics (pp. 24–48). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-030-46216-1_2
Paquot, M., & Gries, S. T. (Eds.). (2020). A Practical Handbook of Corpus Linguistics. Springer Nature Switzerland AG.
Sardinha, T. B. (2000). Lingüística de Corpus: histórico e problemática. DELTA, 16(2). https://doi.org/10.1590/S0102-44502000000200005
License
Copyright (c) 2025 Cadernos de Tradução

This work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright Notice
Authors hold the copyright and grant the journal the right for their articles' first publication, being their works simultaneously licensed under the Creative Commons Attribution License (CC BY), which allows the sharing of such works with its authorship acknowledged and its initial publication in this journal.
Authors are allowed to enter into separate additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or as a book chapter, with an acknowledgment of its initial publication in this journal).