Generative AI in the extraction of archival metadata: a study based on the ISAD(G) standard
DOI:
https://doi.org/10.5007/1518-2924.2025.e103505
Keywords:
Archival Description, Automation, ISAD(G), Artificial Intelligence, Machine Learning, ChatGPT
Abstract
Objective: This study aimed to analyze the use of artificial intelligence, specifically ChatGPT, in the description of digital records according to the General International Standard Archival Description (ISAD(G)).
Methodology: The research is exploratory and applied in nature, suited for innovative studies where the objective is to explore possibilities, identify patterns, and formulate hypotheses for more detailed future research.
Results: ChatGPT achieved an average accuracy rate of 92.04% in the quantitative completion of metadata, with minimal variability between tests. However, inconsistencies were observed in metadata that should remain constant, such as Provenance and Access Conditions, which did not maintain the expected precision and consistency. Fields such as Extent and Medium and Scope and Content proved harder to standardize, suggesting the need for improvements and adjustments to the model.
Conclusions: The findings suggest that while ChatGPT demonstrated efficiency in most of the analyzed fields, challenges remain with metadata that lack standardization. The results indicate that ChatGPT is capable of maintaining a high degree of metadata completeness but faces challenges regarding precision and consistency, especially in more complex fields. Adjustments to the model's training, along with ongoing human supervision, could enhance the quality of the descriptions generated. Despite its limitations, AI proves to be a promising tool, capable of driving significant advancements in the field of digital Archival Science.
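The two kinds of measurement mentioned in the Results (how completely the fields are filled in, and whether fields that should stay constant actually do so across repeated tests) can be illustrated with a minimal sketch. The abstract does not state the study's exact formulas, field list, or data, so everything below is an assumption for illustration only: the field names are a hypothetical subset of ISAD(G) elements, the three runs are made-up outputs, and `completion_rate` and `inconsistent_fields` are hypothetical helper names.

```python
from statistics import mean, pstdev

# Hypothetical subset of ISAD(G) elements (assumption, not the study's list).
FIELDS = ["Reference code", "Title", "Date", "Extent and Medium",
          "Provenance", "Access Conditions"]
# Fields expected to be identical across repeated runs on the same record.
CONSTANT_FIELDS = ["Provenance", "Access Conditions"]


def completion_rate(record: dict) -> float:
    """Percentage of expected fields that have a non-empty value."""
    filled = sum(1 for f in FIELDS if (record.get(f) or "").strip())
    return 100.0 * filled / len(FIELDS)


def inconsistent_fields(runs: list, constant_fields: list) -> list:
    """Constant fields that took more than one distinct value across runs."""
    return [f for f in constant_fields
            if len({(r.get(f) or "").strip() for r in runs}) > 1]


# Made-up outputs of three repeated runs on the same digital record.
runs = [
    {"Reference code": "BR XX 001", "Title": "Meeting minutes", "Date": "1998",
     "Extent and Medium": "3 pages, PDF", "Provenance": "University archive",
     "Access Conditions": "Open"},
    {"Reference code": "BR XX 001", "Title": "Meeting minutes", "Date": "1998",
     "Extent and Medium": "", "Provenance": "University archive",
     "Access Conditions": "Open"},
    {"Reference code": "BR XX 001", "Title": "Minutes of meeting", "Date": "1998",
     "Extent and Medium": "3 p.", "Provenance": "Rector's office",
     "Access Conditions": "Open"},
]

rates = [completion_rate(r) for r in runs]
print(f"mean completion: {mean(rates):.2f}%, std dev: {pstdev(rates):.2f}")
print("inconsistent constant fields:", inconsistent_fields(runs, CONSTANT_FIELDS))
```

A completion rate of this kind only counts whether a field was filled in. It says nothing about whether the value is correct, which is why a separate consistency check (and human review) is still needed.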
License
Copyright (c) 2025 Tatiana Canelhas Pignataro, Manoel Pedro de Souza Neto, José Carlos Abbud Grácio, Telma Campanha de Carvalho Madio, José Eduardo Santarem Segundo

This work is licensed under a Creative Commons Attribution 4.0 International License.
The author must guarantee that:
- there is full consensus among all the coauthors in approving the final version of the document and its submission for publication.
- the work is original, and when the work and/or words from other people were used, they were properly acknowledged.
Plagiarism in all of its forms constitutes unethical publication behavior and is unacceptable. Encontros Bibli has the right to use software or any other method of plagiarism detection.
All manuscripts submitted to Encontros Bibli are checked for plagiarism and self-plagiarism. Plagiarism identified during the evaluation process will result in the submission being archived. If plagiarism is identified in a manuscript already published in the journal, the Editor-in-Chief will conduct a preliminary investigation and, if necessary, will issue a retraction.
This journal, following the recommendations of the Open Access movement, provides full open access to its content. The authors thereby keep all of their rights while allowing Encontros Bibli to publish their articles and make them available to the whole community.
Encontros Bibli content is licensed under a Creative Commons Attribution 4.0 International License.
Any user has the right to:
- Share - copy, download, print or redistribute the material in any medium or format.
- Adapt - remix, transform and build upon the material for any purpose, even commercially.
According to the following terms:
- Attribution - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything that the license permits.