Generative AI in the extraction of archival metadata: a study based on the ISAD(G) standard

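Authors

Tatiana Canelhas Pignataro, Manoel Pedro de Souza Neto, José Carlos Abbud Grácio, Telma Campanha de Carvalho Madio, José Eduardo Santarem Segundo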

DOI:

https://doi.org/10.5007/1518-2924.2025.e103505

Keywords:

Archival Description, Automation, ISAD(G), Artificial intelligence, Machine Learning, ChatGPT

Abstract

Objective: This study aimed to analyze the use of artificial intelligence, specifically ChatGPT, in the description of digital records according to the General International Standard Archival Description (ISAD(G)).

Methodology: The research is exploratory and applied, an approach suited to innovative studies that aim to explore possibilities, identify patterns, and formulate hypotheses for more detailed future research.

Results: ChatGPT achieved an average accuracy rate of 92.04% in the quantitative completion of metadata, with minimal variability between tests. However, metadata considered constant, such as Provenance and Access Conditions, did not maintain the expected precision and consistency. Fields such as Extent and Medium and Scope and Content proved harder to standardize, suggesting that the model needs further improvements and adjustments.
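The abstract does not state how the 92.04% average or its variability was computed. A minimal sketch, assuming completion means the share of a fixed set of ISAD(G) fields that the model filled with a non-blank value for each record, averaged per test run, with variability expressed as the population standard deviation across runs. The field list, the functions `completion_rate` and `summarize`, and the toy input are hypothetical illustrations, not the authors' procedure or data.

```python
from statistics import mean, pstdev

# Hypothetical field list: elements named in the abstract plus some ISAD(G)
# essential elements. The study's actual field set is an assumption here.
FIELDS = [
    "Reference code",
    "Title",
    "Date(s)",
    "Level of description",
    "Extent and medium",
    "Provenance",
    "Scope and content",
    "Conditions governing access",
]


def completion_rate(record: dict[str, str]) -> float:
    """Percentage of FIELDS holding a non-blank value in one generated description."""
    filled = sum(1 for field in FIELDS if record.get(field, "").strip())
    return 100.0 * filled / len(FIELDS)


def summarize(runs: list[list[dict[str, str]]]) -> tuple[float, float]:
    """Average completion per test run, then mean and population std dev across runs."""
    per_run = [mean(completion_rate(r) for r in run) for run in runs]
    return mean(per_run), pstdev(per_run)


if __name__ == "__main__":
    # Toy input for illustration only; these are not the study's data.
    full = {field: "x" for field in FIELDS}
    partial = {**full, "Extent and medium": "", "Scope and content": " "}
    avg, spread = summarize([[full, partial], [full, full]])
    print(f"average completion: {avg:.2f}% (std dev {spread:.2f})")
```

Note that this measures completeness only: a filled field counts even if its content is wrong, which is why the precision and consistency problems reported above can coexist with a high completion rate.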

Conclusions: The findings suggest that, although ChatGPT performed efficiently in most of the analyzed fields, challenges remain with metadata that lack standardization. ChatGPT maintains a high degree of metadata completeness but struggles with precision and consistency, especially in more complex fields. Adjustments to the model's training, combined with ongoing human supervision, could improve the quality of the generated descriptions. Despite its limitations, AI proves to be a promising tool, capable of driving significant advances in digital Archival Science.
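The consistency problem reported for metadata considered constant (Provenance, Access Conditions) can be checked mechanically before a human reviewer steps in. A minimal sketch, assuming consistency is scored as the fraction of generated records whose value for a field matches that field's most frequent value; the whitespace/case normalization, the functions `normalize`, `constancy_score` and `inconsistent_fields`, and the sample records are all hypothetical, not the study's method.

```python
from collections import Counter


def normalize(value: str) -> str:
    """Collapse internal whitespace and ignore case so trivial differences do not count."""
    return " ".join(value.split()).casefold()


def constancy_score(records: list[dict[str, str]], field: str) -> float:
    """Fraction of records whose normalized value for `field` equals the most common one.

    1.0 means the field was perfectly constant. Missing values are treated as
    empty strings, so they count as disagreements unless most records are empty.
    """
    if not records:
        return 1.0
    values = [normalize(r.get(field, "")) for r in records]
    _, top_count = Counter(values).most_common(1)[0]
    return top_count / len(values)


def inconsistent_fields(
    records: list[dict[str, str]], constant_fields: list[str], threshold: float = 1.0
) -> dict[str, float]:
    """Return the fields expected to be constant whose score falls below `threshold`."""
    scores = {field: constancy_score(records, field) for field in constant_fields}
    return {field: s for field, s in scores.items() if s < threshold}


if __name__ == "__main__":
    # Illustrative records only; not taken from the study.
    records = [
        {"Provenance": "Municipal Archive", "Conditions governing access": "Open"},
        {"Provenance": "municipal  archive", "Conditions governing access": "Open"},
        {"Provenance": "Municipal Archive", "Conditions governing access": "Restricted"},
    ]
    print(inconsistent_fields(records, ["Provenance", "Conditions governing access"]))
```

Flagged fields are candidates for human review rather than automatic correction, in line with the ongoing supervision the conclusions recommend.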

Author Biographies

Tatiana Canelhas Pignataro, Paulista State University

Co-founder of META Documentos Digitais, a company specializing in digital preservation. She is currently pursuing a Master's degree in Information Science (PPGCI/UNESP) and holds bachelor's degrees in Archival Science and Computer Science from the University of Brasília. She contributed to the implementation of systems under CONARQ Resolution No. 51 of August 25, 2023 (RDC-Arq) at the Federal Senate and the Superior Military Court, in partnership with UnB. At IBICT, she spent five years conducting research on digital preservation and contributed to the development of the Hipátia Model in projects with TJDFT, TRT4, TJMG, and the National Archives.

Manoel Pedro de Souza Neto, Paulista State University

Graduated in Library Science (2005) and Archival Science (2013), both from the Federal University of Amazonas (UFAM). He holds a specialization in Archives from the Centro Universitário do Norte (UNINORTE), 2006, and a Master's degree in Cultural Heritage from the Federal University of Santa Maria (UFSM), 2016. He has been a civil servant at the Court of Justice of Amazonas (TJAM) since 2006. He was General Archive Manager of this court from 2009 to 2015 and returned to the position in 2018. He also serves as secretary of the Permanent Commission for Document Evaluation (CPAD/TJAM), where he manages archival documents, and is a member of the court's LGPD Management Committee. Appointed by Minister Gilmar Mendes, of the Federal Supreme Court and the National Council of Justice, he served on the Document Management and Memory Committee of the Judiciary (PRONAME) from 2009 to 2013.

José Carlos Abbud Grácio, Paulista State University

Bachelor's degree in Computer Science from the State University of Campinas (UNICAMP), 1987; master's (2002) and doctoral (2011) degrees in Information Science from São Paulo State University Júlio de Mesquita Filho (UNESP/Marília). Director of Information Technology at UNESP/Marília from 1995 to 2009 and member of UNESP's Higher Committee of Information Technology from 2009 to 2016. President of UNESP's Permanent Commission for Digital Preservation since 2018. Collaborating professor in the Postgraduate Program in Information Science at UNESP/Marília. His research in Information Science focuses on Digital Preservation, Digital Preservation Policies and Plans, and Archival Management Systems for Documents and Metadata. Member of the Dríade research group.

Telma Campanha de Carvalho Madio, Paulista State University

Graduated in History from the Pontifical Catholic University of São Paulo, specialized in Archives from IEB/USP, master's degree in History from the Pontifical Catholic University of São Paulo and PhD in Communication Sciences from the University of São Paulo. She is a professor of Photographic Documents at the Faculty of Philosophy and Sciences of the Júlio de Mesquita Filho State University of São Paulo/UNESP. She is currently an associate professor at UNESP, in the Department of Information Science of the Faculty of Philosophy and Sciences - Marília Campus, teaching undergraduate and graduate courses. Coordinator of the Conservation Laboratory since 2006. Full member of the Course and Departmental Councils, of the Document Assessment and Access Commission/CADA and of the Digital Preservation Commission of UNESP.

José Eduardo Santarem Segundo, University of São Paulo

Professor in Information and Technology from the University of São Paulo (USP), 2020. Post-Doctorate from the School of Computer Engineering at Western University (Canada), 2018. PhD and Master's in Information Science from São Paulo State University Júlio de Mesquita Filho (UNESP/Marília). Professor in the Department of Education, Information and Communication of the School of Philosophy, Sciences and Letters of Ribeirão Preto, University of São Paulo (USP), and in the Postgraduate Program in Information Science at UNESP/Marília. Holder of a CNPq Research Productivity Scholarship (PQ-2). Member of the Executive Board of the National Association for Research and Postgraduate Studies in Information Science (ANCIB). Works in the research line "Digital Environments and Technologies Applied to Information and Communication", with emphasis on Semantic Web, Linked Data, Big Data, Machine Learning, Open Data and Digital Collections. Leader of NEWSDA, the Center for Studies in Semantic Web and Open Data.

Published

2025-06-16

How to Cite

PIGNATARO, Tatiana Canelhas; SOUZA NETO, Manoel Pedro de; GRÁCIO, José Carlos Abbud; MADIO, Telma Campanha de Carvalho; SANTAREM SEGUNDO, José Eduardo. Generative AI in the extraction of archival metadata: a study based on the ISAD(G) standard. Encontros Bibli: electronic journal of library science, archival science and information science, Florianópolis/SC, Brazil, v. 30, p. 1–28, 2025. DOI: 10.5007/1518-2924.2025.e103505. Available at: https://periodicos.ufsc.br/index.php/eb/article/view/103505. Accessed: 13 Feb. 2026.

Issue

v. 30 (2025)

Section

Dossier: News scenarios of the Digital Society and the challenges of Generative Artificial Intelligence