Analysis of extraction of descriptors as noun phrases through the OGMA software

Renato Fernandes Corrêa; Luiz Henrique Teixeira Bazílio

doi:10.5007/1518-2924.2017v22n50p44

Authors

Renato Fernandes Corrêa, Universidade Federal de Pernambuco, ORCID: http://orcid.org/0000-0002-9880-8678
Luiz Henrique Teixeira Bazílio, Universidade Federal de Pernambuco


Keywords:

Automatic indexing, Noun Phrases, Keywords, Theses and dissertations, OGMA software

Abstract

This work investigates automatic indexing by noun phrases of a corpus of documents consisting of the titles and abstracts of 30 theses and dissertations written in Portuguese in three different areas of knowledge. The research is exploratory and is based on a literature review and an experiment. The experiment consisted of analyzing the output of the OGMA software for the document corpus and measuring the recall of the keywords present in the documents. The paper presents a descriptive profile of the part-of-speech tag sequences of the keywords present in the documents, covering both those extracted and those not extracted as noun phrases. It concludes that 68% of all keywords assigned by the authors appeared in the title or abstract of the theses or dissertations. Of these, 66% were extracted as noun phrases, and this figure is the recall of keywords present achieved by the OGMA software. The keywords present but not extracted mainly contained nouns or adjectives to which the software assigned an incorrect part-of-speech tag. The keywords present and extracted were mostly single nouns (30%), noun-adjective pairs (28%) and noun-preposition-noun trigrams (19%). OGMA achieved a good level of recall of keywords present, and this level could increase by almost 34% with adjustments to its part-of-speech tagger.
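The abstract measures two things: the share of author keywords that literally occur in the title or abstract, and the recall of those "keywords present" among the extracted noun phrases. The sketch below illustrates how such a measurement could be set up. It is not OGMA's implementation, which uses its own tagger and rules that are not reproduced here. The tag names (N, ADJ, PREP, CONJ, V), the helper functions `extract_candidates` and `keyword_recall`, and the toy data are all hypothetical. The three tag patterns simply mirror the most frequent profiles the abstract reports. A mis-tagged variant of the same sentence shows how a single tagging error can lower recall, which is the failure mode the abstract identifies.

```python
# Hypothetical sketch, NOT the OGMA implementation: candidate noun phrases are
# found by matching part-of-speech tag sequences, then recall is computed only
# over the author keywords that literally occur in the text.
from typing import List, Set, Tuple

Tagged = List[Tuple[str, str]]  # (word, tag) pairs produced by some POS tagger

# Assumed tag names; the patterns mirror the three most frequent profiles the
# abstract reports (noun, noun-adjective, noun-preposition-noun).
PATTERNS: List[Tuple[str, ...]] = [("N",), ("N", "ADJ"), ("N", "PREP", "N")]


def extract_candidates(tagged: Tagged) -> Set[str]:
    """Return every contiguous span whose tag sequence matches a pattern."""
    tags = [tag for _, tag in tagged]
    found: Set[str] = set()
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            end = start + len(pattern)
            # Slices past the end are shorter than the pattern, so they never match.
            if tuple(tags[start:end]) == pattern:
                found.add(" ".join(w.lower() for w, _ in tagged[start:end]))
    return found


def keyword_recall(keywords: List[str], text: str,
                   candidates: Set[str]) -> Tuple[float, float]:
    """Return (share of keywords present in text, recall of present keywords).

    Simplification (assumption): "present" is a case-insensitive substring test.
    """
    kws = [k.lower() for k in keywords]
    present = [k for k in kws if k in text.lower()]
    extracted = [k for k in present if k in candidates]
    share_present = len(present) / len(kws) if kws else 0.0
    recall = len(extracted) / len(present) if present else 0.0
    return share_present, recall


# Toy example (made-up data, not from the study's corpus).
TAGGED: Tagged = [("indexação", "N"), ("automática", "ADJ"), ("e", "CONJ"),
                  ("recuperação", "N"), ("de", "PREP"), ("informação", "N")]
# Same sentence with the adjective "automática" wrongly tagged as a verb.
MISTAGGED: Tagged = [(w, "V" if w == "automática" else t) for w, t in TAGGED]
KEYWORDS = ["Indexação automática", "Recuperação de informação",
            "Sintagmas nominais"]
TEXT = " ".join(w for w, _ in TAGGED)

if __name__ == "__main__":
    print(keyword_recall(KEYWORDS, TEXT, extract_candidates(TAGGED)))
    print(keyword_recall(KEYWORDS, TEXT, extract_candidates(MISTAGGED)))
```

In this toy case, two of the three keywords are present. With correct tags, both present keywords are extracted (recall 1.0). With the mis-tag, "indexação automática" no longer matches the noun-adjective pattern and recall drops to 0.5. Combining the two percentages reported in the abstract, roughly 0.68 × 0.66 ≈ 45% of all author-assigned keywords were recovered.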


Author Biographies

Renato Fernandes Corrêa, Universidade Federal de Pernambuco

PhD in Computer Science; faculty member of the Graduate Program in Information Science and of the Department of Information Science at the Universidade Federal de Pernambuco.

Luiz Henrique Teixeira Bazílio, Universidade Federal de Pernambuco

Undergraduate student in Information Management at the Universidade Federal de Pernambuco.



Published

2017-09-06

How to Cite

CORRÊA, Renato Fernandes; BAZÍLIO, Luiz Henrique Teixeira. Analysis of extraction of descriptors as noun phrases through the OGMA software. Encontros Bibli: revista eletrônica de biblioteconomia e ciência da informação, [S. l.], v. 22, n. 50, p. 44–58, 2017. DOI: 10.5007/1518-2924.2017v22n50p44. Available at: https://periodicos.ufsc.br/index.php/eb/article/view/1518-2924.2017v22n50p44. Accessed on: 20 May 2024.


Issue

Vol. 22 No. 50 (2017): Publication date: 1 September 2017

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

The author must guarantee that:

there is full consensus among all co-authors in approving the final version of the document and its submission for publication;
the work is original and, where the work and/or words of others have been used, they have been properly acknowledged.

Plagiarism in all its forms constitutes unethical publishing behavior and is unacceptable. Encontros Bibli reserves the right to use software or any other method of plagiarism detection.

All manuscripts submitted to Encontros Bibli are screened for plagiarism and self-plagiarism. Plagiarism identified during the evaluation process will result in the submission being archived. If plagiarism is identified in a manuscript already published in the journal, the Editor-in-Chief will conduct a preliminary investigation and, if necessary, will issue a retraction.

This journal, following the recommendations of the Open Source movement, provides full open access to its content. In doing so, the authors retain all of their rights while allowing Encontros Bibli to publish their articles and make them available to the whole community.

Encontros Bibli content is licensed under a Creative Commons Attribution 4.0 International License.

Any user has the right to:

Share - copy, download, print or redistribute the material in any medium or format.
Adapt - remix, transform and build upon the material for any purpose, even commercially.

According to the following terms:

Attribution - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything that the license permits.