Natural Language Processing in Information Metric Studies: an analysis of the articles indexed by the Web of Science (2000-2019)
DOI: https://doi.org/10.5007/1518-2924.2021.e76886
Keywords: Natural Language Processing, Information Metric Studies, Social Network Analysis, Scientific Research, Mapping of science
Abstract
Objective: To identify the international scientific structure of the research on the use of natural language processing in the information metric studies area.
Methods: The study follows qualitative and quantitative approaches from information metric studies and the knowledge organization domain. Data were retrieved on 02/02/2020 from the Web of Science Core Collection using the expression "natural language processing", limited to the document types articles and reviews, the category Information Science & Library Science, and the last 20 complete years (2000 to 2019). Social network analysis was used to visualize the scientific collaboration, co-citation, and keyword co-occurrence networks.
Results: Of the 552 documents retrieved, 31 papers were identified as belonging to the information metric studies area. Bibliometric indicators of production, relationship, and impact were considered in the study and showed an increase in publications in the last three years, with 2018 the most productive year.
Conclusions: The international scientific literature on the application of NLP in information metric studies is emerging. Scientometrics was identified as the source with the greatest impact. Finally, the k-core of the co-citation analysis shows the existence of an important theoretical core that is often cited in the international academic community. The set of NLP techniques (e.g., bag of words, tokenization, word stemming, part-of-speech tagging, and SVM) allows the researcher to go beyond traditional citation analysis and focus on the content and context of the citations.
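To make the k-core idea mentioned in the conclusions concrete, the following is a minimal sketch, not the procedure used in the study. It assumes a toy set of reference lists with hypothetical labels (A to E), builds a co-citation network in which two works are linked when they appear in the same reference list, and then repeatedly removes nodes with fewer than k links. The function names `cocitation_graph` and `k_core` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def cocitation_graph(reference_lists):
    """Link two cited works whenever they share a reference list."""
    graph = defaultdict(set)
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def k_core(graph, k):
    """Return the nodes of the k-core: the largest subgraph in which
    every remaining node has at least k neighbours."""
    adj = {node: set(neigh) for node, neigh in graph.items()}
    changed = True
    while changed:
        changed = False
        # Nodes whose degree is below k at the start of this pass.
        for node in [n for n, neigh in adj.items() if len(neigh) < k]:
            for other in adj.pop(node):
                adj[other].discard(node)
            changed = True
    return set(adj)


# Hypothetical data: each inner list is the reference list of one citing paper.
papers = [
    ["A", "B", "C"],
    ["A", "B", "C", "D"],
    ["C", "D"],
    ["D", "E"],
]

graph = cocitation_graph(papers)
core = k_core(graph, 3)
print(sorted(core))
```

In this toy network, work E is co-cited only with D, so it drops out of the 3-core, while A, B, C, and D remain as the densely co-cited group. A real analysis would typically also weight links by co-citation frequency, which this sketch ignores.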
License
Copyright (c) 2021 Mirelys Puerta-Díaz
This work is licensed under a Creative Commons Attribution 4.0 International License.
The author must guarantee that:
- there is full consensus among all the coauthors in approving the final version of the document and its submission for publication.
- the work is original, and when the work and/or words from other people were used, they were properly acknowledged.
Plagiarism in all of its forms constitutes unethical publication behavior and is unacceptable. Encontros Bibli has the right to use software or any other method of plagiarism detection.
All manuscripts submitted to Encontros Bibli are screened for plagiarism and self-plagiarism. Plagiarism identified during the evaluation process will result in the submission being archived. If plagiarism is identified in a manuscript already published in the journal, the Editor-in-Chief will conduct a preliminary investigation and, if necessary, issue a retraction.
This journal, following the recommendations of the Open Source movement, provides full open access to its content. By doing this, the authors keep all of their rights allowing Encontros Bibli to publish and make its articles available to the whole community.
Encontros Bibli content is licensed under a Creative Commons Attribution 4.0 International License.
Any user has the right to:
- Share - copy, download, print or redistribute the material in any medium or format.
- Adapt - remix, transform and build upon the material for any purpose, even commercially.
According to the following terms:
- Attribution - You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- No additional restrictions - You may not apply legal terms or technological measures that legally restrict others from doing anything that the license permits.