Data analysis on articles retrieved from Web of Science (WOS)
DOI:
https://doi.org/10.5007/1518-2924.2018v23nespp112
Keywords:
Information Retrieval, Knowledge Discovery in Databases, Text Mining
Abstract
In the context of data mining and text mining, this paper analyzes data retrieved from Web of Science (WoS). It aims to identify patterns in text mining research regarding the selection of tools used in data mining applications. References in BibTeX format were retrieved for articles indexed on the WoS platform, and an application imported the BibTeX data into a MySQL database. The characteristics found led to the choice of the R programming language and the Apriori algorithm, which was applied to a subset of the data. Data about tools, methods, keywords, indexing terms, journals, countries, and authors were identified in the records. Apriori yielded thirteen association rules. Exploring the data from WoS articles revealed characteristics of data mining research. Future work can adapt the application used in this study and apply other data mining methods to the dataset.
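To illustrate the kind of association-rule mining the abstract refers to, the following is a minimal sketch of the Apriori algorithm in plain Python. It is not the authors' implementation (the study used R), and the records, the thresholds, and the function names `apriori` and `rules` are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every frequent itemset."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(itemset <= t for t in sets) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    current = keep_frequent({frozenset([i]) for t in sets for i in t})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Subsets of a frequent itemset are frequent, so the lookup is safe.
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, sup, confidence))
    return found


# Hypothetical records: each set holds terms attached to one article.
records = [
    {"text mining", "R", "classification"},
    {"text mining", "R", "clustering"},
    {"text mining", "Weka", "classification"},
    {"data mining", "Weka", "classification"},
    {"text mining", "R", "classification"},
]

frequent = apriori(records, min_support=0.4)
found = rules(frequent, min_confidence=1.0)
for lhs, rhs, sup, conf in found:
    print(sorted(lhs), "=>", sorted(rhs), f"support={sup:.2f}", f"confidence={conf:.2f}")
```

Each record plays the role of a transaction, so a rule such as "articles tagged with tool X also carry keyword Y" is reported only when it meets both the support and the confidence threshold. In R, the `arules` package provides an `apriori` function for the same purpose.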
License
Copyright (c) 2018 Marcelo Batista de Carvalho, Denise Fukumi Tsunoda
This work is licensed under a Creative Commons Attribution 4.0 International License.