Computational identification of interlevel patterns in Brazilian Literature texts
DOI:
https://doi.org/10.5007/1807-9288.2025.e106921
Keywords:
Digital Humanities, Textual patterns, Computational textual analysis
Abstract
Literary texts, whether poetry or prose, make intensive use of poetic devices and recurring linguistic resources at different linguistic levels. Identifying these resources with computational tools can reveal, through quantifiable analyses, patterns of relationships between these levels. This paper proposes a computational method for identifying and correlating textual patterns across linguistic levels in Brazilian literary texts. To this end, textual features were extracted at different linguistic levels by quantifying occurrences as absolute and relative frequencies, both for the full text and for text excerpts; correlation analysis of these quantified features was then used to identify interlevel patterns. The results used to demonstrate the method were extracted from the Brazilian literary work Os Sertões, by Euclides da Cunha. They illustrate the various facets of the method, highlighting its ability to identify and correlate patterns at multiple linguistic levels while showing the variability of possible results, and they allow a quantitative analysis of the patterns present. This research has the potential to open paths for studies in textual analysis by introducing a quantitative approach into a predominantly qualitative field.
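The pipeline described in the abstract (per-excerpt frequencies followed by correlation) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the fixed excerpt size, the two toy features (`starts_with_s` as a stand-in for a sound-level feature, `is_long` for a lexical one), and all function names are hypothetical, and Pearson correlation is assumed because the abstract does not name the coefficient used.

```python
import math
import re


def split_excerpts(text, size):
    """Split a text into consecutive excerpts of `size` word tokens.

    The last excerpt may be shorter than `size`.
    """
    words = re.findall(r"\w+", text.lower())
    return [words[i:i + size] for i in range(0, len(words), size)]


def absolute_frequency(words, predicate):
    """Number of tokens in `words` that satisfy `predicate`."""
    return sum(1 for w in words if predicate(w))


def relative_frequency(words, predicate):
    """Share of tokens in `words` that satisfy `predicate`."""
    if not words:
        return 0.0
    return absolute_frequency(words, predicate) / len(words)


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one series is constant
    return cov / math.sqrt(var_x * var_y)


# Hypothetical features, one per "level", chosen only for illustration.
def starts_with_s(word):
    return word.startswith("s")


def is_long(word):
    return len(word) >= 7


def interlevel_correlation(text, size, feature_a, feature_b):
    """Correlate two features' relative frequencies across excerpts."""
    excerpts = split_excerpts(text, size)
    a = [relative_frequency(e, feature_a) for e in excerpts]
    b = [relative_frequency(e, feature_b) for e in excerpts]
    return pearson(a, b)
```

Calling `interlevel_correlation(text, 500, starts_with_s, is_long)` on a full work would yield one coefficient for that pair of features; repeating the call over many feature pairs gives a correlation table in which interlevel patterns can be looked for.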
License
Copyright (c) 2025 Angelo Loula, Luciano Alves Machado Júnior

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who have their works published in Texto Digital agree that:
Copyright remains with the authors, who grant the journal the right of first publication of their submitted manuscripts. All materials published by the journal are under a Creative Commons Attribution 4.0 International License, which allows them to be shared provided that authorship and first publication are credited.
The Creative Commons Attribution 4.0 International License allows the material to be copied and redistributed in any medium or format, and adapted for any purpose, even commercially.
Authors may enter into separate, additional contracts for the non-exclusive distribution of the version of their work published in the journal (e.g., depositing it in an institutional repository or publishing it as a book chapter), provided that authorship and first publication in Texto Digital are expressly acknowledged.
