Beautiful, modest and housewife: what literary text mining tells us about male and female characterization


  • Cláudia Freitas PUC-Rio
  • Flávia Martins



Digital Humanities, Corpus Linguistics, Text Mining, Gender Studies


This paper presents the results of a study that combines quantitative and qualitative methods to examine discursive representations of gender. Our goal is to identify how male and female characters are characterized in literary texts. To that end, we explore a 5-million-word corpus of Brazilian literature annotated with semantic and morphosyntactic information. The study proceeds along two lines: the predicates used to describe characters and the actions those characters perform. As a result, (i) we show how the methodology allows different perspectives on the data, going beyond analyses based on word forms and frequency lists, and (ii) we confirm the stereotyped construction of male and female in the literature of the 19th and 20th centuries, with female characters described mainly in terms of appearance, especially beauty, and of the domestic sphere.






