Revisiting corpus creation and analysis tools for translation tasks

Claudio Fantinuoli


Many translation scholars have proposed the use of corpora to help professional translators produce high-quality texts that read like originals. Yet the uptake of this methodology has been modest. One reason is that corpus analysis software has been developed with the linguist in mind: such tools are generally complex and cumbersome, offering many advanced features but lacking the usability and the specific functions that translators need. To overcome this shortcoming, we have developed TranslatorBank, a free corpus creation and analysis tool designed for translation tasks. TranslatorBank supports the creation of specialized monolingual corpora from the web; it includes a concordancer with a query system similar to that of a search engine; it uses basic statistical measures to indicate the reliability of results; it gives direct access to the original documents for more contextual information; and it includes a statistical and linguistic terminology extraction utility that extracts the relevant terminology of the domain and the typical collocations of a given term. Designed to be easy and intuitive to use, the tool may help translation students as well as professionals to improve the quality of their translations by adhering to the specific linguistic variety of the target text corpus.


Corpus Tools; Translation; Professionalization; Monolingual Corpus





Cadernos de Tradução, ISSN 2175-7968, Florianópolis, Brazil.





Departamento de Língua e Literatura Estrangeiras (DLLE), UFSC



Creative Commons License
The works published here are licensed under Creative Commons CC BY.